Farcaster Shares Tracking & Analytics System
Date: 2025-11-13
Status: 📋 Planning
Report Type: Technical Design & Implementation Plan
Stakeholders: EmProps Studio Team, Analytics Team
Related Systems: EmProps Studio, Neynar API, Analytics Infrastructure
Executive Summary
This report outlines the design and implementation plan for a Farcaster shares tracking and analytics system that will measure and analyze the virality of AI workflow generations shared on Farcaster. The system will enable data-driven understanding of:
- Share velocity - How quickly content spreads
- Viral coefficient - Downstream generations produced per share
- User engagement patterns - Early vs. late sharer behavior
- Content virality - Which workflows generate the most organic sharing
Business Impact:
- 📊 Quantify virality - Measure ROI of social features
- 🎯 Optimize for sharing - Identify high-performing workflow patterns
- 👥 Understand user cohorts - Early adopters vs. mainstream behavior
- 📈 Track growth metrics - Share-driven user acquisition
Technical Approach:
- Two-layer architecture: Operational data layer + Analytics layer
- Star schema design for efficient analytical queries
- Incremental ETL with historical backfill
- Scalable foundation for future warehouse integration
Table of Contents
- Context
- Problem Statement
- Data Source
- Architecture Design
- Implementation Plan
- Analytics Capabilities
- Technical Specifications
- ETL Strategy
- Success Metrics
- Future Considerations
Context
Current State
EmProps Studio Sharing:
- Users create AI workflows and generate outputs (images, videos, etc.)
- Generated content can be shared to Farcaster with embedded workflow URLs
- URL pattern: https://emerge-mini-app.vercel.app/workflow/{collection_id}
- Each workflow is uniquely identified by collection_id (UUID)
Current Gaps:
- ❌ No visibility into how often workflows are shared
- ❌ No virality metrics - Can't measure share → generation conversion
- ❌ No user cohort analysis - Can't identify power sharers vs. passive users
- ❌ No historical data - Missing ~10k historical shares on Farcaster
Opportunity:
- ✅ Neynar API provides complete feed of casts containing our URLs
- ✅ Historical data available via pagination
- ✅ Real-time feed updates for ongoing tracking
- ✅ Rich metadata: author FID, timestamps, cast text, engagement metrics
Business Goals
Measure Virality
- What percentage of generations get shared?
- What is the viral coefficient (shares → new generations)?
- How does share velocity correlate with quality/type of workflow?
Optimize Product
- Which workflows naturally drive sharing?
- What features correlate with higher share rates?
- How can we encourage more organic sharing?
Understand Users
- Who are the power sharers driving growth?
- Early adopter vs. mainstream sharer behavior
- Cohort analysis: retention and engagement patterns
Track Growth
- Share-driven user acquisition metrics
- Week-over-week share growth trends
- Network effects and compounding growth
Problem Statement
Core Problem
We need to answer strategic questions about content virality, but have no data infrastructure to support these queries.
Questions We Can't Answer Today:
- How many times has each workflow been shared on Farcaster?
- What is our viral coefficient (shares → downstream generations)?
- Which users are the most active sharers?
- How does share velocity change over time for a workflow?
- What is the time lag between generation and first share?
- Which cohorts of users have the highest share rates?
- What content attributes correlate with high share rates?
Technical Challenges
Data Source Integration
- Neynar API requires pagination for historical data
- Rate limiting and API quota management
- Extracting collection_id from diverse URL formats
Data Volume
- ~10,000 historical shares to backfill
- Ongoing incremental updates (100s-1000s per day)
- Deduplication across backfill and ongoing syncs
Analytics Performance
- Complex analytical queries (viral coefficient, cohort analysis)
- Need pre-aggregated metrics for dashboard performance
- Star schema design for efficient time-series queries
Schema Evolution
- Start simple, expand as analytical needs grow
- Future warehouse migration (BigQuery/Snowflake)
- Maintain separation of operational vs. analytical data
Data Source
Neynar API - Feed Endpoint
API Endpoint:
curl 'https://api.neynar.com/v2/farcaster/feed' \
-H 'accept: application/json' \
-H 'x-api-key: YOUR_API_KEY' \
-G \
--data-urlencode 'feed_type=filter' \
--data-urlencode 'filter_type=embed_url' \
--data-urlencode 'embed_url=https://emerge-mini-app.vercel.app/workflow/' \
--data-urlencode 'limit=100' \
--data-urlencode 'cursor=PAGINATION_CURSOR'
Response Structure:
{
"casts": [
{
"hash": "0xabc123...",
"timestamp": "2025-11-13T10:30:00Z",
"author": {
"fid": 12345,
"username": "creator",
"display_name": "Creative User"
},
"text": "Check out this amazing AI generation!",
"embeds": [
{
"url": "https://emerge-mini-app.vercel.app/workflow/550e8400-e29b-41d4-a716-446655440000"
}
],
"reactions": {
"likes": 42,
"recasts": 7
}
}
],
"next": {
"cursor": "eyJwYWdlIjo..."
}
}
Data Extraction:
- Primary Key: cast_hash (unique identifier for each cast)
- Timestamp: timestamp (when shared)
- Author: author.fid (Farcaster user ID)
- Workflow ID: Extract collection_id from embeds[].url pattern
- Context: text (cast message for future sentiment analysis)
- Engagement: reactions.likes, reactions.recasts (future use)
API Characteristics:
- Pagination: Cursor-based, 100 results per page
- Rate Limiting: 150 requests per minute (per API key)
- Historical Access: Complete access to all historical casts
- Real-time: Feed updates reflect new casts within minutes
- Filtering: Pre-filtered by our embed URL (efficient)
Collection ID Extraction
URL Pattern Variations:
https://emerge-mini-app.vercel.app/workflow/550e8400-e29b-41d4-a716-446655440000
https://emerge-mini-app.vercel.app/workflow/550e8400-e29b-41d4-a716-446655440000?param=value
https://emerge-mini-app.vercel.app/workflow/550e8400-e29b-41d4-a716-446655440000/view
Extraction Logic:
function extractCollectionId(url: string): string | null {
const match = url.match(/\/workflow\/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i);
return match ? match[1] : null;
}
Error Handling:
- Invalid URLs → Log and skip (don't fail entire batch)
- Multiple embeds → Process all matching URLs
- Malformed UUIDs → Log for investigation
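To keep this logic honest against the URL variations above, a quick regression test helps; a minimal sketch, assuming a vitest-style runner (the import path is illustrative):

```typescript
import { describe, expect, it } from 'vitest';
import { extractCollectionId } from './extract-collection-id'; // hypothetical module path

const uuid = '550e8400-e29b-41d4-a716-446655440000';
const base = `https://emerge-mini-app.vercel.app/workflow/${uuid}`;

describe('extractCollectionId', () => {
  it('handles all known URL variations', () => {
    expect(extractCollectionId(base)).toBe(uuid);
    expect(extractCollectionId(`${base}?param=value`)).toBe(uuid);
    expect(extractCollectionId(`${base}/view`)).toBe(uuid);
  });

  it('returns null for malformed UUIDs so callers can log and skip', () => {
    expect(extractCollectionId('https://emerge-mini-app.vercel.app/workflow/not-a-uuid')).toBeNull();
  });
});
```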
Architecture Design
Two-Layer Architecture
┌─────────────────────────────────────────────────────┐
│ Neynar API Feed │
│ (External Data Source - Farcaster) │
└──────────────────┬──────────────────────────────────┘
│
│ Backfill (one-time)
│ + Sync (periodic)
↓
┌─────────────────────────────────────────────────────┐
│ OPERATIONAL DATA LAYER │
│ │
│ ┌───────────────────────────────────────────┐ │
│ │ miniapp_share │ │
│ │ - cast_hash (PK) │ │
│ │ - shared_at │ │
│ │ - author_fid │ │
│ │ - collection_id │ │
│ │ - cast_text │ │
│ │ - created_at │ │
│ └───────────────────────────────────────────┘ │
│ │
│ Purpose: Raw share events (insert-only) │
│ Updates: Incremental syncs via cron job │
│ Retention: Permanent (source of truth) │
└──────────────────┬──────────────────────────────────┘
│
│ Daily ETL
│ (Transform + Aggregate)
↓
┌─────────────────────────────────────────────────────┐
│ ANALYTICS LAYER (Star Schema) │
│ │
│ ┌───────────────────────────────────────────┐ │
│ │ analytics.share_events (Fact Table) │ │
│ │ - event_id (PK) │ │
│ │ - event_date │ │
│ │ - shared_at │ │
│ │ - user_fid │ │
│ │ - workflow_id (FK → dim_workflow) │ │
│ │ - generation_id (FK → farcaster_collection)│ │
│ │ - is_first_share │ │
│ │ - days_since_creation │ │
│ └───────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────┐ │
│ │ analytics.dim_workflow (Dimension) │ │
│ │ - workflow_id (PK) │ │
│ │ - workflow_name │ │
│ │ - total_generations │ │
│ │ - total_shares │ │
│ │ - share_rate (shares / generations) │ │
│ │ - viral_coefficient │ │
│ │ - first_shared_at │ │
│ │ - last_shared_at │ │
│ └───────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────┐ │
│ │ analytics.dim_user (Dimension) │ │
│ │ - user_fid (PK) │ │
│ │ - first_share_date │ │
│ │ - total_shares │ │
│ │ - total_workflows_shared │ │
│ │ - cohort (YYYY-MM) │ │
│ │ - user_type (power/casual/one-time) │ │
│ └───────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────┐ │
│ │ analytics.dim_date (Dimension) │ │
│ │ - date_key (PK) YYYYMMDD │ │
│ │ - full_date │ │
│ │ - year, quarter, month, week │ │
│ │ - day_of_week, is_weekend │ │
│ └───────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────┐ │
│ │ analytics.daily_metrics (Aggregates) │ │
│ │ - date (PK) │ │
│ │ - total_shares │ │
│ │ - unique_sharers │ │
│ │ - unique_workflows │ │
│ │ - avg_shares_per_workflow │ │
│ │ - viral_coefficient_7d │ │
│ │ - share_velocity_wow │ │
│ └───────────────────────────────────────────┘ │
│ │
│ Purpose: Optimized for analytical queries │
│ Updates: Daily ETL from operational layer │
│ Retention: Permanent (with historical snapshots) │
└─────────────────────────────────────────────────────┘
Design Rationale
Why Two Layers?
Operational Layer (miniapp_share):
- Simple, append-only table
- Source of truth for raw share events
- Fast writes during sync jobs
- No complex transformations (just extract collection_id)
Analytics Layer (analytics.*):
- Star schema optimized for queries
- Pre-aggregated metrics (fast dashboards)
- Denormalized for query performance
- Clean separation of concerns
Benefits:
- ✅ Reliability: Operational data never modified (audit trail)
- ✅ Performance: Analytics queries don't impact sync jobs
- ✅ Flexibility: Can rebuild analytics layer without re-fetching from API
- ✅ Scalability: Easy to migrate analytics to warehouse (BigQuery/Snowflake)
- ✅ Incremental: Build analytics layer piece-by-piece as needs evolve
Implementation Plan
Phase 1: Operational Data Layer (Week 1)
Objective: Capture all share events in operational table
Tasks:
- ✅ Create miniapp_share table schema
- ✅ Build Neynar API client with pagination support
- ✅ Implement collection_id extraction logic
- ✅ Create backfill script (one-time historical import)
- ✅ Test with sample data (100 records)
- ✅ Run full backfill (~10k records)
- ✅ Validate data quality (no duplicates, valid UUIDs)
Deliverables:
- Database migration for miniapp_share table
- NeynarClient TypeScript class
- backfill-farcaster-shares.ts script
- Unit tests for extraction and deduplication logic
Success Criteria:
- All historical shares imported successfully
- No duplicate cast_hash entries
- 100% valid collection_id UUIDs
- Script handles API rate limiting gracefully
Phase 2: Ongoing Sync Job (Week 1-2)
Objective: Keep operational data up-to-date automatically
Tasks:
- ✅ Create incremental sync job (fetch only new shares)
- ✅ Implement cursor-based pagination tracking
- ✅ Setup cron job (every 6-12 hours)
- ✅ Add monitoring/alerting for sync failures
- ✅ Test sync job with mock data
- ✅ Deploy to production
Deliverables:
- sync-farcaster-shares.ts cron job
- Cursor tracking mechanism (Redis or DB)
- Error handling and retry logic
- Monitoring dashboard for sync health
Success Criteria:
- Sync job runs reliably every 6-12 hours
- New shares appear within 12 hours of posting
- Zero data loss during syncs
- Alerts fire on API errors or sync failures
Phase 3: Analytics Schema - Fact Table (Week 2)
Objective: Create core analytics fact table
Tasks:
- ✅ Design analytics.share_events schema
- ✅ Create database migration
- ✅ Build ETL script (operational → analytics)
- ✅ Add derived fields (is_first_share, days_since_creation)
- ✅ Create indexes for common query patterns
- ✅ Test with operational data
Deliverables:
- analytics.share_events table
- ETL script: transform-share-events.ts
- Indexes on: event_date, user_fid, workflow_id
- Unit tests for transformations
Success Criteria:
- All operational shares transformed successfully
- is_first_share flag accurate
- days_since_creation calculated correctly
- Queries return results in <100ms
Phase 4: Analytics Schema - Dimensions (Week 2-3)
Objective: Add dimension tables for rich analytics
Tasks:
- ✅ Create analytics.dim_workflow table
- ✅ Create analytics.dim_user table
- ✅ Create analytics.dim_date table
- ✅ Build aggregation logic for dimensions
- ✅ Create materialized views or scheduled updates
- ✅ Test join performance
Deliverables:
- Dimension tables with pre-aggregated metrics
- ETL scripts for each dimension
- Documentation on dimension attributes
Success Criteria:
- Workflow dimension shows accurate share counts
- User dimension correctly identifies cohorts
- Date dimension enables efficient time-series queries
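Because the date dimension is deterministic, it can be generated once for a multi-year range rather than maintained by ETL; a population sketch, assuming the same db client used by the ETL snippets later in this report (the date range is an arbitrary choice):

```typescript
// Fill analytics.dim_date from a generated date series; re-runnable via ON CONFLICT.
async function populateDateDimension(): Promise<void> {
  await db.query(`
    INSERT INTO analytics.dim_date (
      date_key, full_date, year, quarter, month, week,
      day_of_month, day_of_week, day_name, is_weekend
    )
    SELECT
      TO_CHAR(d, 'YYYYMMDD')::INTEGER,
      d::DATE,
      EXTRACT(YEAR FROM d)::INTEGER,
      EXTRACT(QUARTER FROM d)::INTEGER,
      EXTRACT(MONTH FROM d)::INTEGER,
      EXTRACT(WEEK FROM d)::INTEGER,
      EXTRACT(DAY FROM d)::INTEGER,
      EXTRACT(ISODOW FROM d)::INTEGER,  -- 1 = Monday ... 7 = Sunday
      TRIM(TO_CHAR(d, 'Day')),
      EXTRACT(ISODOW FROM d) IN (6, 7)
    FROM generate_series('2024-01-01'::DATE, '2030-12-31'::DATE, '1 day') AS d
    ON CONFLICT (date_key) DO NOTHING
  `);
}
```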
Phase 5: Pre-Aggregated Metrics (Week 3)
Objective: Fast dashboard queries via pre-aggregation
Tasks:
- ✅ Create analytics.daily_metrics table
- ✅ Build daily aggregation job
- ✅ Calculate viral coefficient (7-day rolling)
- ✅ Calculate share velocity (week-over-week)
- ✅ Test aggregation accuracy
- ✅ Schedule daily ETL job
Deliverables:
- analytics.daily_metrics table
- Daily ETL job: compute-daily-metrics.ts
- Scheduled job (runs daily at 2 AM UTC)
Success Criteria:
- Daily metrics computed within 5 minutes
- Viral coefficient matches manual calculations
- Share velocity trends align with operational data
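A sketch of the daily rollup for a single day, assuming the same db client as the other ETL snippets; it presumes dim_user has already been refreshed (for new_sharers) and leaves viral_coefficient_7d and share_velocity_wow to the dedicated queries in Analytics Capabilities:

```typescript
// Recompute one day's rollup; idempotent via upsert so it can be re-run safely.
async function computeDailyMetrics(targetDate: string): Promise<void> {
  await db.query(
    `
    INSERT INTO analytics.daily_metrics (
      date, total_shares, unique_sharers, unique_workflows,
      new_sharers, avg_shares_per_workflow, avg_shares_per_user
    )
    SELECT
      $1::DATE,
      COUNT(*),
      COUNT(DISTINCT user_fid),
      COUNT(DISTINCT workflow_id),
      (SELECT COUNT(*) FROM analytics.dim_user WHERE first_share_date = $1::DATE),
      COUNT(*)::NUMERIC / NULLIF(COUNT(DISTINCT workflow_id), 0),
      COUNT(*)::NUMERIC / NULLIF(COUNT(DISTINCT user_fid), 0)
    FROM analytics.share_events
    WHERE event_date = $1::DATE
    ON CONFLICT (date) DO UPDATE SET
      total_shares = EXCLUDED.total_shares,
      unique_sharers = EXCLUDED.unique_sharers,
      unique_workflows = EXCLUDED.unique_workflows,
      new_sharers = EXCLUDED.new_sharers,
      avg_shares_per_workflow = EXCLUDED.avg_shares_per_workflow,
      avg_shares_per_user = EXCLUDED.avg_shares_per_user
    `,
    [targetDate]
  );
}
```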
Phase 6: Analytics Queries (Week 3-4)
Objective: Enable key analytical insights
Tasks:
- ✅ Write SQL queries for viral coefficient
- ✅ Write SQL queries for share velocity
- ✅ Write SQL queries for cohort analysis
- ✅ Create query library/module
- ✅ Build API endpoints for dashboard
- ✅ Test query performance under load
Deliverables:
- analytics-queries.ts module
- REST API endpoints: /api/analytics/viral-coefficient, etc.
- Query performance documentation
Success Criteria:
- All key queries return in <500ms
- API endpoints ready for dashboard integration
- Queries scale to 100k+ share events
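As a sketch of the dashboard-facing surface, assuming an Express server (the route shape and the analytics-queries export are illustrative, not final):

```typescript
import express from 'express';
import { getViralCoefficient7d } from './analytics-queries'; // hypothetical export

const app = express();

// GET /api/analytics/viral-coefficient → { viralCoefficient7d: number | null }
app.get('/api/analytics/viral-coefficient', async (_req, res) => {
  try {
    res.json({ viralCoefficient7d: await getViralCoefficient7d() });
  } catch (error) {
    // Fail fast with a descriptive error, per the error-handling principles below.
    console.error('viral-coefficient query failed:', error);
    res.status(500).json({ error: 'Failed to compute viral coefficient' });
  }
});

app.listen(3000);
```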
Analytics Capabilities
Key Metrics
1. Viral Coefficient
Definition: How many new generations are created per share?
Formula:
Viral Coefficient = (New Generations from Shared Workflows) / (Total Shares)
Example Query:
WITH shared_workflows AS (
  SELECT DISTINCT workflow_id
  FROM analytics.share_events
  WHERE shared_at >= NOW() - INTERVAL '7 days'
),
subsequent_generations AS (
  SELECT COUNT(*) as generation_count
  FROM farcaster_collection fc
  WHERE fc.workflow_id IN (SELECT workflow_id FROM shared_workflows)
    AND fc.created_at >= (
      SELECT MIN(se.shared_at) FROM analytics.share_events se
      WHERE se.workflow_id = fc.workflow_id
    )
)
SELECT
  (SELECT generation_count FROM subsequent_generations)::float /
  (SELECT COUNT(*) FROM analytics.share_events WHERE shared_at >= NOW() - INTERVAL '7 days')
  AS viral_coefficient_7d;
Insight: Viral coefficient > 1.0 indicates compounding growth
2. Share Velocity
Definition: How quickly is sharing activity growing week-over-week?
Formula:
Share Velocity = ((This Week Shares - Last Week Shares) / Last Week Shares) * 100
Example Query:
WITH weekly_shares AS (
SELECT
DATE_TRUNC('week', shared_at) as week,
COUNT(*) as share_count
FROM analytics.share_events
GROUP BY DATE_TRUNC('week', shared_at)
ORDER BY week DESC
LIMIT 2
)
SELECT
((current.share_count - previous.share_count)::float / previous.share_count * 100) as velocity_pct
FROM
(SELECT * FROM weekly_shares LIMIT 1 OFFSET 0) current,
(SELECT * FROM weekly_shares LIMIT 1 OFFSET 1) previous;
Insight: Positive velocity indicates accelerating growth
3. Cohort Analysis
Definition: Compare sharing behavior across user cohorts (grouped by first share date)
Example Query:
SELECT
du.cohort,
COUNT(DISTINCT du.user_fid) as cohort_size,
AVG(du.total_shares) as avg_shares_per_user,
SUM(CASE WHEN du.total_shares >= 10 THEN 1 ELSE 0 END)::float / COUNT(*) as power_user_pct
FROM analytics.dim_user du
GROUP BY du.cohort
ORDER BY du.cohort DESC;
Insight: Identify which cohorts have highest engagement and retention
4. Workflow Share Rate
Definition: What percentage of workflow generations get shared?
Example Query:
SELECT
dw.workflow_id,
dw.workflow_name,
dw.total_generations,
dw.total_shares,
(dw.total_shares::float / NULLIF(dw.total_generations, 0) * 100) as share_rate_pct
FROM analytics.dim_workflow dw
WHERE dw.total_generations >= 10
ORDER BY share_rate_pct DESC
LIMIT 20;
Insight: High share rate workflows are naturally viral
5. Time-to-Share Analysis
Definition: How long between generation and first share?
Example Query:
SELECT
AVG(EXTRACT(EPOCH FROM (se.shared_at - fc.created_at)) / 3600) as avg_hours_to_share,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY se.shared_at - fc.created_at) as median_time_to_share
FROM analytics.share_events se
JOIN farcaster_collection fc ON se.generation_id = fc.id
WHERE se.is_first_share = true;
Insight: Fast time-to-share indicates high-quality/exciting content
Dashboard Queries
Daily Overview:
SELECT * FROM analytics.daily_metrics
ORDER BY date DESC
LIMIT 30;
Top Sharers (Power Users):
SELECT
user_fid,
total_shares,
total_workflows_shared,
first_share_date,
user_type
FROM analytics.dim_user
WHERE user_type = 'power'
ORDER BY total_shares DESC
LIMIT 50;
Trending Workflows (Last 7 Days):
SELECT
dw.workflow_id,
dw.workflow_name,
COUNT(se.event_id) as shares_last_7d,
dw.share_rate
FROM analytics.share_events se
JOIN analytics.dim_workflow dw ON se.workflow_id = dw.workflow_id
WHERE se.event_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY dw.workflow_id, dw.workflow_name, dw.share_rate
ORDER BY shares_last_7d DESC
LIMIT 20;
Technical Specifications
Database Schema
Operational Layer
-- Raw share events (source of truth)
CREATE TABLE miniapp_share (
cast_hash TEXT PRIMARY KEY,
shared_at TIMESTAMP NOT NULL,
author_fid INTEGER NOT NULL,
collection_id UUID NOT NULL,
cast_text TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_miniapp_share_collection ON miniapp_share(collection_id);
CREATE INDEX idx_miniapp_share_author ON miniapp_share(author_fid);
CREATE INDEX idx_miniapp_share_date ON miniapp_share(shared_at);
Analytics Layer - Fact Table
CREATE SCHEMA IF NOT EXISTS analytics;
-- Core fact table for share events
CREATE TABLE analytics.share_events (
event_id BIGSERIAL PRIMARY KEY,
event_date DATE NOT NULL,
shared_at TIMESTAMP NOT NULL,
user_fid INTEGER NOT NULL,
workflow_id UUID NOT NULL,
generation_id UUID, -- References farcaster_collection.id
is_first_share BOOLEAN DEFAULT false,
days_since_creation INTEGER,
cast_hash TEXT UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_share_events_date ON analytics.share_events(event_date);
CREATE INDEX idx_share_events_user ON analytics.share_events(user_fid);
CREATE INDEX idx_share_events_workflow ON analytics.share_events(workflow_id);
CREATE INDEX idx_share_events_generation ON analytics.share_events(generation_id);
Analytics Layer - Dimension Tables
-- Workflow dimension (aggregated metrics)
CREATE TABLE analytics.dim_workflow (
workflow_id UUID PRIMARY KEY,
workflow_name TEXT,
total_generations INTEGER DEFAULT 0,
total_shares INTEGER DEFAULT 0,
unique_sharers INTEGER DEFAULT 0,
share_rate NUMERIC(5,2), -- shares / generations * 100
viral_coefficient NUMERIC(5,2),
first_shared_at TIMESTAMP,
last_shared_at TIMESTAMP,
updated_at TIMESTAMP DEFAULT NOW()
);
-- User dimension (aggregated metrics)
CREATE TABLE analytics.dim_user (
user_fid INTEGER PRIMARY KEY,
first_share_date DATE NOT NULL,
total_shares INTEGER DEFAULT 0,
total_workflows_shared INTEGER DEFAULT 0,
cohort VARCHAR(7), -- YYYY-MM format
user_type VARCHAR(20), -- 'power', 'casual', 'one-time'
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_dim_user_cohort ON analytics.dim_user(cohort);
CREATE INDEX idx_dim_user_type ON analytics.dim_user(user_type);
-- Date dimension (for time-series queries)
CREATE TABLE analytics.dim_date (
date_key INTEGER PRIMARY KEY, -- YYYYMMDD format
full_date DATE NOT NULL UNIQUE,
year INTEGER NOT NULL,
quarter INTEGER NOT NULL,
month INTEGER NOT NULL,
week INTEGER NOT NULL,
day_of_month INTEGER NOT NULL,
day_of_week INTEGER NOT NULL,
day_name VARCHAR(10) NOT NULL,
is_weekend BOOLEAN NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_dim_date_full ON analytics.dim_date(full_date);
CREATE INDEX idx_dim_date_month ON analytics.dim_date(year, month);
Analytics Layer - Pre-Aggregated Metrics
-- Daily rollup metrics (for fast dashboard queries)
CREATE TABLE analytics.daily_metrics (
date DATE PRIMARY KEY,
total_shares INTEGER DEFAULT 0,
unique_sharers INTEGER DEFAULT 0,
unique_workflows INTEGER DEFAULT 0,
new_sharers INTEGER DEFAULT 0, -- First-time sharers
avg_shares_per_workflow NUMERIC(10,2),
avg_shares_per_user NUMERIC(10,2),
viral_coefficient_7d NUMERIC(5,2),
share_velocity_wow NUMERIC(5,2), -- Week-over-week % change
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_daily_metrics_date ON analytics.daily_metrics(date DESC);
API Client Implementation
Neynar Client Interface:
interface NeynarClientConfig {
apiKey: string;
baseUrl?: string;
rateLimit?: number; // requests per minute
}
interface FarcasterCast {
hash: string;
timestamp: string;
author: {
fid: number;
username: string;
display_name: string;
};
text: string;
embeds: Array<{ url: string }>;
reactions: {
likes: number;
recasts: number;
};
}
interface FeedResponse {
casts: FarcasterCast[];
next?: {
cursor: string;
};
}
declare class NeynarClient {
  constructor(config: NeynarClientConfig);
  fetchFeed(params: {
    embedUrl: string;
    limit?: number;
    cursor?: string;
  }): Promise<FeedResponse>;
  paginateFeed(embedUrl: string): AsyncGenerator<FarcasterCast[]>;
}
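A sketch of one possible implementation, reusing the interfaces above and assuming Node 18+ (global fetch); the inter-request delay is a simple way to stay under the 150 requests/minute limit:

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

class NeynarClient {
  constructor(private config: NeynarClientConfig) {}

  async fetchFeed(params: { embedUrl: string; limit?: number; cursor?: string }): Promise<FeedResponse> {
    const url = new URL('/v2/farcaster/feed', this.config.baseUrl ?? 'https://api.neynar.com');
    url.searchParams.set('feed_type', 'filter');
    url.searchParams.set('filter_type', 'embed_url');
    url.searchParams.set('embed_url', params.embedUrl);
    url.searchParams.set('limit', String(params.limit ?? 100));
    if (params.cursor) url.searchParams.set('cursor', params.cursor);

    const response = await fetch(url, {
      headers: { accept: 'application/json', 'x-api-key': this.config.apiKey },
    });
    if (!response.ok) {
      throw new Error(`Neynar feed request failed: ${response.status} ${response.statusText}`);
    }
    return (await response.json()) as FeedResponse;
  }

  // Walk every page of the feed; the delay keeps us under the rate limit.
  async *paginateFeed(embedUrl: string): AsyncGenerator<FarcasterCast[]> {
    const delayMs = 60_000 / (this.config.rateLimit ?? 150);
    let cursor: string | undefined;
    do {
      const page = await this.fetchFeed({ embedUrl, limit: 100, cursor });
      yield page.casts;
      cursor = page.next?.cursor;
      if (cursor) await sleep(delayMs);
    } while (cursor);
  }
}
```

Transient failures can additionally be wrapped with the fetchWithRetry helper shown in the ETL Strategy section.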
Extraction Utility:
interface ShareEvent {
castHash: string;
sharedAt: Date;
authorFid: number;
collectionId: string;
castText: string;
}
function extractShareEvents(casts: FarcasterCast[]): ShareEvent[] {
  // A cast may embed several workflow URLs; each match becomes an event.
  // Note: cast_hash is the primary key downstream, so multiple events from
  // one cast collapse to a single row on insert (ON CONFLICT DO NOTHING).
  return casts.flatMap(cast =>
    cast.embeds
      .map(embed => extractCollectionId(embed.url))
      .filter((id): id is string => id !== null)
      .map(collectionId => ({
        castHash: cast.hash,
        sharedAt: new Date(cast.timestamp),
        authorFid: cast.author.fid,
        collectionId,
        castText: cast.text
      }))
  );
}

// Same strict UUID pattern as in the Data Source section, so malformed IDs
// are logged and skipped rather than surfacing as invalid UUIDs at insert time.
function extractCollectionId(url: string): string | null {
  const match = url.match(/\/workflow\/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i);
  return match ? match[1] : null;
}
ETL Strategy
Data Flow
┌──────────────┐
│ Neynar API │
└──────┬───────┘
│
│ Backfill (one-time): Paginate all historical
│ Sync (periodic): Fetch only new casts
│
↓
┌──────────────────────────────────────────┐
│ OPERATIONAL LAYER: miniapp_share │
│ - Raw, append-only │
│ - cast_hash deduplication │
│ - Minimal transformation │
└──────┬───────────────────────────────────┘
│
│ Daily ETL (scheduled job)
│
↓
┌──────────────────────────────────────────┐
│ ANALYTICS LAYER: share_events │
│ - Denormalized for query performance │
│ - Derived fields (is_first_share, etc.) │
│ - Joins with farcaster_collection │
└──────┬───────────────────────────────────┘
│
│ Daily Aggregation
│
↓
┌──────────────────────────────────────────┐
│ DIMENSIONS & METRICS │
│ - dim_workflow (workflow aggregates) │
│ - dim_user (user aggregates) │
│ - daily_metrics (pre-computed KPIs) │
└──────────────────────────────────────────┘
ETL Jobs
1. Backfill Job (One-Time)
Purpose: Import all historical shares from Neynar API
Process:
async function backfillFarcasterShares() {
  const client = new NeynarClient({ apiKey: process.env.NEYNAR_API_KEY! });
  const embedUrl = 'https://emerge-mini-app.vercel.app/workflow/';
  let totalProcessed = 0;
  let totalInserted = 0;
  for await (const castBatch of client.paginateFeed(embedUrl)) {
    const shareEvents = extractShareEvents(castBatch);
    if (shareEvents.length === 0) continue; // page had no matching embeds
    // Upsert to handle duplicates gracefully
    const result = await db.query(`
      INSERT INTO miniapp_share (cast_hash, shared_at, author_fid, collection_id, cast_text)
      VALUES ${shareEvents.map((_, i) => `($${i*5+1}, $${i*5+2}, $${i*5+3}, $${i*5+4}, $${i*5+5})`).join(', ')}
      ON CONFLICT (cast_hash) DO NOTHING
      RETURNING cast_hash
    `, shareEvents.flatMap(e => [e.castHash, e.sharedAt, e.authorFid, e.collectionId, e.castText]));
    totalProcessed += shareEvents.length;
    totalInserted += result.rowCount ?? 0;
    console.log(`Processed ${totalProcessed}, inserted ${totalInserted}`);
  }
}
Execution:
pnpm run backfill:farcaster-shares
Expected Duration: ~15-20 minutes for 10k shares (with rate limiting)
2. Sync Job (Scheduled)
Purpose: Fetch new shares periodically (every 6-12 hours)
Process:
async function syncFarcasterShares() {
  // Get last synced cursor from Redis/DB.
  // NOTE: verify Neynar's cursor semantics here — if cursors only page
  // backwards through the feed, an alternative is to read from the newest
  // page until hitting an already-stored cast_hash.
  const lastCursor = await redis.get('neynar:last_cursor');
  const client = new NeynarClient({ apiKey: process.env.NEYNAR_API_KEY! });
  const response = await client.fetchFeed({
    embedUrl: 'https://emerge-mini-app.vercel.app/workflow/',
    limit: 100,
    cursor: lastCursor || undefined
  });
  const shareEvents = extractShareEvents(response.casts);
  // Insert new shares
  if (shareEvents.length > 0) {
    await insertShareEvents(shareEvents);
  }
  // Save new cursor for next sync
  if (response.next?.cursor) {
    await redis.set('neynar:last_cursor', response.next.cursor);
  }
  return { processed: shareEvents.length };
}
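The insertShareEvents helper referenced above can reuse the backfill job's batched upsert; a minimal sketch, assuming the same db client:

```typescript
// Batched upsert; already-seen cast_hash values are silently skipped.
async function insertShareEvents(events: ShareEvent[]): Promise<number> {
  if (events.length === 0) return 0;

  const placeholders = events
    .map((_, i) => `($${i * 5 + 1}, $${i * 5 + 2}, $${i * 5 + 3}, $${i * 5 + 4}, $${i * 5 + 5})`)
    .join(', ');

  const result = await db.query(
    `INSERT INTO miniapp_share (cast_hash, shared_at, author_fid, collection_id, cast_text)
     VALUES ${placeholders}
     ON CONFLICT (cast_hash) DO NOTHING`,
    events.flatMap((e) => [e.castHash, e.sharedAt, e.authorFid, e.collectionId, e.castText])
  );
  return result.rowCount ?? 0;
}
```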
Cron Schedule:
# Every 12 hours
schedule: "0 */12 * * *"
3. Analytics Transform Job (Daily)
Purpose: Transform operational data into analytics layer
Process:
async function transformShareEvents() {
// 1. Insert new share events
await db.query(`
INSERT INTO analytics.share_events (
event_date, shared_at, user_fid, workflow_id, generation_id,
is_first_share, days_since_creation, cast_hash
)
SELECT
DATE(ms.shared_at) as event_date,
ms.shared_at,
ms.author_fid as user_fid,
ms.collection_id as workflow_id,
fc.id as generation_id,
(NOT EXISTS (
SELECT 1 FROM analytics.share_events se2
WHERE se2.generation_id = fc.id AND se2.shared_at < ms.shared_at
)) as is_first_share,
DATE_PART('day', ms.shared_at - fc.created_at)::INTEGER as days_since_creation,
ms.cast_hash
FROM miniapp_share ms
LEFT JOIN farcaster_collection fc ON fc.id = ms.collection_id
WHERE NOT EXISTS (
SELECT 1 FROM analytics.share_events se
WHERE se.cast_hash = ms.cast_hash
)
`);
// 2. Update workflow dimension
await updateWorkflowDimension();
// 3. Update user dimension
await updateUserDimension();
// 4. Compute daily metrics
await computeDailyMetrics();
}
Execution:
# Daily at 2 AM UTC
schedule: "0 2 * * *"
4. Aggregation Job (Daily)
Purpose: Update dimension tables and pre-computed metrics
Process:
async function updateWorkflowDimension() {
await db.query(`
INSERT INTO analytics.dim_workflow (
workflow_id, total_shares, unique_sharers, first_shared_at, last_shared_at
)
SELECT
workflow_id,
COUNT(*) as total_shares,
COUNT(DISTINCT user_fid) as unique_sharers,
MIN(shared_at) as first_shared_at,
MAX(shared_at) as last_shared_at
FROM analytics.share_events
GROUP BY workflow_id
ON CONFLICT (workflow_id) DO UPDATE SET
total_shares = EXCLUDED.total_shares,
unique_sharers = EXCLUDED.unique_sharers,
last_shared_at = EXCLUDED.last_shared_at,
updated_at = NOW()
`);
// Update share_rate (requires join with generation counts)
await db.query(`
UPDATE analytics.dim_workflow dw
SET
total_generations = gc.generation_count,
share_rate = (dw.total_shares::float / NULLIF(gc.generation_count, 0) * 100)
FROM (
SELECT workflow_id, COUNT(*) as generation_count
FROM farcaster_collection
GROUP BY workflow_id
) gc
WHERE dw.workflow_id = gc.workflow_id
`);
}
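updateUserDimension, referenced in the transform job above, can follow the same upsert pattern; a sketch in which the power/casual thresholds are placeholder assumptions to tune against real data:

```typescript
async function updateUserDimension(): Promise<void> {
  await db.query(`
    INSERT INTO analytics.dim_user (
      user_fid, first_share_date, total_shares, total_workflows_shared, cohort, user_type
    )
    SELECT
      user_fid,
      MIN(event_date),
      COUNT(*),
      COUNT(DISTINCT workflow_id),
      TO_CHAR(MIN(event_date), 'YYYY-MM'),
      CASE
        WHEN COUNT(*) >= 10 THEN 'power'    -- assumed threshold
        WHEN COUNT(*) >= 2  THEN 'casual'   -- assumed threshold
        ELSE 'one-time'
      END
    FROM analytics.share_events
    GROUP BY user_fid
    ON CONFLICT (user_fid) DO UPDATE SET
      total_shares = EXCLUDED.total_shares,
      total_workflows_shared = EXCLUDED.total_workflows_shared,
      user_type = EXCLUDED.user_type,
      updated_at = NOW()
  `);
}
```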
Error Handling
Principles:
- Fail Fast, Log Descriptive Errors (per CLAUDE.md guidelines)
- Retry with Exponential Backoff for transient API errors
- Dead Letter Queue for unrecoverable errors
- Alerting on sync failures > 24 hours
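The dead letter queue can start as a plain table of failed payloads; a minimal sketch (the table name and shape are assumptions):

```typescript
// Record an unrecoverable payload for later inspection instead of blocking the batch.
async function sendToDeadLetterQueue(source: string, payload: unknown, error: Error): Promise<void> {
  await db.query(
    `INSERT INTO etl_dead_letters (source, payload, error_message, failed_at)
     VALUES ($1, $2::jsonb, $3, NOW())`,
    [source, JSON.stringify(payload), error.message]
  );
}
```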
Error Scenarios:
| Error Type | Handling Strategy |
|---|---|
| API Rate Limit | Wait + Retry with backoff |
| Invalid Collection ID | Log + Skip individual record |
| Network Timeout | Retry up to 3 times |
| Database Constraint Violation | Log + Skip (likely duplicate) |
| Empty API Response | Log + Continue (no new data) |
| Sync Job Failure | Alert after 2 consecutive failures |
Example Error Handling:
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3
): Promise<T> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      if (attempt === maxRetries) {
        throw new Error(`Failed after ${maxRetries} attempts: ${message}`);
      }
      const delay = Math.pow(2, attempt) * 1000; // Exponential backoff
      console.warn(`Attempt ${attempt} failed, retrying in ${delay}ms...`);
      await sleep(delay);
    }
  }
  throw new Error('Unreachable');
}
Success Metrics
System Health Metrics
Data Quality:
- ✅ Zero duplicate cast_hash in operational table
- ✅ 100% valid UUIDs for collection_id
- ✅ <1% skipped records due to extraction errors
- ✅ Daily sync latency <12 hours (new shares appear same day)
Performance:
- ✅ Backfill completes in <30 minutes for 10k records
- ✅ Sync job completes in <5 minutes for incremental updates
- ✅ Analytics queries return in <500ms (95th percentile)
- ✅ Daily ETL completes in <10 minutes
Reliability:
- ✅ Sync job success rate >99.5% (allow occasional API errors)
- ✅ Zero data loss (all shares eventually captured)
- ✅ ETL failures alert within 1 hour
Business Metrics
Virality KPIs:
- 📊 Viral Coefficient - Target: >0.5 (50% of shares lead to new generations)
- 📊 Share Rate - Target: >10% (10%+ of generations get shared)
- 📊 Share Velocity - Track week-over-week growth trends
- 📊 Power User Ratio - Identify top 10% of sharers
User Engagement:
- 📊 Time-to-Share - Median time between generation and share
- 📊 Multi-Share Rate - % of workflows shared by 2+ users
- 📊 Cohort Retention - Do early sharers continue sharing?
Product Insights:
- 📊 High-Share Workflows - Which templates naturally go viral?
- 📊 Share Text Analysis - What messaging drives engagement?
- 📊 Cross-Reference Analysis - Generation volume vs. share volume correlation
Future Considerations
Phase 2 Enhancements (Months 2-3)
1. Sentiment Analysis
- Use AI to analyze cast_text for sentiment (positive/negative/neutral)
- Correlate sentiment with share velocity and viral coefficient
- Identify what messaging patterns drive virality
2. User Profile Enrichment
- Fetch additional Neynar user metadata (follower count, bio, etc.)
- Segment users by influence (micro vs. macro influencers)
- Track share reach (author's follower count × engagement)
3. Cross-Platform Tracking
- Track shares on Twitter/X (if embeds contain workflow URLs)
- Track shares in Discord/Telegram (via webhook integrations)
- Unified cross-platform viral coefficient
4. Real-Time Analytics
- Stream processing for live viral coefficient updates
- Real-time alerts for breakout viral workflows
- Live dashboard with WebSocket updates
Phase 3 Warehouse Migration (Months 4-6)
Goal: Scale analytics to handle millions of shares
Architecture:
PostgreSQL (operational) → ETL → BigQuery/Snowflake (analytics)
↓
                              Looker/Metabase Dashboard
Benefits:
- Separate operational writes from analytical queries
- Leverage warehouse-specific optimizations (columnar storage, partitioning)
- Enable data science workflows (ML models for predicting virality)
- Support 10x+ data volume growth
Migration Strategy:
- Incremental: Start with daily_metrics table only
- Dual-write: Write to both Postgres and warehouse during transition
- Cutover: Switch dashboards to warehouse once validated
- Deprecate: Eventually remove analytics schema from Postgres
Advanced Analytics
1. Predictive Modeling
- ML model to predict viral coefficient from workflow features
- Recommendations: "This workflow type historically shares well"
- A/B testing framework for share feature experiments
2. Network Analysis
- Graph analysis of share chains (user A → user B → user C)
- Identify super-spreaders and viral pathways
- Community detection among sharers
3. Attribution Tracking
- Track conversions from share → view → generation
- Measure ROI of social features
- Attribution windows (1-day, 7-day, 30-day)
Appendices
A. Example Queries Library
Find Top 10 Most Shared Workflows (All Time):
SELECT
dw.workflow_id,
dw.workflow_name,
dw.total_shares,
dw.unique_sharers,
dw.share_rate
FROM analytics.dim_workflow dw
ORDER BY dw.total_shares DESC
LIMIT 10;
Cohort Analysis - Share Retention:
SELECT
du.cohort,
COUNT(DISTINCT du.user_fid) as cohort_size,
COUNT(DISTINCT CASE WHEN se.event_date >= CURRENT_DATE - 30 THEN du.user_fid END) as active_last_30d,
(COUNT(DISTINCT CASE WHEN se.event_date >= CURRENT_DATE - 30 THEN du.user_fid END)::float /
COUNT(DISTINCT du.user_fid) * 100) as retention_pct
FROM analytics.dim_user du
LEFT JOIN analytics.share_events se ON du.user_fid = se.user_fid
GROUP BY du.cohort
ORDER BY du.cohort DESC;
Weekly Share Velocity Trend:
WITH weekly_shares AS (
SELECT
DATE_TRUNC('week', event_date) as week,
COUNT(*) as share_count
FROM analytics.share_events
GROUP BY DATE_TRUNC('week', event_date)
)
SELECT
week,
share_count,
LAG(share_count) OVER (ORDER BY week) as prev_week,
((share_count - LAG(share_count) OVER (ORDER BY week))::float /
NULLIF(LAG(share_count) OVER (ORDER BY week), 0) * 100) as velocity_wow_pct
FROM weekly_shares
ORDER BY week DESC;
B. Monitoring & Alerts
Key Metrics to Monitor:
- Sync job last successful run (alert if >24h)
- Number of records processed per sync (detect anomalies)
- ETL job execution time (alert if >15 min)
- Invalid collection_id extraction rate (alert if >5%)
- API error rate (alert if >10% of requests fail)
Recommended Alerts:
- Critical: Sync job failed 3 times consecutively
- Warning: ETL job took >10 minutes to complete
- Info: Daily metrics computed successfully
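A sketch of the staleness check behind the critical alert, assuming the db client used elsewhere (sendAlert is a placeholder for whatever alerting hook is adopted):

```typescript
declare function sendAlert(severity: 'critical' | 'warning' | 'info', message: string): Promise<void>;

// Alert if no share has been ingested within the last 24 hours.
async function checkSyncFreshness(): Promise<void> {
  const { rows } = await db.query('SELECT MAX(created_at) AS last_ingested FROM miniapp_share');
  const lastIngested: Date | null = rows[0]?.last_ingested ?? null;
  const ageHours = lastIngested ? (Date.now() - lastIngested.getTime()) / 3_600_000 : Infinity;

  if (ageHours > 24) {
    await sendAlert('critical', `Farcaster share sync is stale: last record ${ageHours.toFixed(1)}h ago`);
  }
}
```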
Dashboard Panels:
- Line chart: Daily shares over time
- Bar chart: Top 10 workflows by share count
- Table: Recent shares (last 100)
- KPI cards: Viral coefficient, share velocity, unique sharers
C. References
Neynar API Documentation:
- Feed API: https://docs.neynar.com/reference/feed-by-filter
- Rate Limits: https://docs.neynar.com/docs/rate-limits
Database Schema References:
- Star Schema Design: https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/
- Time-Series Indexing: https://www.postgresql.org/docs/current/indexes-types.html
Analytics Best Practices:
- Viral Coefficient Calculation: https://www.reforge.com/blog/viral-growth-framework
- Cohort Analysis: https://mixpanel.com/blog/cohort-analysis/
Conclusion
This Farcaster shares tracking and analytics system provides a scalable, maintainable foundation for understanding content virality and user engagement. By separating operational and analytical concerns, the architecture enables fast iteration on analytics while maintaining data integrity.
Key Success Factors:
- ✅ Incremental Implementation - Start simple, expand as needs evolve
- ✅ Clean Architecture - Operational vs. analytics separation
- ✅ Error Handling - Fail fast with descriptive errors (CLAUDE.md compliance)
- ✅ Performance - Pre-aggregated metrics for dashboard speed
- ✅ Scalability - Ready for warehouse migration as data grows
Next Steps:
- Review and approve this report
- Create git worktree: feat/farcaster-shares-analytics
- Write ADR in worktree with detailed technical decisions
- Begin Phase 1 implementation (operational data layer)
Document Version: 1.0
Last Updated: 2025-11-13
Author: EmProps Engineering Team
