Farcaster Shares Tracking & Analytics System

Date: 2025-11-13
Status: 📋 Planning
Report Type: Technical Design & Implementation Plan
Stakeholders: EmProps Studio Team, Analytics Team
Related Systems: EmProps Studio, Neynar API, Analytics Infrastructure


Executive Summary

This report outlines the design and implementation plan for a Farcaster shares tracking and analytics system that will measure and analyze the virality of AI workflow generations shared on Farcaster. The system will enable data-driven understanding of:

  • Share velocity - How quickly content spreads
  • Viral coefficient - Ratio of downstream generations to shares
  • User engagement patterns - Early vs. late sharer behavior
  • Content virality - Which workflows generate the most organic sharing

Business Impact:

  • 📊 Quantify virality - Measure ROI of social features
  • 🎯 Optimize for sharing - Identify high-performing workflow patterns
  • 👥 Understand user cohorts - Early adopters vs. mainstream behavior
  • 📈 Track growth metrics - Share-driven user acquisition

Technical Approach:

  • Two-layer architecture: Operational data layer + Analytics layer
  • Star schema design for efficient analytical queries
  • Incremental ETL with historical backfill
  • Scalable foundation for future warehouse integration

Table of Contents

  1. Context
  2. Problem Statement
  3. Data Source
  4. Architecture Design
  5. Implementation Plan
  6. Analytics Capabilities
  7. Technical Specifications
  8. ETL Strategy
  9. Success Metrics
  10. Future Considerations

Context

Current State

EmProps Studio Sharing:

  • Users create AI workflows and generate outputs (images, videos, etc.)
  • Generated content can be shared to Farcaster with embedded workflow URLs
  • URL pattern: https://emerge-mini-app.vercel.app/workflow/{collection_id}
  • Each workflow is uniquely identified by collection_id (UUID)

Current Gaps:

  • No visibility into how often workflows are shared
  • No virality metrics - Can't measure share → generation conversion
  • No user cohort analysis - Can't identify power sharers vs. passive users
  • No historical data - Missing ~10k historical shares on Farcaster

Opportunity:

  • ✅ Neynar API provides complete feed of casts containing our URLs
  • ✅ Historical data available via pagination
  • ✅ Real-time feed updates for ongoing tracking
  • ✅ Rich metadata: author FID, timestamps, cast text, engagement metrics

Business Goals

  1. Measure Virality

    • What percentage of generations get shared?
    • What is the viral coefficient (shares → new generations)?
    • How does share velocity correlate with quality/type of workflow?
  2. Optimize Product

    • Which workflows naturally drive sharing?
    • What features correlate with higher share rates?
    • How can we encourage more organic sharing?
  3. Understand Users

    • Who are the power sharers driving growth?
    • Early adopter vs. mainstream sharer behavior
    • Cohort analysis: retention and engagement patterns
  4. Track Growth

    • Share-driven user acquisition metrics
    • Week-over-week share growth trends
    • Network effects and compounding growth

Problem Statement

Core Problem

We need to answer strategic questions about content virality, but have no data infrastructure to support these queries.

Questions We Can't Answer Today:

  1. How many times has each workflow been shared on Farcaster?
  2. What is our viral coefficient (shares → downstream generations)?
  3. Which users are the most active sharers?
  4. How does share velocity change over time for a workflow?
  5. What is the time lag between generation and first share?
  6. Which cohorts of users have the highest share rates?
  7. What content attributes correlate with high share rates?

Technical Challenges

  1. Data Source Integration

    • Neynar API requires pagination for historical data
    • Rate limiting and API quota management
    • Extracting collection_id from diverse URL formats
  2. Data Volume

    • ~10,000 historical shares to backfill
    • Ongoing incremental updates (100s-1000s per day)
    • Deduplication across backfill and ongoing syncs
  3. Analytics Performance

    • Complex analytical queries (viral coefficient, cohort analysis)
    • Need pre-aggregated metrics for dashboard performance
    • Star schema design for efficient time-series queries
  4. Schema Evolution

    • Start simple, expand as analytical needs grow
    • Future warehouse migration (BigQuery/Snowflake)
    • Maintain separation of operational vs. analytical data

Data Source

Neynar API - Feed Endpoint

API Endpoint:

bash
curl 'https://api.neynar.com/v2/farcaster/feed' \
  -H 'accept: application/json' \
  -H 'x-api-key: YOUR_API_KEY' \
  -G \
  --data-urlencode 'feed_type=filter' \
  --data-urlencode 'filter_type=embed_url' \
  --data-urlencode 'embed_url=https://emerge-mini-app.vercel.app/workflow/' \
  --data-urlencode 'limit=100' \
  --data-urlencode 'cursor=PAGINATION_CURSOR'

Response Structure:

json
{
  "casts": [
    {
      "hash": "0xabc123...",
      "timestamp": "2025-11-13T10:30:00Z",
      "author": {
        "fid": 12345,
        "username": "creator",
        "display_name": "Creative User"
      },
      "text": "Check out this amazing AI generation!",
      "embeds": [
        {
          "url": "https://emerge-mini-app.vercel.app/workflow/550e8400-e29b-41d4-a716-446655440000"
        }
      ],
      "reactions": {
        "likes": 42,
        "recasts": 7
      }
    }
  ],
  "next": {
    "cursor": "eyJwYWdlIjo..."
  }
}

Data Extraction:

  • Primary Key: cast_hash (unique identifier for each cast)
  • Timestamp: timestamp (when shared)
  • Author: author.fid (Farcaster user ID)
  • Workflow ID: Extract collection_id from embeds[].url pattern
  • Context: text (cast message for future sentiment analysis)
  • Engagement: reactions.likes, reactions.recasts (future use)

API Characteristics:

  • Pagination: Cursor-based, 100 results per page
  • Rate Limiting: 150 requests per minute (per API key); see the pacing sketch after this list
  • Historical Access: Complete access to all historical casts
  • Real-time: Feed updates reflect new casts within minutes
  • Filtering: Pre-filtered by our embed URL (efficient)
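
The client must pace itself against that budget. A minimal pacing sketch, assuming the 150 rpm figure above (the paced helper is illustrative, not an existing utility):

typescript
// Space calls out so the client stays under the per-minute request budget.
const REQUESTS_PER_MINUTE = 150;
const MIN_INTERVAL_MS = Math.ceil(60_000 / REQUESTS_PER_MINUTE); // 400ms

let lastRequestAt = 0;

// Wrap any API call so consecutive calls are at least MIN_INTERVAL_MS apart.
async function paced<T>(fn: () => Promise<T>): Promise<T> {
  const wait = lastRequestAt + MIN_INTERVAL_MS - Date.now();
  if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
  lastRequestAt = Date.now();
  return fn();
}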

Collection ID Extraction

URL Pattern Variations:

https://emerge-mini-app.vercel.app/workflow/550e8400-e29b-41d4-a716-446655440000
https://emerge-mini-app.vercel.app/workflow/550e8400-e29b-41d4-a716-446655440000?param=value
https://emerge-mini-app.vercel.app/workflow/550e8400-e29b-41d4-a716-446655440000/view

Extraction Logic:

typescript
function extractCollectionId(url: string): string | null {
  const match = url.match(/\/workflow\/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i);
  return match ? match[1] : null;
}

Error Handling:

  • Invalid URLs → Log and skip (don't fail entire batch)
  • Multiple embeds → Process all matching URLs
  • Malformed UUIDs → Log for investigation

Architecture Design

Two-Layer Architecture

┌─────────────────────────────────────────────────────┐
│                  Neynar API Feed                    │
│         (External Data Source - Farcaster)          │
└──────────────────┬──────────────────────────────────┘

                   │ Backfill (one-time)
                   │ + Sync (periodic)

┌─────────────────────────────────────────────────────┐
│             OPERATIONAL DATA LAYER                  │
│                                                     │
│  ┌───────────────────────────────────────────┐    │
│  │  miniapp_share                            │    │
│  │  - cast_hash (PK)                         │    │
│  │  - shared_at                              │    │
│  │  - author_fid                             │    │
│  │  - collection_id                          │    │
│  │  - cast_text                              │    │
│  │  - created_at                             │    │
│  └───────────────────────────────────────────┘    │
│                                                     │
│  Purpose: Raw share events (insert-only)          │
│  Updates: Incremental syncs via cron job          │
│  Retention: Permanent (source of truth)            │
└──────────────────┬──────────────────────────────────┘

                   │ Daily ETL
                   │ (Transform + Aggregate)

┌─────────────────────────────────────────────────────┐
│              ANALYTICS LAYER (Star Schema)          │
│                                                     │
│  ┌───────────────────────────────────────────┐    │
│  │  analytics.share_events (Fact Table)      │    │
│  │  - event_id (PK)                          │    │
│  │  - event_date                             │    │
│  │  - shared_at                              │    │
│  │  - user_fid                               │    │
│  │  - workflow_id (FK → dim_workflow)        │    │
│  │  - generation_id (FK→farcaster_collection)│    │
│  │  - is_first_share                         │    │
│  │  - days_since_creation                    │    │
│  └───────────────────────────────────────────┘    │
│                                                     │
│  ┌───────────────────────────────────────────┐    │
│  │  analytics.dim_workflow (Dimension)       │    │
│  │  - workflow_id (PK)                       │    │
│  │  - workflow_name                          │    │
│  │  - total_generations                      │    │
│  │  - total_shares                           │    │
│  │  - share_rate (shares / generations)      │    │
│  │  - viral_coefficient                      │    │
│  │  - first_shared_at                        │    │
│  │  - last_shared_at                         │    │
│  └───────────────────────────────────────────┘    │
│                                                     │
│  ┌───────────────────────────────────────────┐    │
│  │  analytics.dim_user (Dimension)           │    │
│  │  - user_fid (PK)                          │    │
│  │  - first_share_date                       │    │
│  │  - total_shares                           │    │
│  │  - total_workflows_shared                 │    │
│  │  - cohort (YYYY-MM)                       │    │
│  │  - user_type (power/casual/one-time)      │    │
│  └───────────────────────────────────────────┘    │
│                                                     │
│  ┌───────────────────────────────────────────┐    │
│  │  analytics.dim_date (Dimension)           │    │
│  │  - date_key (PK) YYYYMMDD                 │    │
│  │  - full_date                              │    │
│  │  - year, quarter, month, week             │    │
│  │  - day_of_week, is_weekend                │    │
│  └───────────────────────────────────────────┘    │
│                                                     │
│  ┌───────────────────────────────────────────┐    │
│  │  analytics.daily_metrics (Aggregates)     │    │
│  │  - date (PK)                              │    │
│  │  - total_shares                           │    │
│  │  - unique_sharers                         │    │
│  │  - unique_workflows                       │    │
│  │  - avg_shares_per_workflow                │    │
│  │  - viral_coefficient_7d                   │    │
│  │  - share_velocity_wow                     │    │
│  └───────────────────────────────────────────┘    │
│                                                     │
│  Purpose: Optimized for analytical queries         │
│  Updates: Daily ETL from operational layer         │
│  Retention: Permanent (with historical snapshots)  │
└─────────────────────────────────────────────────────┘

Design Rationale

Why Two Layers?

  1. Operational Layer (miniapp_share)

    • Simple, append-only table
    • Source of truth for raw share events
    • Fast writes during sync jobs
    • No complex transformations (just extract collection_id)
  2. Analytics Layer (analytics.*)

    • Star schema optimized for queries
    • Pre-aggregated metrics (fast dashboards)
    • Denormalized for query performance
    • Clean separation of concerns

Benefits:

  • Reliability: Operational data never modified (audit trail)
  • Performance: Analytics queries don't impact sync jobs
  • Flexibility: Can rebuild analytics layer without re-fetching from API
  • Scalability: Easy to migrate analytics to warehouse (BigQuery/Snowflake)
  • Incremental: Build analytics layer piece-by-piece as needs evolve

Implementation Plan

Phase 1: Operational Data Layer (Week 1)

Objective: Capture all share events in operational table

Tasks:

  1. ✅ Create miniapp_share table schema
  2. ✅ Build Neynar API client with pagination support
  3. ✅ Implement collection_id extraction logic
  4. ✅ Create backfill script (one-time historical import)
  5. ✅ Test with sample data (100 records)
  6. ✅ Run full backfill (~10k records)
  7. ✅ Validate data quality (no duplicates, valid UUIDs)

Deliverables:

  • Database migration for miniapp_share table
  • NeynarClient TypeScript class
  • backfill-farcaster-shares.ts script
  • Unit tests for extraction and deduplication logic (sketched below)
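
A sketch of what those extraction tests might look like, assuming Vitest as the runner (the module path is hypothetical):

typescript
import { describe, expect, it } from 'vitest';
import { extractCollectionId } from './extract-collection-id'; // hypothetical module path

const BASE = 'https://emerge-mini-app.vercel.app/workflow';
const UUID = '550e8400-e29b-41d4-a716-446655440000';

describe('extractCollectionId', () => {
  it('extracts the UUID from a bare workflow URL', () => {
    expect(extractCollectionId(`${BASE}/${UUID}`)).toBe(UUID);
  });

  it('tolerates query params and trailing path segments', () => {
    expect(extractCollectionId(`${BASE}/${UUID}?param=value`)).toBe(UUID);
    expect(extractCollectionId(`${BASE}/${UUID}/view`)).toBe(UUID);
  });

  it('returns null for URLs without a valid UUID', () => {
    expect(extractCollectionId(`${BASE}/not-a-uuid`)).toBeNull();
  });
});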

Success Criteria:

  • All historical shares imported successfully
  • No duplicate cast_hash entries
  • 100% valid collection_id UUIDs
  • Script handles API rate limiting gracefully

Phase 2: Ongoing Sync Job (Week 1-2)

Objective: Keep operational data up-to-date automatically

Tasks:

  1. ✅ Create incremental sync job (fetch only new shares)
  2. ✅ Implement cursor-based pagination tracking
  3. ✅ Setup cron job (every 6-12 hours)
  4. ✅ Add monitoring/alerting for sync failures
  5. ✅ Test sync job with mock data
  6. ✅ Deploy to production

Deliverables:

  • sync-farcaster-shares.ts cron job
  • Cursor tracking mechanism (Redis or DB)
  • Error handling and retry logic
  • Monitoring dashboard for sync health

Success Criteria:

  • Sync job runs reliably every 6-12 hours
  • New shares appear within 12 hours of posting
  • Zero data loss during syncs
  • Alerts fire on API errors or sync failures

Phase 3: Analytics Schema - Fact Table (Week 2)

Objective: Create core analytics fact table

Tasks:

  1. ✅ Design analytics.share_events schema
  2. ✅ Create database migration
  3. ✅ Build ETL script (operational → analytics)
  4. ✅ Add derived fields (is_first_share, days_since_creation)
  5. ✅ Create indexes for common query patterns
  6. ✅ Test with operational data

Deliverables:

  • analytics.share_events table
  • ETL script: transform-share-events.ts
  • Indexes on: event_date, user_fid, workflow_id
  • Unit tests for transformations

Success Criteria:

  • All operational shares transformed successfully
  • is_first_share flag accurate
  • days_since_creation calculated correctly
  • Queries return results in <100ms

Phase 4: Analytics Schema - Dimensions (Week 2-3)

Objective: Add dimension tables for rich analytics

Tasks:

  1. ✅ Create analytics.dim_workflow table
  2. ✅ Create analytics.dim_user table
  3. ✅ Create analytics.dim_date table
  4. ✅ Build aggregation logic for dimensions
  5. ✅ Create materialized views or scheduled updates
  6. ✅ Test join performance

Deliverables:

  • Dimension tables with pre-aggregated metrics
  • ETL scripts for each dimension
  • Documentation on dimension attributes

Success Criteria:

  • Workflow dimension shows accurate share counts
  • User dimension correctly identifies cohorts
  • Date dimension enables efficient time-series queries

Phase 5: Pre-Aggregated Metrics (Week 3)

Objective: Fast dashboard queries via pre-aggregation

Tasks:

  1. ✅ Create analytics.daily_metrics table
  2. ✅ Build daily aggregation job
  3. ✅ Calculate viral coefficient (7-day rolling)
  4. ✅ Calculate share velocity (week-over-week)
  5. ✅ Test aggregation accuracy
  6. ✅ Schedule daily ETL job

Deliverables:

  • analytics.daily_metrics table
  • Daily ETL job: compute-daily-metrics.ts
  • Scheduled job (runs daily at 2 AM UTC)

Success Criteria:

  • Daily metrics computed within 5 minutes
  • Viral coefficient matches manual calculations
  • Share velocity trends align with operational data

Phase 6: Analytics Queries (Week 3-4)

Objective: Enable key analytical insights

Tasks:

  1. ✅ Write SQL queries for viral coefficient
  2. ✅ Write SQL queries for share velocity
  3. ✅ Write SQL queries for cohort analysis
  4. ✅ Create query library/module
  5. ✅ Build API endpoints for dashboard
  6. ✅ Test query performance under load

Deliverables:

  • analytics-queries.ts module
  • REST API endpoints: /api/analytics/viral-coefficient, etc. (one endpoint shape sketched below)
  • Query performance documentation
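
One possible endpoint shape, assuming Express and a shared db pool module (framework choice and module paths are not fixed by this report):

typescript
import express from 'express';
import { db } from './db'; // hypothetical shared pg pool

const app = express();

// Serve the latest pre-computed viral coefficient from the daily rollup.
app.get('/api/analytics/viral-coefficient', async (_req, res) => {
  const { rows } = await db.query(
    `SELECT date, viral_coefficient_7d
     FROM analytics.daily_metrics
     ORDER BY date DESC
     LIMIT 1`
  );
  res.json(rows[0] ?? { date: null, viral_coefficient_7d: null });
});

app.listen(3000);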

Success Criteria:

  • All key queries return in <500ms
  • API endpoints ready for dashboard integration
  • Queries scale to 100k+ share events

Analytics Capabilities

Key Metrics

1. Viral Coefficient

Definition: For each shared workflow, how many new generations are created?

Formula:

Viral Coefficient = (New Generations from Shared Workflows) / (Total Shares)

Example Query:

sql
WITH shared_workflows AS (
  SELECT workflow_id, MIN(shared_at) AS first_shared_at
  FROM analytics.share_events
  WHERE shared_at >= NOW() - INTERVAL '7 days'
  GROUP BY workflow_id
),
subsequent_generations AS (
  SELECT COUNT(*) AS generation_count
  FROM farcaster_collection fc
  JOIN shared_workflows sw ON fc.workflow_id = sw.workflow_id
  WHERE fc.created_at >= sw.first_shared_at
)
SELECT
  (SELECT generation_count FROM subsequent_generations)::float /
  NULLIF((SELECT COUNT(*) FROM analytics.share_events
          WHERE shared_at >= NOW() - INTERVAL '7 days'), 0)
  AS viral_coefficient_7d;

Insight: Viral coefficient > 1.0 indicates compounding growth

2. Share Velocity

Definition: How quickly is sharing activity growing week-over-week?

Formula:

Share Velocity = ((This Week Shares - Last Week Shares) / Last Week Shares) * 100

Example Query:

sql
WITH weekly_shares AS (
  SELECT
    DATE_TRUNC('week', shared_at) as week,
    COUNT(*) as share_count
  FROM analytics.share_events
  GROUP BY DATE_TRUNC('week', shared_at)
  ORDER BY week DESC
  LIMIT 2
)
SELECT
  ((cur.share_count - prev.share_count)::float / NULLIF(prev.share_count, 0) * 100) as velocity_pct
FROM
  (SELECT * FROM weekly_shares ORDER BY week DESC LIMIT 1 OFFSET 0) cur,
  (SELECT * FROM weekly_shares ORDER BY week DESC LIMIT 1 OFFSET 1) prev;

Insight: Positive velocity indicates accelerating growth

3. Cohort Analysis

Definition: Compare sharing behavior across user cohorts (grouped by first share date)

Example Query:

sql
SELECT
  du.cohort,
  COUNT(DISTINCT du.user_fid) as cohort_size,
  AVG(du.total_shares) as avg_shares_per_user,
  SUM(CASE WHEN du.total_shares >= 10 THEN 1 ELSE 0 END)::float / COUNT(*) * 100 as power_user_pct
FROM analytics.dim_user du
GROUP BY du.cohort
ORDER BY du.cohort DESC;

Insight: Identify which cohorts have highest engagement and retention

4. Workflow Share Rate

Definition: What percentage of workflow generations get shared?

Example Query:

sql
SELECT
  dw.workflow_id,
  dw.workflow_name,
  dw.total_generations,
  dw.total_shares,
  (dw.total_shares::float / NULLIF(dw.total_generations, 0) * 100) as share_rate_pct
FROM analytics.dim_workflow dw
WHERE dw.total_generations >= 10
ORDER BY share_rate_pct DESC
LIMIT 20;

Insight: High share rate workflows are naturally viral

5. Time-to-Share Analysis

Definition: How long between generation and first share?

Example Query:

sql
SELECT
  AVG(EXTRACT(EPOCH FROM (se.shared_at - fc.created_at)) / 3600) as avg_hours_to_share,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY se.shared_at - fc.created_at) as median_time_to_share
FROM analytics.share_events se
JOIN farcaster_collection fc ON se.generation_id = fc.id
WHERE se.is_first_share = true;

Insight: Fast time-to-share indicates high-quality/exciting content

Dashboard Queries

Daily Overview:

sql
SELECT * FROM analytics.daily_metrics
ORDER BY date DESC
LIMIT 30;

Top Sharers (Power Users):

sql
SELECT
  user_fid,
  total_shares,
  total_workflows_shared,
  first_share_date,
  user_type
FROM analytics.dim_user
WHERE user_type = 'power'
ORDER BY total_shares DESC
LIMIT 50;

Trending Workflows (Last 7 Days):

sql
SELECT
  dw.workflow_id,
  dw.workflow_name,
  COUNT(se.event_id) as shares_last_7d,
  dw.share_rate
FROM analytics.share_events se
JOIN analytics.dim_workflow dw ON se.workflow_id = dw.workflow_id
WHERE se.event_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY dw.workflow_id, dw.workflow_name, dw.share_rate
ORDER BY shares_last_7d DESC
LIMIT 20;

Technical Specifications

Database Schema

Operational Layer

sql
-- Raw share events (source of truth)
CREATE TABLE miniapp_share (
  cast_hash TEXT PRIMARY KEY,
  shared_at TIMESTAMP NOT NULL,
  author_fid INTEGER NOT NULL,
  collection_id UUID NOT NULL,
  cast_text TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_miniapp_share_collection ON miniapp_share(collection_id);
CREATE INDEX idx_miniapp_share_author ON miniapp_share(author_fid);
CREATE INDEX idx_miniapp_share_date ON miniapp_share(shared_at);

Analytics Layer - Fact Table

sql
CREATE SCHEMA IF NOT EXISTS analytics;

-- Core fact table for share events
CREATE TABLE analytics.share_events (
  event_id BIGSERIAL PRIMARY KEY,
  event_date DATE NOT NULL,
  shared_at TIMESTAMP NOT NULL,
  user_fid INTEGER NOT NULL,
  workflow_id UUID NOT NULL,
  generation_id UUID,  -- References farcaster_collection.id
  is_first_share BOOLEAN DEFAULT false,
  days_since_creation INTEGER,
  cast_hash TEXT UNIQUE NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_share_events_date ON analytics.share_events(event_date);
CREATE INDEX idx_share_events_user ON analytics.share_events(user_fid);
CREATE INDEX idx_share_events_workflow ON analytics.share_events(workflow_id);
CREATE INDEX idx_share_events_generation ON analytics.share_events(generation_id);

Analytics Layer - Dimension Tables

sql
-- Workflow dimension (aggregated metrics)
CREATE TABLE analytics.dim_workflow (
  workflow_id UUID PRIMARY KEY,
  workflow_name TEXT,
  total_generations INTEGER DEFAULT 0,
  total_shares INTEGER DEFAULT 0,
  unique_sharers INTEGER DEFAULT 0,
  share_rate NUMERIC(5,2),  -- shares / generations * 100
  viral_coefficient NUMERIC(5,2),
  first_shared_at TIMESTAMP,
  last_shared_at TIMESTAMP,
  updated_at TIMESTAMP DEFAULT NOW()
);

-- User dimension (aggregated metrics)
CREATE TABLE analytics.dim_user (
  user_fid INTEGER PRIMARY KEY,
  first_share_date DATE NOT NULL,
  total_shares INTEGER DEFAULT 0,
  total_workflows_shared INTEGER DEFAULT 0,
  cohort VARCHAR(7),  -- YYYY-MM format
  user_type VARCHAR(20),  -- 'power', 'casual', 'one-time'
  updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_dim_user_cohort ON analytics.dim_user(cohort);
CREATE INDEX idx_dim_user_type ON analytics.dim_user(user_type);

-- Date dimension (for time-series queries)
CREATE TABLE analytics.dim_date (
  date_key INTEGER PRIMARY KEY,  -- YYYYMMDD format
  full_date DATE NOT NULL UNIQUE,
  year INTEGER NOT NULL,
  quarter INTEGER NOT NULL,
  month INTEGER NOT NULL,
  week INTEGER NOT NULL,
  day_of_month INTEGER NOT NULL,
  day_of_week INTEGER NOT NULL,
  day_name VARCHAR(10) NOT NULL,
  is_weekend BOOLEAN NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_dim_date_full ON analytics.dim_date(full_date);
CREATE INDEX idx_dim_date_month ON analytics.dim_date(year, month);
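
The date dimension needs a one-time seed, which the report does not spell out. A sketch using generate_series, written in the same db.query style as the ETL jobs later in this document (the 2024-2030 range is an assumption):

typescript
// One-time seed for analytics.dim_date; extend the range as needed.
async function seedDateDimension() {
  await db.query(`
    INSERT INTO analytics.dim_date (
      date_key, full_date, year, quarter, month, week,
      day_of_month, day_of_week, day_name, is_weekend
    )
    SELECT
      TO_CHAR(d, 'YYYYMMDD')::INTEGER,
      d::DATE,
      EXTRACT(YEAR FROM d)::INTEGER,
      EXTRACT(QUARTER FROM d)::INTEGER,
      EXTRACT(MONTH FROM d)::INTEGER,
      EXTRACT(WEEK FROM d)::INTEGER,
      EXTRACT(DAY FROM d)::INTEGER,
      EXTRACT(ISODOW FROM d)::INTEGER,
      TRIM(TO_CHAR(d, 'Day')),
      EXTRACT(ISODOW FROM d) IN (6, 7)
    FROM generate_series('2024-01-01'::DATE, '2030-12-31'::DATE, '1 day') AS d
    ON CONFLICT (date_key) DO NOTHING
  `);
}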

Analytics Layer - Pre-Aggregated Metrics

sql
-- Daily rollup metrics (for fast dashboard queries)
CREATE TABLE analytics.daily_metrics (
  date DATE PRIMARY KEY,
  total_shares INTEGER DEFAULT 0,
  unique_sharers INTEGER DEFAULT 0,
  unique_workflows INTEGER DEFAULT 0,
  new_sharers INTEGER DEFAULT 0,  -- First-time sharers
  avg_shares_per_workflow NUMERIC(10,2),
  avg_shares_per_user NUMERIC(10,2),
  viral_coefficient_7d NUMERIC(5,2),
  share_velocity_wow NUMERIC(5,2),  -- Week-over-week % change
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_daily_metrics_date ON analytics.daily_metrics(date DESC);

API Client Implementation

Neynar Client Interface:

typescript
interface NeynarClientConfig {
  apiKey: string;
  baseUrl?: string;
  rateLimit?: number; // requests per minute
}

interface FarcasterCast {
  hash: string;
  timestamp: string;
  author: {
    fid: number;
    username: string;
    display_name: string;
  };
  text: string;
  embeds: Array<{ url: string }>;
  reactions: {
    likes: number;
    recasts: number;
  };
}

interface FeedResponse {
  casts: FarcasterCast[];
  next?: {
    cursor: string;
  };
}

class NeynarClient {
  constructor(private config: NeynarClientConfig) {}

  // Fetch one page of the filtered feed (Node 18+ global fetch).
  async fetchFeed(params: { embedUrl: string; limit?: number; cursor?: string }): Promise<FeedResponse> {
    const url = new URL(`${this.config.baseUrl ?? 'https://api.neynar.com'}/v2/farcaster/feed`);
    url.searchParams.set('feed_type', 'filter');
    url.searchParams.set('filter_type', 'embed_url');
    url.searchParams.set('embed_url', params.embedUrl);
    url.searchParams.set('limit', String(params.limit ?? 100));
    if (params.cursor) url.searchParams.set('cursor', params.cursor);

    const res = await fetch(url, {
      headers: { accept: 'application/json', 'x-api-key': this.config.apiKey },
    });
    if (!res.ok) throw new Error(`Neynar API error: ${res.status} ${res.statusText}`);
    return (await res.json()) as FeedResponse;
  }

  // Walk the cursor chain until the feed is exhausted.
  async *paginateFeed(embedUrl: string): AsyncGenerator<FarcasterCast[]> {
    let cursor: string | undefined;
    do {
      const page = await this.fetchFeed({ embedUrl, limit: 100, cursor });
      yield page.casts;
      cursor = page.next?.cursor;
    } while (cursor);
  }
}

Extraction Utility:

typescript
interface ShareEvent {
  castHash: string;
  sharedAt: Date;
  authorFid: number;
  collectionId: string;
  castText: string;
}

function extractShareEvents(casts: FarcasterCast[]): ShareEvent[] {
  return casts
    .flatMap(cast => {
      return cast.embeds
        .map(embed => extractCollectionId(embed.url))
        .filter((id): id is string => id !== null)
        .map(collectionId => ({
          castHash: cast.hash,
          sharedAt: new Date(cast.timestamp),
          authorFid: cast.author.fid,
          collectionId,
          castText: cast.text
        }));
    });
}

function extractCollectionId(url: string): string | null {
  // Use the same strict UUID pattern as the extraction logic above.
  const match = url.match(/\/workflow\/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i);
  return match ? match[1] : null;
}

ETL Strategy

Data Flow

┌──────────────┐
│  Neynar API  │
└──────┬───────┘

       │ Backfill (one-time): Paginate all historical
       │ Sync (periodic): Fetch only new casts


┌──────────────────────────────────────────┐
│  OPERATIONAL LAYER: miniapp_share        │
│  - Raw, append-only                      │
│  - cast_hash deduplication               │
│  - Minimal transformation                │
└──────┬───────────────────────────────────┘

       │ Daily ETL (scheduled job)


┌──────────────────────────────────────────┐
│  ANALYTICS LAYER: share_events           │
│  - Denormalized for query performance    │
│  - Derived fields (is_first_share, etc.) │
│  - Joins with farcaster_collection       │
└──────┬───────────────────────────────────┘

       │ Daily Aggregation


┌──────────────────────────────────────────┐
│  DIMENSIONS & METRICS                    │
│  - dim_workflow (workflow aggregates)    │
│  - dim_user (user aggregates)            │
│  - daily_metrics (pre-computed KPIs)     │
└──────────────────────────────────────────┘

ETL Jobs

1. Backfill Job (One-Time)

Purpose: Import all historical shares from Neynar API

Process:

typescript
async function backfillFarcasterShares() {
  const client = new NeynarClient({ apiKey: process.env.NEYNAR_API_KEY! });
  const embedUrl = 'https://emerge-mini-app.vercel.app/workflow/';

  let totalProcessed = 0;
  let totalInserted = 0;

  for await (const castBatch of client.paginateFeed(embedUrl)) {
    const shareEvents = extractShareEvents(castBatch);
    if (shareEvents.length === 0) continue; // No matching embeds in this batch

    // Upsert to handle duplicates gracefully
    const result = await db.query(`
      INSERT INTO miniapp_share (cast_hash, shared_at, author_fid, collection_id, cast_text)
      VALUES ${shareEvents.map((_, i) => `($${i*5+1}, $${i*5+2}, $${i*5+3}, $${i*5+4}, $${i*5+5})`).join(', ')}
      ON CONFLICT (cast_hash) DO NOTHING
      RETURNING cast_hash
    `, shareEvents.flatMap(e => [e.castHash, e.sharedAt, e.authorFid, e.collectionId, e.castText]));

    totalProcessed += shareEvents.length;
    totalInserted += result.rowCount ?? 0;

    console.log(`Processed ${totalProcessed}, inserted ${totalInserted}`);
  }
}

Execution:

bash
pnpm run backfill:farcaster-shares

Expected Duration: ~15-20 minutes for 10k shares (with rate limiting)

2. Sync Job (Scheduled)

Purpose: Fetch new shares periodically (every 6-12 hours)

Process:

typescript
async function syncFarcasterShares() {
  // Get last synced cursor from Redis/DB
  const lastCursor = await redis.get('neynar:last_cursor');

  const client = new NeynarClient({ apiKey: process.env.NEYNAR_API_KEY! });
  const response = await client.fetchFeed({
    embedUrl: 'https://emerge-mini-app.vercel.app/workflow/',
    limit: 100,
    cursor: lastCursor || undefined
  });

  const shareEvents = extractShareEvents(response.casts);

  // Insert new shares
  if (shareEvents.length > 0) {
    await insertShareEvents(shareEvents);
  }

  // Save new cursor for next sync
  if (response.next?.cursor) {
    await redis.set('neynar:last_cursor', response.next.cursor);
  }

  return { processed: shareEvents.length };
}

Cron Schedule:

yaml
# Every 12 hours
schedule: "0 */12 * * *"

3. Analytics Transform Job (Daily)

Purpose: Transform operational data into analytics layer

Process:

typescript
async function transformShareEvents() {
  // 1. Insert new share events
  await db.query(`
    INSERT INTO analytics.share_events (
      event_date, shared_at, user_fid, workflow_id, generation_id,
      is_first_share, days_since_creation, cast_hash
    )
    SELECT
      DATE(ms.shared_at) as event_date,
      ms.shared_at,
      ms.author_fid as user_fid,
      ms.collection_id as workflow_id,
      fc.id as generation_id,
      (NOT EXISTS (
        SELECT 1 FROM analytics.share_events se2
        WHERE se2.generation_id = fc.id AND se2.shared_at < ms.shared_at
      )) as is_first_share,
      DATE_PART('day', ms.shared_at - fc.created_at)::INTEGER as days_since_creation,
      ms.cast_hash
    FROM miniapp_share ms
    LEFT JOIN farcaster_collection fc ON fc.id = ms.collection_id
    WHERE NOT EXISTS (
      SELECT 1 FROM analytics.share_events se
      WHERE se.cast_hash = ms.cast_hash
    )
  `);

  // 2. Update workflow dimension
  await updateWorkflowDimension();

  // 3. Update user dimension
  await updateUserDimension();

  // 4. Compute daily metrics
  await computeDailyMetrics();
}

Execution:

yaml
# Daily at 2 AM UTC
schedule: "0 2 * * *"

4. Aggregation Job (Daily)

Purpose: Update dimension tables and pre-computed metrics

Process:

typescript
async function updateWorkflowDimension() {
  await db.query(`
    INSERT INTO analytics.dim_workflow (
      workflow_id, total_shares, unique_sharers, first_shared_at, last_shared_at
    )
    SELECT
      workflow_id,
      COUNT(*) as total_shares,
      COUNT(DISTINCT user_fid) as unique_sharers,
      MIN(shared_at) as first_shared_at,
      MAX(shared_at) as last_shared_at
    FROM analytics.share_events
    GROUP BY workflow_id
    ON CONFLICT (workflow_id) DO UPDATE SET
      total_shares = EXCLUDED.total_shares,
      unique_sharers = EXCLUDED.unique_sharers,
      last_shared_at = EXCLUDED.last_shared_at,
      updated_at = NOW()
  `);

  // Update share_rate (requires join with generation counts)
  await db.query(`
    UPDATE analytics.dim_workflow dw
    SET
      total_generations = gc.generation_count,
      share_rate = (dw.total_shares::float / NULLIF(gc.generation_count, 0) * 100)
    FROM (
      SELECT workflow_id, COUNT(*) as generation_count
      FROM farcaster_collection
      GROUP BY workflow_id
    ) gc
    WHERE dw.workflow_id = gc.workflow_id
  `);
}
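
The companion jobs referenced in the transform step (updateUserDimension, computeDailyMetrics) are not spelled out above; sketches follow. The user_type thresholds (power ≥ 10 shares, one-time = 1) are illustrative assumptions, not settled product definitions:

typescript
async function updateUserDimension() {
  await db.query(`
    INSERT INTO analytics.dim_user (
      user_fid, first_share_date, total_shares, total_workflows_shared, cohort, user_type
    )
    SELECT
      user_fid,
      MIN(event_date) as first_share_date,
      COUNT(*) as total_shares,
      COUNT(DISTINCT workflow_id) as total_workflows_shared,
      TO_CHAR(MIN(event_date), 'YYYY-MM') as cohort,
      CASE
        WHEN COUNT(*) >= 10 THEN 'power'   -- assumed threshold
        WHEN COUNT(*) > 1 THEN 'casual'
        ELSE 'one-time'
      END as user_type
    FROM analytics.share_events
    GROUP BY user_fid
    ON CONFLICT (user_fid) DO UPDATE SET
      total_shares = EXCLUDED.total_shares,
      total_workflows_shared = EXCLUDED.total_workflows_shared,
      user_type = EXCLUDED.user_type,
      updated_at = NOW()
  `);
}

// Upsert yesterday's row of the daily rollup. viral_coefficient_7d and
// share_velocity_wow come from their own queries (see Analytics Capabilities)
// and are omitted here for brevity.
async function computeDailyMetrics() {
  await db.query(`
    INSERT INTO analytics.daily_metrics (
      date, total_shares, unique_sharers, unique_workflows,
      avg_shares_per_workflow, avg_shares_per_user
    )
    SELECT
      event_date,
      COUNT(*),
      COUNT(DISTINCT user_fid),
      COUNT(DISTINCT workflow_id),
      COUNT(*)::numeric / NULLIF(COUNT(DISTINCT workflow_id), 0),
      COUNT(*)::numeric / NULLIF(COUNT(DISTINCT user_fid), 0)
    FROM analytics.share_events
    WHERE event_date = CURRENT_DATE - 1
    GROUP BY event_date
    ON CONFLICT (date) DO UPDATE SET
      total_shares = EXCLUDED.total_shares,
      unique_sharers = EXCLUDED.unique_sharers,
      unique_workflows = EXCLUDED.unique_workflows,
      avg_shares_per_workflow = EXCLUDED.avg_shares_per_workflow,
      avg_shares_per_user = EXCLUDED.avg_shares_per_user
  `);
}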

Error Handling

Principles:

  1. Fail Fast, Log Descriptive Errors (per CLAUDE.md guidelines)
  2. Retry with Exponential Backoff for transient API errors
  3. Dead Letter Queue for unrecoverable errors
  4. Alerting on sync failures > 24 hours

Error Scenarios:

| Error Type | Handling Strategy |
| --- | --- |
| API Rate Limit | Wait + retry with backoff |
| Invalid Collection ID | Log + skip individual record |
| Network Timeout | Retry up to 3 times |
| Database Constraint Violation | Log + skip (likely duplicate) |
| Empty API Response | Log + continue (no new data) |
| Sync Job Failure | Alert after 2 consecutive failures |

Example Error Handling:

typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3
): Promise<T> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      if (attempt === maxRetries) {
        throw new Error(
          `Failed after ${maxRetries} attempts: ${message}`
        );
      }

      const delay = Math.pow(2, attempt) * 1000; // Exponential backoff: 2s, 4s, 8s
      console.warn(`Attempt ${attempt} failed, retrying in ${delay}ms...`);
      await sleep(delay);
    }
  }

  throw new Error('Unreachable');
}

Success Metrics

System Health Metrics

Data Quality:

  • Zero duplicate cast_hash in operational table
  • 100% valid UUIDs for collection_id
  • <1% skipped records due to extraction errors
  • Daily sync latency <12 hours (new shares appear same day)

Performance:

  • Backfill completes in <30 minutes for 10k records
  • Sync job completes in <5 minutes for incremental updates
  • Analytics queries return in <500ms (95th percentile)
  • Daily ETL completes in <10 minutes

Reliability:

  • Sync job success rate >99.5% (allow occasional API errors)
  • Zero data loss (all shares eventually captured)
  • ETL failures alert within 1 hour

Business Metrics

Virality KPIs:

  • 📊 Viral Coefficient - Target: >0.5 (50% of shares lead to new generations)
  • 📊 Share Rate - Target: >10% (10%+ of generations get shared)
  • 📊 Share Velocity - Track week-over-week growth trends
  • 📊 Power User Ratio - Identify top 10% of sharers

User Engagement:

  • 📊 Time-to-Share - Median time between generation and share
  • 📊 Multi-Share Rate - % of workflows shared by 2+ users
  • 📊 Cohort Retention - Do early sharers continue sharing?

Product Insights:

  • 📊 High-Share Workflows - Which templates naturally go viral?
  • 📊 Share Text Analysis - What messaging drives engagement?
  • 📊 Cross-Reference Analysis - Generation volume vs. share volume correlation

Future Considerations

Phase 2 Enhancements (Months 2-3)

1. Sentiment Analysis

  • Use AI to analyze cast_text for sentiment (positive/negative/neutral)
  • Correlate sentiment with share velocity and viral coefficient
  • Identify what messaging patterns drive virality

2. User Profile Enrichment

  • Fetch additional Neynar user metadata (follower count, bio, etc.)
  • Segment users by influence (micro vs. macro influencers)
  • Track share reach (author's follower count × engagement)

3. Cross-Platform Tracking

  • Track shares on Twitter/X (if embeds contain workflow URLs)
  • Track shares in Discord/Telegram (via webhook integrations)
  • Unified cross-platform viral coefficient

4. Real-Time Analytics

  • Stream processing for live viral coefficient updates
  • Real-time alerts for breakout viral workflows
  • Live dashboard with WebSocket updates

Phase 3 Warehouse Migration (Months 4-6)

Goal: Scale analytics to handle millions of shares

Architecture:

PostgreSQL (operational) → ETL → BigQuery/Snowflake (analytics)
                                          ↓
                                 Looker/Metabase Dashboard

Benefits:

  • Separate operational writes from analytical queries
  • Leverage warehouse-specific optimizations (columnar storage, partitioning)
  • Enable data science workflows (ML models for predicting virality)
  • Support 10x+ data volume growth

Migration Strategy:

  • Incremental: Start with daily_metrics table only
  • Dual-write: Write to both Postgres and warehouse during transition
  • Cutover: Switch dashboards to warehouse once validated
  • Deprecate: Eventually remove analytics schema from Postgres

Advanced Analytics

1. Predictive Modeling

  • ML model to predict viral coefficient from workflow features
  • Recommendations: "This workflow type historically shares well"
  • A/B testing framework for share feature experiments

2. Network Analysis

  • Graph analysis of share chains (user A → user B → user C)
  • Identify super-spreaders and viral pathways
  • Community detection among sharers

3. Attribution Tracking

  • Track conversions from share → view → generation
  • Measure ROI of social features
  • Attribution windows (1-day, 7-day, 30-day)

Appendices

A. Example Queries Library

Find Top 10 Most Shared Workflows (All Time):

sql
SELECT
  dw.workflow_id,
  dw.workflow_name,
  dw.total_shares,
  dw.unique_sharers,
  dw.share_rate
FROM analytics.dim_workflow dw
ORDER BY dw.total_shares DESC
LIMIT 10;

Cohort Analysis - Share Retention:

sql
SELECT
  du.cohort,
  COUNT(DISTINCT du.user_fid) as cohort_size,
  COUNT(DISTINCT CASE WHEN se.event_date >= CURRENT_DATE - 30 THEN du.user_fid END) as active_last_30d,
  (COUNT(DISTINCT CASE WHEN se.event_date >= CURRENT_DATE - 30 THEN du.user_fid END)::float /
   COUNT(DISTINCT du.user_fid) * 100) as retention_pct
FROM analytics.dim_user du
LEFT JOIN analytics.share_events se ON du.user_fid = se.user_fid
GROUP BY du.cohort
ORDER BY du.cohort DESC;

Weekly Share Velocity Trend:

sql
WITH weekly_shares AS (
  SELECT
    DATE_TRUNC('week', event_date) as week,
    COUNT(*) as share_count
  FROM analytics.share_events
  GROUP BY DATE_TRUNC('week', event_date)
)
SELECT
  week,
  share_count,
  LAG(share_count) OVER (ORDER BY week) as prev_week,
  ((share_count - LAG(share_count) OVER (ORDER BY week))::float /
   NULLIF(LAG(share_count) OVER (ORDER BY week), 0) * 100) as velocity_wow_pct
FROM weekly_shares
ORDER BY week DESC;

B. Monitoring & Alerts

Key Metrics to Monitor:

  1. Sync job last successful run (alert if >24h; see the freshness sketch after this list)
  2. Number of records processed per sync (detect anomalies)
  3. ETL job execution time (alert if >15 min)
  4. Invalid collection_id extraction rate (alert if >5%)
  5. API error rate (alert if >10% of requests fail)
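
A minimal freshness check for item 1, assuming the shared db pool and a hypothetical sendAlert hook (both placeholders for whatever monitoring stack is in place):

typescript
// Alert if no new share has landed in the operational table for 24 hours.
async function checkSyncFreshness() {
  const { rows } = await db.query(
    `SELECT MAX(created_at) AS last_ingested FROM miniapp_share`
  );
  const last = rows[0]?.last_ingested ? new Date(rows[0].last_ingested) : null;
  const STALE_MS = 24 * 60 * 60 * 1000;

  if (!last || Date.now() - last.getTime() > STALE_MS) {
    await sendAlert('farcaster-sync-stale', { lastIngested: last }); // hypothetical hook
  }
}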

Recommended Alerts:

  • Critical: Sync job failed 3 times consecutively
  • Warning: ETL job took >10 minutes to complete
  • Info: Daily metrics computed successfully

Dashboard Panels:

  • Line chart: Daily shares over time
  • Bar chart: Top 10 workflows by share count
  • Table: Recent shares (last 100)
  • KPI cards: Viral coefficient, share velocity, unique sharers

Conclusion

This Farcaster shares tracking and analytics system provides a scalable, maintainable foundation for understanding content virality and user engagement. By separating operational and analytical concerns, the architecture enables fast iteration on analytics while maintaining data integrity.

Key Success Factors:

  1. Incremental Implementation - Start simple, expand as needs evolve
  2. Clean Architecture - Operational vs. analytics separation
  3. Error Handling - Fail fast with descriptive errors (CLAUDE.md compliance)
  4. Performance - Pre-aggregated metrics for dashboard speed
  5. Scalability - Ready for warehouse migration as data grows

Next Steps:

  1. Review and approve this report
  2. Create git worktree: feat/farcaster-shares-analytics
  3. Write ADR in worktree with detailed technical decisions
  4. Begin Phase 1 implementation (operational data layer)

Document Version: 1.0
Last Updated: 2025-11-13
Author: EmProps Engineering Team
