
OpenTelemetry Trace Registry

Purpose: This document is the source of truth for what traces currently exist in the system. Use this to understand what's instrumented and where traces fit in the request hierarchy.

Last Updated: 2025-01-29 (Added workflow.execute root span)


Current Request Flow & Traces


Active Traces

1. Service Heartbeat (All Services)

Status: ✅ Active
Span Name: service.heartbeat
Frequency: Every 15 seconds
Services: emprops-api, q-job-api, q-worker, q-webhook, telemetry-collector

Attributes:

  • service.status: "running"
  • service.uptime_seconds: Time since service start

Purpose: Service map visibility in Dash0, uptime tracking

Implementation: Automatic via EmpTelemetryClient
Location: packages/telemetry/src/index.ts:449-463
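As an illustration of the two heartbeat attributes listed above, the following is a minimal sketch of how they might be assembled. `buildHeartbeatAttributes`, its parameters, and `HEARTBEAT_INTERVAL_MS` are hypothetical names chosen for this example; they are not taken from `EmpTelemetryClient`, whose actual implementation may differ.

```typescript
// Hypothetical helper for illustration only; not the EmpTelemetryClient code.
type HeartbeatAttributes = {
  'service.status': string;
  'service.uptime_seconds': number;
};

// "Every 15 seconds" per this registry.
const HEARTBEAT_INTERVAL_MS = 15000;

function buildHeartbeatAttributes(startedAtMs: number, nowMs: number): HeartbeatAttributes {
  return {
    'service.status': 'running',
    // Whole seconds elapsed since service start, clamped at zero.
    'service.uptime_seconds': Math.max(0, Math.floor((nowMs - startedAtMs) / 1000)),
  };
}

// A service that started 42.7 seconds ago reports 42 seconds of uptime.
const exampleAttributes = buildHeartbeatAttributes(1000, 43700);
console.log(HEARTBEAT_INTERVAL_MS, exampleAttributes);
```

Keeping the attribute construction in a pure function like this makes it easy to unit test without a running exporter.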


2. Workflow Execute (ROOT PARENT)

Status: ✅ Active
Span Name: workflow.execute
Service: emprops-api
Trigger: POST /workflows/:id/test

Attributes:

  • http.method: "POST"
  • http.route: "/workflows/:id/test"
  • http.status_code: Response status (200, 400, 500)
  • workflow.id: Workflow identifier
  • workflow.name: Workflow name
  • workflow.type: Workflow type (e.g., "comfy_workflow", "direct_job")
  • error: true (if request failed)
  • error.message: Error message (if request failed)
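The attribute list above can be read as a small mapping from request outcome to span attributes. The sketch below is one way to express it; `WorkflowRequestOutcome` and `buildWorkflowExecuteAttributes` are hypothetical names, and treating any status of 400 or above as a failed request is an assumption of this example, not something the route handler is confirmed to do.

```typescript
// Hypothetical input shape for illustration; field names are assumptions.
interface WorkflowRequestOutcome {
  workflowId: string;
  workflowName: string;
  workflowType: string;
  statusCode: number;
  errorMessage?: string;
}

type AttributeValue = string | number | boolean;

function buildWorkflowExecuteAttributes(
  outcome: WorkflowRequestOutcome,
): Record<string, AttributeValue> {
  const attributes: Record<string, AttributeValue> = {
    'http.method': 'POST',
    'http.route': '/workflows/:id/test',
    'http.status_code': outcome.statusCode,
    'workflow.id': outcome.workflowId,
    'workflow.name': outcome.workflowName,
    'workflow.type': outcome.workflowType,
  };
  // Assumption: a 4xx or 5xx response counts as a failed request.
  if (outcome.statusCode >= 400) {
    attributes['error'] = true;
    attributes['error.message'] = outcome.errorMessage ?? 'unknown error';
  }
  return attributes;
}

console.log(
  buildWorkflowExecuteAttributes({
    workflowId: 'wf-1',
    workflowName: 'demo',
    workflowType: 'comfy_workflow',
    statusCode: 500,
    errorMessage: 'runner failed',
  }),
);
```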

Parent: None (root span)
Children: job.submit (when workflow runner submits jobs to Job API)

Implementation: Manual via telemetryClient.withSpan()
Location: apps/emprops-api/src/routes/workflows/[id]/test/index.ts:31

Testing:

bash
# Start EmProps API
pnpm -w d:emprops-api:run local-docker

# Execute workflow
curl -X POST http://localhost:3335/workflows/{workflow-id}/test \
  -H "Content-Type: application/json" \
  -d '{"inputs": {...}}'

# Check Dash0 for spans with:
# - service.name = emprops-api
# - span.name = workflow.execute
# - Parent context should propagate to downstream job.submit spans

3. Job Submit

Status: ✅ Active
Span Name: job.submit
Service: q-job-api
Trigger: POST /api/jobs

Attributes:

  • job.workflow_id: Workflow identifier
  • job.type: Job type (e.g., "comfyui")

Parent: workflow.execute (when called from EmProps API)
Children: None currently (should be parent of job.process; see the "TODO: Context Propagation" section)

Implementation: Manual via telemetryClient.withSpan()
Location: apps/api/src/lightweight-api-server.ts:846

Testing:

bash
# Start API
pnpm -w d:api:run local-docker

# Submit job
curl -X POST http://localhost:3331/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"workflow_id": "test-wf-1", "type": "comfyui", ...}'

# Check Dash0 for spans with:
# - service.name = q-job-api
# - span.name = job.submit

4. Job Process

Status: ✅ Active
Span Name: job.process
Service: q-worker
Trigger: Worker processes job from Redis queue

Attributes:

  • job.id: Redis job identifier
  • job.service_required: Required service (e.g., "comfyui")

Parent: None currently (should be child of job.submit and grandchild of workflow.execute; see the "Trace Context Propagation" section)
Children: None currently (should have child spans for connector execution)

Implementation: Manual via telemetryClient.withSpan()
Location: apps/worker/src/redis-direct-base-worker.ts:850

Testing:

bash
# Start worker
cd apps/worker && pnpm start

# Worker will process jobs and create spans automatically
# Check Dash0 for spans with:
# - service.name = q-worker
# - span.name = job.process

Missing Traces (Gaps to Fill)

❌ ComfyUI Execution (NEXT PRIORITY)

Target Span Name: comfyui.execute
Service: q-worker (via comfyui-websocket-connector)
Trigger: Worker executes job via ComfyUI connector

Proposed Attributes:

  • comfyui.prompt_id: ComfyUI prompt identifier
  • comfyui.instance: GPU instance (e.g., "comfyui-gpu0")
  • comfyui.host: ComfyUI host URL

Implementation Plan: Add instrumentation to ComfyUIWebSocketConnector


Auto-Instrumentation Status

| Library | Status | Notes |
| --- | --- | --- |
| Winston | ✅ Enabled | Automatic log correlation with traces |
| HTTP | ❌ Disabled | No automatic spans for HTTP requests |
| Express | ❌ Disabled | No automatic spans for Express routes |
| Redis | ❌ Disabled | Redis operations not automatically traced |

To Enable HTTP/Express Tracing:

typescript
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';

// In packages/telemetry/src/index.ts, add to instrumentations array:
instrumentations.push(
  new HttpInstrumentation(),
  new ExpressInstrumentation()
);

Trace Context Propagation

Current State

Job API → Worker: ❌ No trace context propagation

  • job.submit and job.process are disconnected
  • Need to store W3C traceparent in Redis job metadata

EmProps API → Job API: ❌ No trace context propagation

  • EmProps API doesn't create parent span
  • HTTP requests to Job API don't include trace context headers
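For reference, a W3C `traceparent` value has four dash-separated lowercase hex fields: version (2 characters), trace-id (32), parent-id (16), and trace-flags (2). The sketch below validates and splits such a value before it would be stored in job metadata. `parseTraceparent` and `isSampled` are hypothetical helper names, and the parser is deliberately strict about the four-field layout; it does not attempt the forward-compatible parsing the specification allows for future versions.

```typescript
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  flags: string;
}

// Four lowercase hex fields: version, trace-id, parent-id, trace-flags.
const TRACEPARENT_PATTERN = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | null {
  const match = TRACEPARENT_PATTERN.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // Version "ff" and all-zero trace-id or parent-id are invalid per the spec.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, flags };
}

// The lowest bit of trace-flags is the "sampled" flag.
function isSampled(traceParent: TraceParent): boolean {
  return (parseInt(traceParent.flags, 16) & 0x01) === 0x01;
}

const parsed = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(parsed, parsed ? isSampled(parsed) : false);
```

Rejecting malformed values at write time keeps a bad header from silently producing a disconnected `job.process` span later.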

TODO: Context Propagation

  1. EmProps API creates parent span for workflow request
  2. Propagate trace context when calling Job API:
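    typescript
    import { context, propagation, defaultTextMapSetter } from '@opentelemetry/api';

    const headers: Record<string, string> = {};
    // Writes traceparent (and tracestate, if set) into the carrier object
    propagation.inject(context.active(), headers, defaultTextMapSetter);
    // Include headers in HTTP request to Job API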
  3. Store trace context in Redis job:
    typescript
    job.trace_context = {
      traceparent: headers['traceparent'],
      tracestate: headers['tracestate']
    };
  4. Worker extracts trace context from job:
    typescript
    import { ROOT_CONTEXT, propagation, defaultTextMapGetter } from '@opentelemetry/api';

    const parentContext = propagation.extract(ROOT_CONTEXT, job.trace_context, defaultTextMapGetter);
    // Create job.process span as child of parentContext
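Steps 3 and 4 amount to a round trip of the carrier through the queued job. The sketch below shows that round trip without any OpenTelemetry or Redis dependency, so the data shape can be checked in isolation. The `QueuedJob` shape, the helper names, and the assumption that the job is stored as a JSON string are all hypothetical; only the `trace_context` field name comes from the steps above.

```typescript
// Carrier fields as written by a W3C trace-context propagator.
type TraceContextCarrier = { traceparent?: string; tracestate?: string };

// Hypothetical job shape for illustration; the real job record has more fields.
interface QueuedJob {
  id: string;
  service_required: string;
  trace_context?: TraceContextCarrier;
}

// Job API side: copy the propagation headers onto the job before enqueueing.
function attachTraceContext(job: QueuedJob, headers: Record<string, string>): QueuedJob {
  const carrier: TraceContextCarrier = {};
  if (headers['traceparent']) carrier.traceparent = headers['traceparent'];
  if (headers['tracestate']) carrier.tracestate = headers['tracestate'];
  // Without a traceparent there is nothing useful to propagate.
  return carrier.traceparent ? { ...job, trace_context: carrier } : job;
}

// Worker side: read the carrier back, tolerating jobs queued without one.
function readTraceContext(serializedJob: string): TraceContextCarrier {
  const job = JSON.parse(serializedJob) as QueuedJob;
  return job.trace_context ?? {};
}

const queued = attachTraceContext(
  { id: 'job-1', service_required: 'comfyui' },
  { traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01' },
);
console.log(readTraceContext(JSON.stringify(queued)));
```

Returning an empty carrier for jobs that lack `trace_context` means the worker would start a new root span for them rather than fail, which keeps jobs queued before the rollout processable.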

How to Add a New Trace

  1. Choose span name (use semantic naming: operation.action)
  2. Identify parent span (if any)
  3. Add attributes (use OpenTelemetry semantic conventions where possible)
  4. Instrument the code:
    typescript
    await telemetryClient.withSpan('operation.action', async (span) => {
      span.setAttribute('key', 'value');
      // Your operation here
    });
  5. Test the trace in Dash0
  6. Update this document with span details
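To make the contract of a `withSpan`-style wrapper concrete, here is a minimal, dependency-free sketch of the behaviour steps 1 to 4 rely on: the span is always ended, and failures are recorded using the `error` and `error.message` attributes already listed for workflow.execute. Everything here (`SketchSpan`, `RecordingSpan`, `withSpanSketch`, `isValidSpanName`) is hypothetical and is not the actual `telemetryClient.withSpan()` implementation; the naming regex is one assumed reading of "operation.action".

```typescript
type SpanAttributeValue = string | number | boolean;

// Stand-in span interface for this sketch; the real span type may differ.
interface SketchSpan {
  setAttribute(key: string, value: SpanAttributeValue): void;
  end(): void;
}

class RecordingSpan implements SketchSpan {
  name: string;
  attributes: Record<string, SpanAttributeValue> = {};
  ended = false;
  constructor(name: string) {
    this.name = name;
  }
  setAttribute(key: string, value: SpanAttributeValue): void {
    this.attributes[key] = value;
  }
  end(): void {
    this.ended = true;
  }
}

// Assumed convention: two lowercase, dot-separated segments.
function isValidSpanName(name: string): boolean {
  return /^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$/.test(name);
}

async function withSpanSketch<T>(
  name: string,
  fn: (span: SketchSpan) => Promise<T>,
  finished: RecordingSpan[],
): Promise<T> {
  if (!isValidSpanName(name)) throw new Error(`Invalid span name: ${name}`);
  const span = new RecordingSpan(name);
  try {
    return await fn(span);
  } catch (err) {
    // Record the failure, then rethrow so callers still see the error.
    span.setAttribute('error', true);
    span.setAttribute('error.message', err instanceof Error ? err.message : String(err));
    throw err;
  } finally {
    // The span is ended and collected on both success and failure.
    span.end();
    finished.push(span);
  }
}
```

A call would mirror step 4, for example `await withSpanSketch('operation.action', async (span) => { span.setAttribute('key', 'value'); }, finishedSpans)`, where `finishedSpans` is an array the caller supplies to collect ended spans.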

Validation

To verify trace structure in Dash0:

  1. Navigate to Traces view
  2. Filter by service name
  3. Find trace ID
  4. Verify:
    • [ ] Span appears with correct name
    • [ ] Attributes are present and accurate
    • [ ] Parent-child relationships are correct (if applicable)
    • [ ] Duration is reasonable
    • [ ] Errors are recorded if operation failed
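The parent-child item in the checklist can also be checked mechanically against exported span data. The sketch below assumes spans are available as plain objects with `name`, `spanId`, and an optional `parentSpanId`; that shape, and the `findHierarchyProblems` helper, are hypothetical rather than a Dash0 export format. The expected hierarchy encodes the target state described in this registry (workflow.execute, then job.submit, then job.process).

```typescript
// Hypothetical exported-span shape for illustration.
interface ExportedSpan {
  name: string;
  spanId: string;
  parentSpanId?: string;
}

// Target hierarchy from this registry; null means "root span".
const EXPECTED_PARENT: Record<string, string | null> = {
  'workflow.execute': null,
  'job.submit': 'workflow.execute',
  'job.process': 'job.submit',
};

function findHierarchyProblems(spans: ExportedSpan[]): string[] {
  const byId = new Map<string, ExportedSpan>();
  for (const span of spans) byId.set(span.spanId, span);

  const problems: string[] = [];
  for (const span of spans) {
    // Spans not listed in EXPECTED_PARENT are ignored.
    if (!Object.prototype.hasOwnProperty.call(EXPECTED_PARENT, span.name)) continue;
    const expected = EXPECTED_PARENT[span.name];
    const actual = span.parentSpanId
      ? byId.get(span.parentSpanId)?.name ?? '(unknown parent)'
      : null;
    if (actual !== expected) {
      problems.push(
        `${span.name}: expected parent ${expected ?? 'none'}, found ${actual ?? 'none'}`,
      );
    }
  }
  return problems;
}

console.log(
  findHierarchyProblems([
    { name: 'workflow.execute', spanId: 'a' },
    { name: 'job.submit', spanId: 'b', parentSpanId: 'a' },
    { name: 'job.process', spanId: 'c' },
  ]),
);
```

Note that a job submitted directly to the Job API, rather than through EmProps API, legitimately has no workflow.execute parent and would be reported by this check.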

Released under the MIT License.