Error Handling

Comprehensive guide to error handling in the EmProps Job Queue system.

Overview

The error handling system is designed with a key principle: TypeScript is the source of truth for error classification. This allows us to:

  • Minimize changes to upstream dependencies (ComfyUI fork)
  • Centralize error logic in one language/ecosystem
  • Maintain error patterns without rebuilding Docker containers
  • Provide consistent error handling across all services (ComfyUI, OpenAI, Gemini, etc.)

Core Architecture

Key Principles

1. TypeScript-Only Classification

All error pattern matching happens in TypeScript, not in Python. This is enforced by the Error Handling Modernization ADR.

Why?

  • ComfyUI is an upstream fork - the more we modify it, the harder it is to merge updates
  • ComfyUI builds are slow (64 custom nodes)
  • TypeScript is our codebase - one place to maintain error logic

2. Critical vs Non-Critical

Errors are filtered at two levels:

  1. Log Level: Only critical and error level logs can fail jobs
  2. Content: Infrastructure/telemetry errors don't fail jobs even if logged as errors
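
The two filter levels can be pictured as a single predicate. The sketch below is illustrative only: `LogEntry`, `shouldFailJob`, `looksLikeInfrastructureError`, and the example patterns are hypothetical names chosen for this guide, not the connector's actual code.

```typescript
// Hypothetical sketch of the two-level filter; all names here are assumptions.
type LogLevel = 'debug' | 'info' | 'warning' | 'error' | 'critical';

interface LogEntry {
  level: LogLevel;
  message: string;
}

// Example patterns only; the maintained list lives in isInfrastructureError().
const INFRASTRUCTURE_PATTERNS: readonly string[] = [
  'statuscode.unavailable',
  'while exporting to',
];

function looksLikeInfrastructureError(message: string): boolean {
  const lower = message.toLowerCase();
  return INFRASTRUCTURE_PATTERNS.some((pattern) => lower.includes(pattern));
}

function shouldFailJob(entry: LogEntry): boolean {
  // Level 1: only error/critical logs are candidates for failing a job.
  if (entry.level !== 'error' && entry.level !== 'critical') {
    return false;
  }
  // Level 2: infrastructure/telemetry noise never fails a job.
  return !looksLikeInfrastructureError(entry.message);
}
```

Keeping the level check first means low-severity logs are discarded before any string matching runs.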

3. ONE PLACE for Each Type

When you need to change how an error is handled, there is exactly ONE PLACE to go, depending on the error type:

| Error Type | Location |
|---|---|
| ComfyUI workflow errors | packages/core/src/errors/comfyui-error-enhancer.ts |
| Infrastructure/telemetry errors | apps/worker/src/connectors/comfyui-rest-stream-connector.ts (isInfrastructureError()) |
| Generic service errors | packages/core/src/types/failure-classification.ts (FailureClassifier) |

Common Tasks

Adding a ComfyUI Error Pattern

When you see an unclassified ComfyUI error in production:

bash
# 1. Find the error pattern in the logs, for example:
#    KeyError: 'url' from emprops_image_loader.py:66

# 2. Open THE ONE PLACE
code packages/core/src/errors/comfyui-error-enhancer.ts

# 3. Add pattern in PATTERN MATCHERS section
# 4. Build core package
pnpm --filter=@emp/core build

→ Full Guide: ComfyUI Errors
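
As an illustration of step 3, a matcher for the `KeyError: 'url'` example might look like the following. This is a sketch under assumptions: `RawComfyUIError`, `PatternMatch`, and `matchMissingKey` are hypothetical stand-ins, the message and suggestion wording is invented, and the real enhancer returns a `ConnectorError` rather than a plain object.

```typescript
// Hypothetical stand-in for one pattern matcher; not the enhancer's actual code.
interface RawComfyUIError {
  exception_message: string;
  exception_type: string;
  node_id?: string;
  node_type?: string;
}

interface PatternMatch {
  failureType: string;   // e.g. 'VALIDATION_ERROR'
  failureReason: string; // e.g. 'MISSING_REQUIRED_FIELD'
  message: string;
  suggestion: string;
}

function matchMissingKey(error: RawComfyUIError): PatternMatch | null {
  if (error.exception_type !== 'KeyError') {
    return null; // let other matchers try
  }
  // Python renders str(KeyError('url')) as "'url'", so the key sits in single quotes.
  const keyMatch = /'([^']+)'/.exec(error.exception_message);
  const key = keyMatch ? keyMatch[1] : 'unknown';
  const node = error.node_type ?? 'unknown node';
  return {
    failureType: 'VALIDATION_ERROR',
    failureReason: 'MISSING_REQUIRED_FIELD',
    message: `Node ${node} is missing required input "${key}"`,
    suggestion: `Provide a value for "${key}" in the workflow inputs`,
  };
}
```

Returning `null` for non-matching errors lets several matchers be tried in sequence before falling back to a generic classification.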

Handling Non-Critical Errors

When an infrastructure error incorrectly fails a job:

bash
# Example: "StatusCode.UNAVAILABLE while exporting to dash0.com"

# 1. Open THE ONE PLACE
code apps/worker/src/connectors/comfyui-rest-stream-connector.ts

# 2. Add pattern to isInfrastructureError() method
# 3. Build worker package
pnpm --filter=@emp/worker build

→ Full Guide: Non-Critical Errors

Error Flow

1. Exception Occurs in ComfyUI

python
# packages/comfyui/execution.py
try:
    result = execute_node(node)
except Exception as ex:
    # Send RAW exception data only (no classification)
    error_details = {
        "node_id": real_node_id,
        "node_type": class_type,
        "exception_message": str(ex),
        "exception_type": exception_type,
        "traceback": traceback.format_tb(tb),
        "current_inputs": input_data_formatted,
    }
    # Published to Redis Stream
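
On the TypeScript side, the raw payload can be described with a type that mirrors the Python dictionary above. The sketch below assumes the payload arrives as a JSON string; `RawExecutionError` and `parseRawExecutionError` are hypothetical names, and the actual wire format on the Redis Stream is not specified in this guide.

```typescript
// Mirrors the keys of error_details above. The JSON encoding is an assumption.
interface RawExecutionError {
  node_id: string;
  node_type: string;
  exception_message: string;
  exception_type: string;
  traceback: string[];     // traceback.format_tb() returns a list of strings
  current_inputs: unknown; // shape depends on the node, so left untyped
}

function parseRawExecutionError(json: string): RawExecutionError | null {
  let data: unknown;
  try {
    data = JSON.parse(json);
  } catch {
    return null; // payload was not valid JSON
  }
  if (typeof data !== 'object' || data === null) {
    return null;
  }
  const fields = data as Record<string, unknown>;
  const stringKeys = ['node_id', 'node_type', 'exception_message', 'exception_type'];
  if (!stringKeys.every((key) => typeof fields[key] === 'string')) {
    return null;
  }
  const tb = fields['traceback'];
  if (!Array.isArray(tb) || !tb.every((line) => typeof line === 'string')) {
    return null;
  }
  return {
    node_id: fields['node_id'] as string,
    node_type: fields['node_type'] as string,
    exception_message: fields['exception_message'] as string,
    exception_type: fields['exception_type'] as string,
    traceback: tb as string[],
    current_inputs: fields['current_inputs'],
  };
}
```

Validating the payload at the boundary keeps malformed events from reaching the classification step.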

2. Worker Receives Exception

typescript
// apps/worker/src/connectors/comfyui-rest-stream-connector.ts
const logResult = await this.readLogStream(eventsStreamKey, lastEventId, blockTimeout, jobData);

// Filter 1: Is the log level error or critical?
// Filter 2: Is it an infrastructure error (telemetry/metrics)?
// Only entries that pass Filter 1 and are NOT infrastructure errors proceed to classification

3. TypeScript Classification

typescript
// packages/core/src/errors/comfyui-error-enhancer.ts
const connectorError = ComfyUIErrorEnhancer.enhance(error, {
  workflow: jobData.id,
  component: componentName,
  jobId: jobData.id,
});

// Returns: ConnectorError with:
// - FailureType (high-level: VALIDATION_ERROR, RESOURCE_LIMIT, etc.)
// - FailureReason (specific: MISSING_REQUIRED_FIELD, GPU_MEMORY_FULL, etc.)
// - User-friendly message and suggestion
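
To make the returned shape concrete, here is a minimal stand-in for `ConnectorError`, with the argument order taken from the Quick Reference at the end of this guide. It is a sketch, not the definition in packages/core: the real `FailureType` and `FailureReason` are enums with more members (modelled here as const objects), and `ErrorContext` and `summarize` are hypothetical.

```typescript
// Stand-ins for illustration only; only values named in this guide appear here.
const FailureType = {
  VALIDATION_ERROR: 'VALIDATION_ERROR',
  RESOURCE_LIMIT: 'RESOURCE_LIMIT',
} as const;
type FailureType = (typeof FailureType)[keyof typeof FailureType];

const FailureReason = {
  MISSING_REQUIRED_FIELD: 'MISSING_REQUIRED_FIELD',
  GPU_MEMORY_FULL: 'GPU_MEMORY_FULL',
} as const;
type FailureReason = (typeof FailureReason)[keyof typeof FailureReason];

interface ErrorContext {
  suggestion?: string;
}

class ConnectorError extends Error {
  readonly failureType: FailureType;
  readonly failureReason: FailureReason;
  readonly retryable: boolean;
  readonly context: ErrorContext;

  constructor(
    failureType: FailureType,
    failureReason: FailureReason,
    message: string,
    retryable: boolean,
    context: ErrorContext = {},
  ) {
    super(message);
    this.name = 'ConnectorError';
    this.failureType = failureType;
    this.failureReason = failureReason;
    this.retryable = retryable;
    this.context = context;
  }
}

// Builds a one-line summary, e.g. for a log line or a UI caption.
function summarize(err: ConnectorError): string {
  const hint = err.context.suggestion ? ` (${err.context.suggestion})` : '';
  return `[${err.failureType}/${err.failureReason}] ${err.message}${hint}`;
}
```

The high-level type supports coarse decisions such as grouping or retry policy, while the specific reason and suggestion carry what the user needs to see.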

4. UI Display

typescript
// apps/monitor/src/components/JobDetailsModal.tsx
// Displays user-friendly error message with suggestion

Error Types

Critical Errors (Fail Job)

These errors prevent the AI workflow from completing:

  • Validation Errors: Missing required fields, invalid parameters, type mismatches
  • Resource Limits: GPU OOM, CPU OOM, disk space exhausted
  • Model Errors: Model not found, checkpoint missing
  • Custom Node Errors: Node not installed, node execution failure
  • Workflow Errors: Invalid workflow structure, circular dependencies

Non-Critical Errors (Log Only)

These errors are logged but don't fail the job:

  • Telemetry Exports: Dash0, Jaeger, OTLP export failures
  • Metrics Collection: Prometheus, StatsD connection issues
  • Health Checks: Non-essential health check failures
  • Log Shipping: Log exporter connection issues

Testing

Test ComfyUIErrorEnhancer

bash
# Create test file
cat > /tmp/test-error.mjs << 'EOF'
import { ComfyUIErrorEnhancer } from '/path/to/packages/core/dist/errors/comfyui-error-enhancer.js';

const error = {
  exception_message: "KeyError: 'url'",
  exception_type: 'KeyError',
  node_id: '66',
  node_type: 'emprops_image_loader'
};

const result = ComfyUIErrorEnhancer.enhance(error);
console.log('Type:', result.failureType);
console.log('Reason:', result.failureReason);
console.log('Message:', result.message);
console.log('Suggestion:', result.context.suggestion);
EOF

node /tmp/test-error.mjs

Test Infrastructure Error Filtering

typescript
const connector = new ComfyUIRestStreamConnector(/* config */);

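// Note: isInfrastructureError() is shown as private in the Quick Reference, so
// a caller outside the class needs bracket access (or a public wrapper).

// Should return true (non-critical)
connector['isInfrastructureError']("StatusCode.UNAVAILABLE while exporting to dash0.com");

// Should return false (critical)
connector['isInfrastructureError']("CUDA out of memory");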

Debugging

Check Error Classification

bash
# Watch monitor UI for error display
open http://localhost:3001

# Check worker logs
# (2>&1 is needed because docker logs writes the container's stderr to stderr)
docker logs -f <worker-container> 2>&1 | grep ERROR

# Check ComfyUI logs
docker exec <machine-container> pm2 logs comfyui-gpu0

Verify Error Patterns

bash
# Search for error in pattern matchers
grep -r "your error pattern" packages/core/src/errors/
grep -r "your error pattern" apps/worker/src/connectors/

Best Practices

DO

  ✅ Add error patterns to TypeScript files only
  ✅ Use descriptive error messages and suggestions
  ✅ Test error patterns before deploying
  ✅ Document new patterns with code comments
  ✅ Use FailureType and FailureReason enums consistently

DON'T

  ❌ Don't add error classification to Python (ComfyUI)
  ❌ Don't create new error enums in Python
  ❌ Don't fail jobs on infrastructure errors
  ❌ Don't hide errors with fallbacks (fail with clear messages)
  ❌ Don't skip pattern testing

Quick Reference

typescript
// Add ComfyUI error pattern
// Location: packages/core/src/errors/comfyui-error-enhancer.ts
if (message.includes('your pattern')) {
  return new ConnectorError(
    FailureType.VALIDATION_ERROR,
    FailureReason.MISSING_REQUIRED_FIELD,
    'User-friendly message',
    false, // retryable
    { suggestion: 'How to fix it' }
  );
}

// Add infrastructure error pattern
// Location: apps/worker/src/connectors/comfyui-rest-stream-connector.ts
private isInfrastructureError(message: string): boolean {
  const messageLower = message.toLowerCase();
  if (messageLower.includes('your pattern')) {
    return true; // Don't fail job
  }
  return false; // Critical - fail job
}

Released under the MIT License.