Error Handling
Comprehensive guide to error handling in the EmProps Job Queue system.
Overview
The error handling system is designed with a key principle: TypeScript is the source of truth for error classification. This allows us to:
- Minimize changes to upstream dependencies (ComfyUI fork)
- Centralize error logic in one language/ecosystem
- Maintain error patterns without rebuilding Docker containers
- Provide consistent error handling across all services (ComfyUI, OpenAI, Gemini, etc.)
Core Architecture
Key Principles
1. TypeScript-Only Classification
All error pattern matching happens in TypeScript, not in Python. This is enforced by the Error Handling Modernization ADR.
Why?
- ComfyUI is an upstream fork - the more we modify it, the harder it is to merge updates
- ComfyUI builds are slow (64 custom nodes)
- TypeScript is our codebase - one place to maintain error logic
2. Critical vs Non-Critical
Errors are filtered at two levels:
- Log Level: Only critical and error level logs can fail jobs
- Content: Infrastructure/telemetry errors don't fail jobs even if logged as errors
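The two filter levels above can be sketched as a single gate. This is a minimal illustration, not the worker's actual code: the names LogEntry, FAILING_LEVELS, and shouldFailJob are hypothetical, and the content check is passed in as a predicate so the sketch makes no assumption about which patterns the real isInfrastructureError() uses.

```typescript
// Hypothetical names throughout: LogEntry, FAILING_LEVELS and shouldFailJob
// are illustrative stand-ins, not the worker's real API.
type LogLevel = "debug" | "info" | "warning" | "error" | "critical";

interface LogEntry {
  level: LogLevel;
  message: string;
}

// Level filter: only these levels are allowed to fail a job.
const FAILING_LEVELS: ReadonlySet<LogLevel> = new Set<LogLevel>(["error", "critical"]);

function shouldFailJob(
  entry: LogEntry,
  isInfrastructureError: (message: string) => boolean,
): boolean {
  // Filter 1 (log level): anything below error/critical never fails a job.
  if (!FAILING_LEVELS.has(entry.level)) return false;
  // Filter 2 (content): infrastructure/telemetry noise never fails a job,
  // even when it was logged at error level.
  if (isInfrastructureError(entry.message)) return false;
  return true;
}
```

Only entries that clear both checks would go on to classification; everything else is logged and ignored.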
3. ONE PLACE for Each Type
When you encounter an error issue, there's always ONE PLACE to go:
| Error Type | Location |
|---|---|
| ComfyUI workflow errors | packages/core/src/errors/comfyui-error-enhancer.ts |
| Infrastructure/telemetry errors | apps/worker/src/connectors/comfyui-rest-stream-connector.ts → isInfrastructureError() |
| Generic service errors | packages/core/src/types/failure-classification.ts → FailureClassifier |
Common Tasks
Adding a ComfyUI Error Pattern
When you see an unclassified ComfyUI error in production:
# 1. Find the error pattern in logs
#    KeyError: 'url' from emprops_image_loader.py:66
# 2. Open THE ONE PLACE
code packages/core/src/errors/comfyui-error-enhancer.ts
# 3. Add pattern in PATTERN MATCHERS section
# 4. Build core package
pnpm --filter=@emp/core build
Handling Non-Critical Errors
When an infrastructure error incorrectly fails a job:
# Example: "StatusCode.UNAVAILABLE while exporting to dash0.com"
# 1. Open THE ONE PLACE
code apps/worker/src/connectors/comfyui-rest-stream-connector.ts
# 2. Add pattern to isInfrastructureError() method
# 3. Build worker package
pnpm --filter=@emp/worker build
→ Full Guide: Non-Critical Errors
Error Flow
1. Exception Occurs in ComfyUI
# packages/comfyui/execution.py
try:
    result = execute_node(node)
except Exception as ex:
    # Send RAW exception data only (no classification)
    error_details = {
        "node_id": real_node_id,
        "node_type": class_type,
        "exception_message": str(ex),
        "exception_type": exception_type,
        "traceback": traceback.format_tb(tb),
        "current_inputs": input_data_formatted,
    }
    # Published to Redis Stream
2. Worker Receives Exception
// apps/worker/src/connectors/comfyui-rest-stream-connector.ts
const logResult = await this.readLogStream(eventsStreamKey, lastEventId, blockTimeout, jobData);
// Filter 1: Is it critical level? (error/critical)
// Filter 2: Is it an infrastructure error? (telemetry/metrics)
// If it is error/critical level and NOT an infrastructure error → proceed to classification
3. TypeScript Classification
// packages/core/src/errors/comfyui-error-enhancer.ts
const connectorError = ComfyUIErrorEnhancer.enhance(error, {
  workflow: jobData.id,
  component: componentName,
  jobId: jobData.id,
});
// Returns: ConnectorError with:
// - FailureType (high-level: VALIDATION_ERROR, RESOURCE_LIMIT, etc.)
// - FailureReason (specific: MISSING_REQUIRED_FIELD, GPU_MEMORY_FULL, etc.)
// - User-friendly message and suggestion
4. UI Display
// apps/monitor/src/components/JobDetailsModal.tsx
// Displays user-friendly error message with suggestion
Error Types
Critical Errors (Fail Job)
These errors prevent the AI workflow from completing:
- Validation Errors: Missing required fields, invalid parameters, type mismatches
- Resource Limits: GPU OOM, CPU OOM, disk space exhausted
- Model Errors: Model not found, checkpoint missing
- Custom Node Errors: Node not installed, node execution failure
- Workflow Errors: Invalid workflow structure, circular dependencies
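One way to picture how raw exceptions end up in these categories is an ordered table of matchers. The sketch below is an assumption-heavy illustration rather than the contents of comfyui-error-enhancer.ts: the enums are local stand-ins (only VALIDATION_ERROR, RESOURCE_LIMIT, MISSING_REQUIRED_FIELD, and GPU_MEMORY_FULL are names this guide mentions), and the UNKNOWN/UNCLASSIFIED fallbacks, the Matcher shape, and both patterns are hypothetical.

```typescript
// Local stand-in enums. UNKNOWN and UNCLASSIFIED are hypothetical fallbacks.
enum FailureType {
  VALIDATION_ERROR = "VALIDATION_ERROR",
  RESOURCE_LIMIT = "RESOURCE_LIMIT",
  UNKNOWN = "UNKNOWN",
}

enum FailureReason {
  MISSING_REQUIRED_FIELD = "MISSING_REQUIRED_FIELD",
  GPU_MEMORY_FULL = "GPU_MEMORY_FULL",
  UNCLASSIFIED = "UNCLASSIFIED",
}

// Subset of the raw payload fields that the Python side sends.
interface RawComfyError {
  exception_message: string;
  exception_type: string;
}

interface Classification {
  failureType: FailureType;
  failureReason: FailureReason;
}

interface Matcher {
  test: (error: RawComfyError) => boolean;
  result: Classification;
}

// Assumed example patterns; the first matcher that hits wins.
const MATCHERS: readonly Matcher[] = [
  {
    test: (e) => e.exception_type === "KeyError",
    result: {
      failureType: FailureType.VALIDATION_ERROR,
      failureReason: FailureReason.MISSING_REQUIRED_FIELD,
    },
  },
  {
    test: (e) => /out of memory/i.test(e.exception_message),
    result: {
      failureType: FailureType.RESOURCE_LIMIT,
      failureReason: FailureReason.GPU_MEMORY_FULL,
    },
  },
];

function classify(error: RawComfyError): Classification {
  const hit = MATCHERS.find((m) => m.test(error));
  return hit
    ? hit.result
    : { failureType: FailureType.UNKNOWN, failureReason: FailureReason.UNCLASSIFIED };
}
```

Keeping the matchers in a single ordered list means a new production error can be handled by adding one entry, which fits the "ONE PLACE" principle described earlier.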
Non-Critical Errors (Log Only)
These errors are logged but don't fail the job:
- Telemetry Exports: Dash0, Jaeger, OTLP export failures
- Metrics Collection: Prometheus, StatsD connection issues
- Health Checks: Non-essential health check failures
- Log Shipping: Log exporter connection issues
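The categories above could be expressed as a small pattern table, which also makes it easy to log which kind of infrastructure noise was suppressed. This is only a sketch: the category keys mirror the list, but every substring is an assumed example, and matchInfrastructureCategory is a hypothetical helper rather than the connector's real isInfrastructureError() implementation.

```typescript
// Hypothetical pattern table. The substrings are assumed examples only.
const INFRA_PATTERNS: Readonly<Record<string, readonly string[]>> = {
  telemetry: ["dash0", "jaeger", "otlp"],
  metrics: ["prometheus", "statsd"],
  healthCheck: ["health check"],
  logShipping: ["log exporter"],
};

// Returns the matching category name, or null when nothing matches
// (in which case the message would be treated as critical).
function matchInfrastructureCategory(message: string): string | null {
  const lower = message.toLowerCase();
  for (const [category, patterns] of Object.entries(INFRA_PATTERNS)) {
    if (patterns.some((pattern) => lower.includes(pattern))) {
      return category;
    }
  }
  return null;
}
```

A boolean check in the style of isInfrastructureError() would then reduce to `matchInfrastructureCategory(message) !== null`.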
Testing
Test ComfyUIErrorEnhancer
# Create test file
cat > /tmp/test-error.mjs << 'EOF'
import { ComfyUIErrorEnhancer } from '/path/to/packages/core/dist/errors/comfyui-error-enhancer.js';
const error = {
  exception_message: "KeyError: 'url'",
  exception_type: 'KeyError',
  node_id: '66',
  node_type: 'emprops_image_loader'
};
const result = ComfyUIErrorEnhancer.enhance(error);
console.log('Type:', result.failureType);
console.log('Reason:', result.failureReason);
console.log('Message:', result.message);
console.log('Suggestion:', result.context.suggestion);
EOF
node /tmp/test-error.mjs
Test Infrastructure Error Filtering
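const connector = new ComfyUIRestStreamConnector(/* config */);
// Note: the Quick Reference declares isInfrastructureError() as private. If it is private
// in your build, TypeScript still permits bracket access from a test:
// connector['isInfrastructureError']("...")
// Should return true (non-critical)
connector.isInfrastructureError("StatusCode.UNAVAILABLE while exporting to dash0.com");
// Should return false (critical)
connector.isInfrastructureError("CUDA out of memory");
Debugging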
Check Error Classification
# Watch monitor UI for error display
open http://localhost:3001
# Check worker logs
docker logs <worker-container> -f | grep ERROR
# Check ComfyUI logs
docker exec <machine-container> pm2 logs comfyui-gpu0
Verify Error Patterns
# Search for error in pattern matchers
grep -r "your error pattern" packages/core/src/errors/
grep -r "your error pattern" apps/worker/src/connectors/
Best Practices
DO
✅ Add error patterns to TypeScript files only
✅ Use descriptive error messages and suggestions
✅ Test error patterns before deploying
✅ Document new patterns with code comments
✅ Use FailureType and FailureReason enums consistently
DON'T
❌ Don't add error classification to Python (ComfyUI)
❌ Don't create new error enums in Python
❌ Don't fail jobs on infrastructure errors
❌ Don't hide errors with fallbacks (fail with clear messages)
❌ Don't skip pattern testing
Related Documentation
- ComfyUI Error Patterns Guide
- Non-Critical Error Handling
- Error Handling Architecture
- ADR: Error Handling Modernization
- HOW_TO_ADD_ERROR_PATTERNS.md
- HOW_TO_HANDLE_NON_CRITICAL_ERRORS.md
Quick Reference
// Add ComfyUI error pattern
// Location: packages/core/src/errors/comfyui-error-enhancer.ts
if (message.includes('your pattern')) {
  return new ConnectorError(
    FailureType.VALIDATION_ERROR,
    FailureReason.MISSING_REQUIRED_FIELD,
    'User-friendly message',
    false, // retryable
    { suggestion: 'How to fix it' }
  );
}
// Add infrastructure error pattern
// Location: apps/worker/src/connectors/comfyui-rest-stream-connector.ts
private isInfrastructureError(message: string): boolean {
  const messageLower = message.toLowerCase();
  if (messageLower.includes('your pattern')) {
    return true; // Don't fail job
  }
  return false; // Critical - fail job
}