ADR-002: Pre-Release Testing Strategy for Production Deployments

  • Date: 2025-10-08
  • Status: 🤔 Proposed
  • Decision Makers: Engineering Team
  • Approval Required: Before implementing CI/CD test gates
  • Related ADRs: ADR-001: Encrypted Environment Variables, Docker Swarm Migration Analysis


Executive Summary

This ADR proposes a comprehensive pre-release testing strategy to prevent production breakage from untested code. The system will gate all production deployments (Railway.app services) behind an automated test suite running in GitHub Actions.

Current Risk:

  • 81 commits ahead of master on staging branch
  • No automated testing before releases
  • Production deployments triggered directly on git tags
  • ⚠️ Production breakage risk from untested changes

Proposed Solution:

  • 🎯 Test pyramid strategy with unit, integration, and E2E tests
  • 🛡️ CI/CD test gate blocking releases on test failures
  • < 35 minute total CI execution time
  • 📊 Comprehensive coverage across critical production paths

Impact:

  • Before: Manual testing, uncertain production readiness, reactive bug fixes
  • After: Automated validation, objective deployment criteria, proactive quality assurance

Table of Contents

  1. Context
  2. Problem Statement
  3. Decision
  4. Test Architecture
  5. Implementation Strategy
  6. Test Coverage Requirements
  7. CI/CD Integration
  8. Phased Rollout Plan
  9. Success Metrics
  10. Consequences
  11. Alternatives Considered
  12. Open Questions

Context

Current State Analysis

Existing Test Infrastructure:

✅ Vitest configured across all packages
✅ 55+ test files across monorepo
✅ Test scripts in package.json (pnpm test, pnpm test:api, etc.)
✅ Turbo.json configured with test task dependencies
✅ Release workflow exists (.github/workflows/release.yml)

Test Coverage by Package:

| Package | Test Files | Coverage Areas |
| --- | --- | --- |
| packages/core/ | Redis functions integration tests | Job matching, capability matching, atomic claiming |
| apps/api/ | Integration + E2E tests | Job submission, connector integration, telemetry pipeline |
| apps/worker/ | Failure classification, attestation | Error handling, retry logic, failure classification |
| apps/webhook-service/ | Telemetry pipeline E2E | Event delivery, retry logic, OTLP integration |
| apps/monitor/ | Component tests | Redis connections, machine status, attestation system |
| packages/telemetry/ | OTLP integration | Telemetry client, event formatting, Dash0 integration |
| apps/emprops-api/ | Unit tests | Linter, pseudorandom, art-gen nodes |

Current Release Workflow:

yaml
# .github/workflows/release.yml
on:
  push:
    tags: ["v*"]  # Triggers on git tags
  workflow_dispatch:

jobs:
  build-and-release:
    steps:
      - Checkout code
      - Build worker bundle
      - Create GitHub release
      - Deploy to Railway (production or staging)

Critical Production Services (from release.yml):

  • Production: q-emprops-api, q-job-api, q-webhook, q-telcollect, openai-machine, openai-response, gemini
  • Staging: stg.* versions of the same services

Current Deployment Model:

Git Tag (v1.2.3) → Build Worker Bundle → Create GitHub Release → Deploy to Railway.app → PRODUCTION (no test gate!)

Infrastructure Reality

Distributed Architecture:

  • API Server: Lightweight Redis orchestration (Railway.app)
  • Webhook Service: Event delivery, telemetry pipeline (Railway.app)
  • Telemetry Collector: OTLP bridge, Dash0 integration (Railway.app)
  • EmProps API: Art generation, job evaluation (Railway.app)
  • Worker Machines: Ephemeral GPU compute (SALAD, vast.ai, RunPod)

Production Constraints:

  • Real-time systems: WebSocket-based monitoring, job progress updates
  • Paying customers: Production breakage affects revenue and reputation
  • Distributed state: Redis-based coordination, event-driven architecture
  • External dependencies: OpenAI, Anthropic, ComfyUI, Dash0

Business Impact

Current Risk:

  • 81 commits ahead of master without comprehensive testing
  • Production deployments lack objective quality criteria
  • Debugging production issues wastes engineering time
  • Customer-facing outages damage trust and revenue

Cost of No Testing:

  • Time: Hours spent debugging production issues
  • 💰 Revenue: Customer churn from unreliable service
  • 😰 Stress: On-call firefighting instead of feature development
  • 📉 Velocity: Fear of shipping slows innovation

Problem Statement

Requirements

R1: Pre-Release Validation

  • All production releases must pass automated tests before deployment
  • Tests must validate critical user flows and system integrity
  • Test failures must block deployment automatically

R2: Comprehensive Coverage

  • Unit tests for business logic (< 5 minutes)
  • Integration tests for component interactions (< 10 minutes)
  • E2E tests for critical user flows (< 15 minutes)
  • Build verification for type safety and linting (< 5 minutes)

R3: Fast Feedback

  • Total CI execution time < 35 minutes (acceptable for release gate)
  • Parallel test execution where possible
  • Clear failure reporting with actionable errors

R4: Maintainability

  • Tests run reliably in CI environment
  • No flaky tests blocking releases
  • Clear documentation for adding new tests

R5: Developer Experience

  • Tests run locally before commit (pre-commit hooks)
  • Visual test UI available (vitest --ui)
  • Incremental adoption without blocking current workflow

Non-Requirements

  • 100% code coverage: Focus on critical paths, not every line
  • Mutation testing: Overkill for current maturity level
  • Performance benchmarks: Separate concern, not blocking releases
  • Cross-browser testing: Backend services, not applicable

Decision

Adopt Test Pyramid Strategy with CI/CD Gates

We will implement a comprehensive pre-release testing strategy based on the test pyramid model, with automated gates in GitHub Actions blocking production deployments on test failures.

Core Principles:

  1. Test Pyramid: Many fast unit tests, fewer integration tests, minimal E2E tests
  2. CI/CD Gates: All tests must pass before deployment proceeds
  3. Incremental Adoption: Phase in coverage over 4 weeks
  4. Fast Feedback: Total CI time < 35 minutes
  5. Actionable Failures: Clear error messages guide debugging

Test Levels:

        /\
       /E2E\         ← Slow (15 min), few tests, high confidence
      /------\          Critical user flows, telemetry pipeline
     /Integration\   ← Medium (10 min), moderate coverage
    /------------\      Redis functions, API endpoints, connectors
   /  Unit Tests  \  ← Fast (5 min), many tests, quick feedback
  /----------------\    Business logic, failure classification, utilities

Implementation Approach:

  • Option A (RECOMMENDED): Pre-release job in release.yml blocking deployment
  • Phase 1 (Week 1): Foundation - Unit tests + build verification
  • Phase 2 (Week 2): Integration tests for Redis and API
  • Phase 3 (Week 3): E2E tests for critical user flows
  • Phase 4 (Week 4): Optimization and documentation

Test Architecture

Test Pyramid Breakdown

Level 1: Unit Tests (< 5 minutes total)

Purpose: Fast feedback on business logic correctness

Scope:

  • Failure classification logic (apps/worker/src/__tests__/failure-classification.test.ts)
  • Attestation generation (apps/worker/src/__tests__/failure-attestation.test.ts)
  • Retry count extraction (apps/worker/src/__tests__/retry-count-extraction.test.ts)
  • EmProps API nodes (apps/emprops-api/src/modules/art-gen/nodes/*.test.ts)
  • Utility functions (linter, pseudorandom, etc.)

Characteristics:

  • No external dependencies (mocked Redis, HTTP clients)
  • Deterministic inputs and outputs
  • Isolated test cases (no shared state)
  • Execute in parallel across packages

Example Test:

typescript
// apps/worker/src/__tests__/failure-classification.test.ts
import { describe, it, expect } from 'vitest';
import { FailureClassifier, FailureType, FailureReason } from '../types/failure-classification.js';

describe('Failure Classification System', () => {
  it('should classify HTTP authentication errors correctly', () => {
    const error = 'Request failed with status code 401 - Invalid API key';
    const context = { httpStatus: 401, serviceType: 'openai_responses' };

    const result = FailureClassifier.classify(error, context);

    expect(result.failure_type).toBe(FailureType.AUTH_ERROR);
    expect(result.failure_reason).toBe(FailureReason.INVALID_API_KEY);
  });
});

Turbo Command:

bash
turbo run test --filter='!apps/api' --filter='!apps/webhook-service'

Level 2: Integration Tests (< 10 minutes total)

Purpose: Validate component interactions with real dependencies

Scope:

  • Redis function integration (packages/core/src/redis-functions/__tests__/integration.test.ts)
  • API job submission (apps/api/src/__tests__/connector-integration.e2e.test.ts)
  • Worker job processing (apps/worker/ integration tests)
  • Webhook delivery flow (apps/webhook-service/__tests__/webhook-server.test.ts)

Characteristics:

  • Real Redis instance (Docker container or in-memory)
  • Real HTTP servers (test instances)
  • Controlled external dependencies (mock external APIs)
  • Shared test infrastructure (setup/teardown)

Example Test:

typescript
// packages/core/src/redis-functions/__tests__/integration.test.ts
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import Redis from 'ioredis';
import { RedisFunctionInstaller } from '../installer.js';

describe('Redis Function Integration Tests', () => {
  let redis: Redis;

  beforeAll(async () => {
    redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');
    const installer = new RedisFunctionInstaller(redis);
    await installer.installOrUpdate();
  });

  it('should match worker with compatible service', async () => {
    // Setup job and worker in Redis
    const jobId = 'test-job-1';
    await redis.hmset(`job:${jobId}`, {
      id: jobId,
      service_required: 'comfyui',
      priority: '100',
      status: 'pending'
    });
    await redis.zadd('jobs:pending', 100, jobId);

    const worker = {
      worker_id: 'worker-1',
      job_service_required_map: ['comfyui', 'a1111']
    };

    // Test Redis function
    const result = await redis.fcall('findMatchingJob', 0, JSON.stringify(worker), '10');

    expect(result).not.toBeNull();
    expect(JSON.parse(result).jobId).toBe(jobId);
  });
});

Turbo Command:

bash
turbo run test:integration

Level 3: E2E Tests (< 15 minutes total)

Purpose: Validate critical user flows end-to-end

Scope:

  • Job submission → Worker execution → Webhook delivery
  • Telemetry pipeline: API → Telemetry Collector → Dash0
  • Machine registration → Job claiming → Progress updates
  • Workflow execution with multiple steps

Characteristics:

  • Full system integration (API + Worker + Webhook + Telemetry)
  • Real external services (mocked APIs via nock)
  • Real-time event verification (WebSocket, Redis streams)
  • Longer execution times (10-15 seconds per test)

Example Test:

typescript
// apps/webhook-service/src/__tests__/telemetry-pipeline.e2e.test.ts
import { describe, it, expect } from 'vitest';
import fetch from 'node-fetch';
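// API_URL, DASH0_AUTH_TOKEN, DASH0_DATASET, startTime/endTime, and the
// findEventByWorkflowId helper are assumed to come from the shared E2E test setup
// (not shown in this excerpt).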

describe('Telemetry Pipeline E2E', () => {
  it('should deliver job.received event to Dash0', async () => {
    const workflowId = `e2e-test-${Date.now()}`;

    // STEP 1: Submit job to API
    const response = await fetch(`${API_URL}/api/jobs`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        workflow_id: workflowId,
        service_required: 'comfyui',
        payload: { prompt: 'test' }
      })
    });

    expect(response.ok).toBe(true);

    // STEP 2: Wait for telemetry processing
    await new Promise(resolve => setTimeout(resolve, 5000));

    // STEP 3: Verify event in Dash0
    const dash0Response = await fetch('https://api.us-west-2.aws.dash0.com/api/spans', {
      method: 'POST',
      headers: {
        'authorization': `Bearer ${DASH0_AUTH_TOKEN}`,
        'content-type': 'application/json'
      },
      body: JSON.stringify({
        timeRange: { from: startTime, to: endTime },
        dataset: DASH0_DATASET
      })
    });

    const events = await dash0Response.json();
    const jobReceivedEvent = findEventByWorkflowId(events, workflowId);

    expect(jobReceivedEvent).toBeDefined();
    expect(jobReceivedEvent.name).toBe('job.received');
  });
});

Turbo Command:

bash
turbo run test:e2e

Level 4: Build Verification (< 5 minutes total)

Purpose: Ensure code compiles and meets quality standards

Scope:

  • TypeScript compilation: turbo run typecheck
  • Lint checks: turbo run lint
  • Production builds: turbo run build

Characteristics:

  • No runtime execution
  • Fast parallel execution
  • Catches type errors and style violations

Turbo Commands:

bash
turbo run typecheck  # ~2 minutes
turbo run lint       # ~1 minute
turbo run build      # ~2 minutes

Total CI Time Budget

| Test Level | Time Budget | Parallelization |
| --- | --- | --- |
| Unit Tests | < 5 min | ✅ Parallel across packages |
| Integration Tests | < 10 min | ⚠️ Sequential (shared Redis) |
| E2E Tests | < 15 min | ⚠️ Sequential (shared services) |
| Build Verification | < 5 min | ✅ Parallel (typecheck, lint, build) |
| TOTAL | < 35 min | Mixed strategy |

Optimization Strategies:

  • Cache pnpm dependencies between runs
  • Cache Turbo build outputs
  • Run unit tests + build verification in parallel
  • Run integration + E2E sequentially (shared dependencies)

Implementation Strategy

Approach: Add pre-release test job to existing release.yml workflow

Workflow Structure:

yaml
# .github/workflows/release.yml
name: Release Worker Bundle

on:
  push:
    tags: ["v*"]
  workflow_dispatch:

jobs:
  # NEW: Pre-release test gate
  test:
    name: Pre-Release Test Suite
    runs-on: ubuntu-latest

    services:
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Enable pnpm
        run: corepack enable  # pnpm must be available before setup-node can use cache: 'pnpm'

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'pnpm'

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Typecheck
        run: pnpm typecheck

      - name: Lint
        run: pnpm lint

      - name: Unit Tests
        run: pnpm test
        env:
          REDIS_URL: redis://localhost:6379
          NODE_ENV: test

      - name: Build
        run: pnpm build

      - name: Upload test results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: |
            **/coverage/
            **/.turbo/

    env:
      REDIS_URL: redis://localhost:6379
      NODE_ENV: test

  # MODIFIED: Block on test success
  build-and-release:
    needs: [test]  # ← Deployment blocked if tests fail
    runs-on: ubuntu-latest

    steps:
      # ... existing build and release steps
      - name: Build and package worker
        run: |
          # ... existing worker build logic

      - name: Deploy to Railway
        run: |
          # ... existing Railway deployment

Key Changes:

  1. New test job: Runs all pre-release tests
  2. Dependency: build-and-release needs test to succeed
  3. Redis service: Provides test Redis instance
  4. Environment variables: REDIS_URL, NODE_ENV for test execution
  5. Artifact upload: Preserve test results on failure

GitHub Secrets Required

Test Execution:

  • REDIS_TEST_URL - Test Redis instance (can use service container)
  • NODE_ENV=test - Environment marker

External API Mocking (Phase 2+):

  • HF_TOKEN - For model download tests (optional, can mock)
  • OPENAI_API_KEY - For OpenAI connector tests (optional, can mock)
  • CIVITAI_TOKEN - For CivitAI model tests (optional, can mock)

Telemetry Testing (Phase 3):

  • DASH0_AUTH_TOKEN - For E2E telemetry verification
  • DASH0_DATASET - Test dataset for span verification

Note: Store these via GitHub Actions secrets management; never hard-code secret values in the repository.

Test Infrastructure Setup

Redis Test Container

GitHub Actions Service:

yaml
services:
  redis:
    image: redis:7-alpine
    ports:
      - 6379:6379
    options: >-
      --health-cmd "redis-cli ping"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

Local Development:

bash
# Use local Redis (already running via dev:local-redis)
pnpm dev:local-redis

# Run tests
pnpm test

Environment Configuration

CI Environment Variables:

yaml
env:
  REDIS_URL: redis://localhost:6379
  NODE_ENV: test
  CI: true

Test Configuration (vitest.config.ts):

typescript
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    environment: 'node',
    globals: true,
    setupFiles: ['./test/setup.ts'],
    testTimeout: 30000,  // 30 seconds for integration tests
    hookTimeout: 30000,
    pool: 'forks',       // Isolate test processes
    poolOptions: {
      forks: {
        singleFork: false,
        isolate: true
      }
    }
  }
});
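
The setupFiles entry above points at ./test/setup.ts. The following is a minimal sketch of what such a setup file could contain, assuming ioredis and a dedicated Redis database for tests; the database index and cleanup strategy are assumptions, not the current implementation.

typescript
// test/setup.ts (illustrative sketch; DB index and cleanup strategy are assumptions)
import Redis from 'ioredis';
import { beforeAll, afterEach, afterAll } from 'vitest';

// Use a dedicated Redis database so cleanup cannot touch real data.
const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379/15');

beforeAll(async () => {
  // Fail fast if the test Redis container is not reachable.
  await redis.ping();
});

afterEach(async () => {
  // Remove keys written by the previous test to keep cases isolated.
  await redis.flushdb();
});

afterAll(async () => {
  await redis.quit();
});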

Test Coverage Requirements

Priority 0: Block Release (Must Pass)

Redis Job Matching (packages/core)

  • ✅ Atomic job claiming (no race conditions; see the concurrency sketch after this list)
  • ✅ Worker capability matching (service compatibility)
  • ✅ Priority ordering (highest priority first)
  • ✅ Customer isolation (strict/loose modes)
  • ✅ Model requirements matching
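
As a sketch of how the atomic-claiming requirement could be exercised, the test below races two workers for a single pending job, reusing the findMatchingJob Redis function from the integration example earlier. It assumes the function both matches and claims the job atomically and returns null to the losing worker; that return shape is an assumption, not confirmed by this document.

typescript
import { describe, it, expect } from 'vitest';
import Redis from 'ioredis';

describe('Atomic job claiming', () => {
  it('should hand a single pending job to at most one worker', async () => {
    const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');

    // One pending job, two workers racing for it.
    await redis.hmset('job:race-job-1', {
      id: 'race-job-1',
      service_required: 'comfyui',
      priority: '100',
      status: 'pending'
    });
    await redis.zadd('jobs:pending', 100, 'race-job-1');

    const claim = (workerId: string) =>
      redis.fcall(
        'findMatchingJob',
        0,
        JSON.stringify({ worker_id: workerId, job_service_required_map: ['comfyui'] }),
        '10'
      );

    // Fire both claims concurrently; atomic claiming means only one should win.
    const results = await Promise.all([claim('worker-a'), claim('worker-b')]);
    const wins = results.filter(r => r !== null);

    expect(wins).toHaveLength(1);
    await redis.quit();
  });
});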

API Health (apps/api)

  • ✅ Job submission endpoint (/api/jobs POST); a smoke-test sketch follows this list
  • ✅ Health check endpoint (/health GET)
  • ✅ WebSocket connection stability
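
A hedged smoke-test sketch for the first two endpoints above. The API_URL variable and its default port are assumptions; the endpoint paths and job payload shape follow the E2E example later in this ADR.

typescript
import { describe, it, expect } from 'vitest';

const API_URL = process.env.API_URL || 'http://localhost:3000'; // assumed default port

describe('API health smoke tests', () => {
  it('should report healthy on GET /health', async () => {
    const res = await fetch(`${API_URL}/health`);
    expect(res.ok).toBe(true);
  });

  it('should accept a well-formed job on POST /api/jobs', async () => {
    const res = await fetch(`${API_URL}/api/jobs`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        workflow_id: `smoke-${Date.now()}`,
        service_required: 'comfyui',
        payload: { prompt: 'smoke test' }
      })
    });
    expect(res.ok).toBe(true);
  });
});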

Worker Execution (apps/worker)

  • ✅ Job processing flow (claim → execute → complete)
  • ✅ Failure classification (auth, rate limit, generation refusal)
  • ✅ Attestation generation (workflow-aware)
  • ✅ Retry logic (transient vs permanent failures)

Build Verification

  • ✅ All packages compile (no TypeScript errors)
  • ✅ No critical lint errors
  • ✅ Production builds succeed

Priority 1: Warning (Should Pass)

Webhook Delivery (apps/webhook-service)

  • ⚠️ Event delivery to webhook URLs
  • ⚠️ Retry logic on transient failures (sketched below)
  • ⚠️ Telemetry event formatting
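
One way to cover the retry item above without a real receiver is a nock interceptor that fails the first delivery and accepts the second. The deliverWebhook helper, its import path, and its options are hypothetical stand-ins for the service's actual delivery code, and this assumes the HTTP client in use is one nock can intercept (e.g. node-fetch or axios).

typescript
import { describe, it, expect } from 'vitest';
import nock from 'nock';
// Hypothetical import; the real delivery function lives somewhere in apps/webhook-service.
import { deliverWebhook } from '../src/delivery.js';

describe('Webhook retry on transient failure', () => {
  it('should retry after a 500 and succeed on the second attempt', async () => {
    const receiver = nock('https://example-customer.test')
      .post('/hooks/jobs')
      .reply(500)                // first attempt fails transiently
      .post('/hooks/jobs')
      .reply(200, { ok: true }); // retry succeeds

    await deliverWebhook({
      url: 'https://example-customer.test/hooks/jobs',
      event: { name: 'job.completed', job_id: 'job-1' },
      maxRetries: 2
    });

    expect(receiver.isDone()).toBe(true); // both interceptors were consumed
  });
});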

Telemetry Pipeline (apps/telemetry-collector)

  • ⚠️ Redis stream → OTLP conversion
  • ⚠️ Dash0 integration (mocked in CI)
  • ⚠️ Event validation and filtering

Priority 2: Nice to Have (Future)

Environment Management

  • 📋 Environment composition
  • 📋 Service discovery
  • 📋 Docker Compose generation

Machine Management

  • 📋 PM2 service orchestration
  • 📋 ComfyUI custom node installation
  • 📋 Health check reporting

EmProps API

  • 📋 Art generation nodes
  • 📋 Job evaluation logic
  • 📋 Pseudorandom utilities

Coverage Gap Analysis

Current Gaps (55 test files, but missing):

  1. ❌ Environment management system (newly documented)
  2. ❌ Docker build process
  3. ❌ PM2 service management
  4. ❌ ComfyUI custom node installation
  5. ❌ Machine registration flow
  6. ❌ WebSocket event flow (monitor)

Prioritization:

  • P0 (Block release): Critical production paths only
  • P1 (Warning): Important but non-blocking features
  • P2 (Nice to have): Developer experience, tooling

CI/CD Integration

Pre-Release Test Job (Detailed)

Complete GitHub Actions Configuration:

yaml
name: Pre-Release Testing
on:
  push:
    tags: ["v*"]
  workflow_dispatch:

jobs:
  test:
    name: Pre-Release Test Suite
    runs-on: ubuntu-latest
    timeout-minutes: 40  # Safety timeout

    services:
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Enable pnpm
        run: corepack enable  # pnpm must be available before setup-node can use cache: 'pnpm'

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'pnpm'

      - name: Get pnpm store directory
        id: pnpm-cache
        shell: bash
        run: |
          echo "STORE_PATH=$(pnpm store path)" >> $GITHUB_OUTPUT

      - name: Setup pnpm cache
        uses: actions/cache@v3
        with:
          path: ${{ steps.pnpm-cache.outputs.STORE_PATH }}
          key: ${{ runner.os }}-pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}
          restore-keys: |
            ${{ runner.os }}-pnpm-store-

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: TypeScript Compilation Check
        run: pnpm typecheck

      - name: Lint Check
        run: pnpm lint

      - name: Unit Tests
        run: pnpm test
        env:
          REDIS_URL: redis://localhost:6379
          NODE_ENV: test

      - name: Build Verification
        run: pnpm build
        env:
          NODE_ENV: production

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: |
            **/coverage/
            **/.turbo/
          retention-days: 7

      - name: Test Summary
        if: success()
        run: |
          echo "## Test Results" >> $GITHUB_STEP_SUMMARY
          echo "✅ All pre-release tests passed" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "### Test Execution" >> $GITHUB_STEP_SUMMARY
          echo "- TypeScript compilation: ✅" >> $GITHUB_STEP_SUMMARY
          echo "- Lint checks: ✅" >> $GITHUB_STEP_SUMMARY
          echo "- Unit tests: ✅" >> $GITHUB_STEP_SUMMARY
          echo "- Build verification: ✅" >> $GITHUB_STEP_SUMMARY

    env:
      REDIS_URL: redis://localhost:6379
      NODE_ENV: test
      CI: true
      TURBO_ENV_MODE: loose  # Allow Turbo to access environment variables

  release:
    needs: [test]  # Blocks deployment if tests fail
    runs-on: ubuntu-latest
    steps:
      # ... existing release steps (unchanged)
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Build and release worker
        run: |
          # ... existing worker build logic

      - name: Deploy to Railway
        run: |
          # ... existing Railway deployment
        env:
          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}

Test Execution Flow

Failure Handling

Test Failure Scenarios:

  1. TypeScript Compilation Fails

    • Error: Type errors in code
    • Action: Block deployment, show compilation errors
    • Fix: Address type errors before retrying
  2. Lint Check Fails

    • Error: Code style violations
    • Action: Block deployment, show lint errors
    • Fix: Run pnpm lint --fix or address manually
  3. Unit Tests Fail

    • Error: Business logic regression
    • Action: Block deployment, upload test results
    • Fix: Debug failing tests locally with pnpm test:ui
  4. Build Fails

    • Error: Production build errors
    • Action: Block deployment, show build logs
    • Fix: Address build errors (missing dependencies, etc.)

Notification Strategy:

  • GitHub Actions summary shows failure details
  • Slack notification on failure (optional Phase 2)
  • Email to release tag creator (optional Phase 2)

Phased Rollout Plan

Phase 1: Foundation (Week 1)

Goal: Establish CI infrastructure and basic test coverage

Tasks:

  1. ✅ Create pre-release test job in release.yml
  2. ✅ Configure Redis test container
  3. ✅ Add pnpm caching for faster CI
  4. ✅ Run existing unit tests (55 test files)
  5. ✅ Run typecheck and lint
  6. ✅ Run build verification

Deliverables:

  • Working CI pipeline with test gate
  • < 10 minute execution time (unit tests only)
  • Clear failure reporting

Success Criteria:

  • CI passes on current staging branch
  • No flaky tests (3 consecutive successful runs)
  • Documentation for running tests locally

Rollout:

  • Deploy to staging first (v1.2.3-staging tag)
  • Monitor for false positives
  • Enable for production after 3 successful staging releases

Phase 2: Integration Coverage (Week 2)

Goal: Add integration tests for critical components

Tasks:

  1. ✅ Run Redis function integration tests
  2. ✅ Run API integration tests (job submission)
  3. ✅ Run worker integration tests
  4. ✅ Configure test environment secrets
  5. ⚠️ Add coverage reporting (optional)

Deliverables:

  • Integration tests running in CI
  • < 20 minute total execution time
  • Coverage reports uploaded

Success Criteria:

  • Integration tests pass reliably
  • No Redis connection issues in CI
  • Clear error messages on failures

Challenges:

  • Shared Redis instance (sequential tests)
  • External API mocking (nock configuration)
  • Test data cleanup between runs

Phase 3: E2E Coverage (Week 3)

Goal: Validate critical user flows end-to-end

Tasks:

  1. ✅ Add job submission → worker execution E2E test
  2. ✅ Add telemetry pipeline E2E test
  3. ⚠️ Add webhook delivery E2E test
  4. ⚠️ Configure Dash0 test environment
  5. ⚠️ Add machine registration flow test

Deliverables:

  • E2E tests for critical paths
  • < 35 minute total execution time
  • Real-time event verification

Success Criteria:

  • E2E tests detect real regressions
  • No false positives from timing issues
  • Clear test output showing flow progression

Challenges:

  • Timing-sensitive tests (use explicit waits, not sleeps)
  • External service mocking (Dash0, OpenAI)
  • Test isolation (parallel execution conflicts)

Phase 4: Optimization (Week 4)

Goal: Improve CI speed and developer experience

Tasks:

  1. ✅ Parallelize independent test suites
  2. ✅ Add test result caching
  3. ✅ Document test writing guidelines
  4. ✅ Add visual test UI documentation
  5. ✅ Create troubleshooting guide

Deliverables:

  • Optimized CI execution (target < 30 minutes)
  • Comprehensive testing documentation
  • Developer onboarding guide

Success Criteria:

  • < 30 minute CI execution time
  • Developers can add tests without assistance
  • Zero flaky tests (1 week observation)

Success Metrics

Before Implementation

Current State:

  • ❌ 0% automated test coverage in CI
  • ❌ Manual testing before releases (inconsistent)
  • ❌ Unknown production readiness
  • ⏰ Hours spent debugging production issues
  • 😰 Fear of shipping (slows innovation)

After Implementation

Phase 1 (Foundation):

  • ✅ 100% of releases gated by unit tests
  • ✅ < 10 minute CI feedback time
  • ✅ Typecheck + lint + build verification automated
  • 📊 Baseline metrics established

Phase 2 (Integration):

  • ✅ Redis function coverage (atomic job matching)
  • ✅ API endpoint coverage (job submission)
  • ✅ Worker integration coverage (job processing)
  • 📊 < 20 minute CI execution time

Phase 3 (E2E):

  • ✅ Critical user flow coverage
  • ✅ Telemetry pipeline validation
  • ✅ Webhook delivery verification
  • 📊 < 35 minute CI execution time

Phase 4 (Optimization):

  • ✅ < 30 minute CI execution time
  • ✅ Zero flaky tests (1 week observation)
  • ✅ Developer documentation complete
  • 📊 Production incidents -50% vs baseline

Ongoing Metrics

Test Reliability:

  • Test pass rate: Target > 95% (excluding legitimate failures)
  • Flaky test rate: Target < 2% (tests that fail randomly)
  • CI execution time: Target < 30 minutes

Production Impact:

  • Incidents caused by releases: Target -50% reduction
  • Mean time to detect (MTTD): Target < 5 minutes (CI feedback)
  • Mean time to resolve (MTTR): Target -30% reduction

Developer Experience:

  • Time to add new test: Target < 30 minutes
  • Test documentation completeness: Target 100%
  • Developer satisfaction: Survey after 1 month

Dashboard Metrics (future):

  • Test execution trends (speed over time)
  • Coverage trends (% of code covered)
  • Failure trends (which tests fail most)

Consequences

Positive

1. Production Safety

  • ✅ No untested code reaches production
  • ✅ Objective pass/fail criteria for releases
  • ✅ Early detection of regressions
  • ✅ Reduced customer-facing incidents

2. Developer Confidence

  • ✅ Safe refactoring with test coverage
  • ✅ Fast feedback on changes (local + CI)
  • ✅ Clear error messages guide debugging
  • ✅ Less fear of breaking production

3. Team Velocity

  • ✅ Automated testing faster than manual
  • ✅ Parallel development without conflicts
  • ✅ Onboarding documentation (tests as examples)
  • ✅ Less time firefighting production issues

4. Engineering Culture

  • ✅ Quality-first mindset
  • ✅ Documentation as code (tests document behavior)
  • ✅ Continuous improvement (add tests for bugs)
  • ✅ Reduced technical debt

5. Business Impact

  • ✅ Customer trust from reliable service
  • ✅ Revenue protection (fewer outages)
  • ✅ Competitive advantage (ship faster safely)
  • ✅ Engineering reputation

Negative

1. Initial Investment

  • ❌ 1-2 weeks to implement (40-80 hours)
  • ❌ Learning curve for test writing
  • ❌ CI setup complexity
  • Mitigation: Phased rollout, pair programming, documentation

2. Ongoing Maintenance

  • ❌ Tests need updating with code changes
  • ❌ Flaky tests can block releases
  • ❌ CI costs (GitHub Actions minutes)
  • Mitigation: Fix flaky tests immediately, optimize CI, monitor costs

3. False Negatives

  • ❌ Flaky tests block valid releases
  • ❌ Environment differences (CI vs production)
  • ❌ Timing-sensitive tests fail randomly
  • Mitigation: Retry logic, explicit waits, test isolation

4. Developer Workflow

  • ❌ Slower releases (35 min CI vs immediate)
  • ❌ Test failures require debugging
  • ❌ Pre-commit hooks slow local workflow
  • Mitigation: Fast local tests, clear errors, optional pre-commit

5. Coverage Gaps

  • ❌ Cannot test every scenario
  • ❌ Integration tests miss production differences
  • ❌ E2E tests miss edge cases
  • Mitigation: Focus on critical paths, production monitoring, iterative improvement

Risk Mitigation Strategies

Flaky Tests:

  • Problem: Random failures block releases
  • Solution:
    • Explicit waits instead of sleep() (see the polling helper sketched below)
    • Test isolation (no shared state)
    • Retry logic for network-dependent tests
    • Quarantine flaky tests until fixed
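
One way to implement the "explicit waits" item above is a small polling helper that retries a check until it yields a value or a deadline passes. This is a generic sketch, not existing project code.

typescript
// Poll an async condition until it returns a truthy value or the timeout expires.
// Replaces fixed sleep() calls in timing-sensitive tests.
export async function waitFor<T>(
  check: () => Promise<T | null | undefined>,
  { timeoutMs = 15000, intervalMs = 250 } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  let lastError: unknown;

  while (Date.now() < deadline) {
    try {
      const value = await check();
      if (value !== null && value !== undefined) return value;
    } catch (err) {
      lastError = err; // keep polling; surface the last error only on timeout
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`waitFor timed out after ${timeoutMs}ms: ${String(lastError ?? 'condition never met')}`);
}

// Example: wait for a job status to reach 'completed' instead of sleeping 5 seconds.
// const status = await waitFor(async () => {
//   const value = await redis.hget('job:test-job-1', 'status');
//   return value === 'completed' ? value : null;
// });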

CI Costs:

  • Problem: GitHub Actions minutes usage
  • Solution:
    • Cache dependencies aggressively
    • Parallelize independent tests
    • Monitor usage with cost alerts
    • Self-hosted runners if cost prohibitive

Developer Frustration:

  • Problem: Tests seen as blocker
  • Solution:
    • Fast local test execution
    • Clear error messages
    • Visual test UI (vitest --ui)
    • Pair programming for test writing

Alternatives Considered

Alternative 1: No Automated Testing ❌

Approach: Continue manual testing before releases

Pros:

  • No initial investment
  • No CI setup complexity
  • No flaky test issues

Cons:

  • High risk: 81 commits untested
  • Slow: Manual testing takes hours
  • Incomplete: Cannot test all scenarios
  • Unreliable: Human error, inconsistent coverage
  • Not scalable: Slows as codebase grows

Verdict: REJECTED - Too risky with significant changes pending

Alternative 2: Post-Deployment Testing Only ⚠️

Approach: Deploy to staging, run tests, promote to production

Pros:

  • Tests run in production-like environment
  • Catches environment-specific issues
  • No CI setup required

Cons:

  • ⚠️ Customers affected: Staging breakage delays production
  • ⚠️ Slower: Deploy → test → rollback → fix cycle
  • ⚠️ Manual: Requires human intervention
  • ⚠️ Partial: Doesn't catch pre-deployment issues

Verdict: REJECTED - Useful as smoke tests but insufficient alone

Alternative 3: Staging Environment Testing Only ⚠️

Approach: Require staging deployment before production

Pros:

  • Real environment testing
  • Catches configuration issues
  • Minimal CI setup

Cons:

  • ⚠️ Drift risk: Staging ≠ production (different configs)
  • ⚠️ Manual: Human validation required
  • ⚠️ Slow: Deploy → test → deploy cycle
  • ⚠️ Incomplete: Doesn't test all code paths

Verdict: PARTIAL - Good practice but not sufficient, use alongside automated tests

Alternative 4: Feature Flags + Gradual Rollout ✅

Approach: Use feature flags to control new features, gradual rollout

Pros:

  • ✅ Control blast radius (limit affected users)
  • ✅ A/B testing capability
  • ✅ Quick rollback (disable flag)
  • ✅ Production testing with real traffic

Cons:

  • ⚠️ Complexity: Flag management overhead
  • ⚠️ Technical debt: Old flags linger
  • ⚠️ Not comprehensive: Doesn't replace testing

Verdict: COMPLEMENTARY - Use alongside testing, not instead of

Alternative 5: Contract Testing (Consumer-Driven) 🤔

Approach: Define contracts between services, test independently

Pros:

  • Decoupled service testing
  • Parallel development
  • Clear interface definitions

Cons:

  • ⚠️ Complexity: Contract management overhead
  • ⚠️ Tooling: Pact.js setup required
  • ⚠️ Incomplete: Doesn't test full integration

Verdict: FUTURE - Good for microservices, overkill for current maturity

Alternative 6: Mutation Testing 🤔

Approach: Modify code (mutants) to verify tests catch bugs

Pros:

  • Verifies test quality
  • Catches weak assertions
  • Improves coverage

Cons:

  • ⚠️ Slow: 10x longer execution time
  • ⚠️ Overkill: Current maturity doesn't justify
  • ⚠️ Diminishing returns: High cost for marginal benefit

Verdict: FUTURE - Consider after baseline coverage established

Decision Matrix

| Alternative | Safety | Speed | Cost | Complexity | Verdict |
| --- | --- | --- | --- | --- | --- |
| Test Pyramid (Chosen) | ✅ High | ✅ Fast | ⚠️ Medium | ⚠️ Medium | RECOMMENDED |
| No Testing | ❌ Low | ✅ Fast | ✅ Low | ✅ Low | ❌ Rejected |
| Post-Deployment | ⚠️ Medium | ❌ Slow | ✅ Low | ✅ Low | ❌ Rejected |
| Staging Only | ⚠️ Medium | ❌ Slow | ✅ Low | ✅ Low | ⚠️ Partial |
| Feature Flags | ✅ High | ✅ Fast | ⚠️ Medium | ❌ High | ✅ Complementary |
| Contract Testing | ✅ High | ✅ Fast | ⚠️ Medium | ❌ High | 🤔 Future |
| Mutation Testing | ✅ High | ❌ Slow | ❌ High | ❌ High | 🤔 Future |

Open Questions

Q1: Test Coverage Thresholds

Question: Should we enforce minimum code coverage percentages (e.g., 80%)?

Considerations:

  • Pros: Objective quality metric, forces coverage
  • Cons: Encourages low-quality tests, diminishing returns
  • Alternative: Focus on critical path coverage, not percentage

Recommendation: No coverage thresholds initially. Focus on quality over quantity. Revisit after baseline coverage established.


Q2: PR Testing vs Release Testing

Question: Should tests run on every PR to master, or only on releases?

Options:

Option A: PR Testing Only

  • Tests run on pull_request to master
  • Feedback before merge
  • No release-time testing

Option B: Release Testing Only

  • Tests run on git tags
  • Fast PR merges
  • Risk of broken master

Option C: Both PR + Release (Recommended)

  • Tests on PR (fast subset)
  • Full tests on release (comprehensive)
  • Best safety, some duplication

Recommendation: Option C - Run fast tests on PR (unit + typecheck), full suite on releases.


Q3: External API Handling

Question: How do we test integrations with external APIs (OpenAI, Anthropic, Dash0)?

Options:

Option A: Mock All External APIs

  • Use nock to intercept HTTP calls
  • Fast, deterministic tests
  • Risk: Mocks drift from reality

Option B: Test Against Real APIs

  • Use test accounts/keys
  • Real integration validation
  • Risk: Slow, flaky, costs money

Option C: Hybrid (Recommended)

  • Mock in CI (fast, deterministic)
  • Real API testing in staging
  • Best of both worlds

Recommendation: Option C - Mock external APIs in CI with nock, run real API tests in staging environment.
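
A sketch of the hybrid approach: in CI the OpenAI endpoint is intercepted with nock, while a staging run opts into the real API via a flag. The USE_REAL_APIS flag, endpoint pattern, and canned response body are illustrative assumptions.

typescript
import nock from 'nock';
import { beforeAll, afterAll } from 'vitest';

const useRealApis = process.env.USE_REAL_APIS === 'true'; // set only in the staging test run

beforeAll(() => {
  if (useRealApis) return; // staging: exercise the real integration

  // CI: intercept OpenAI calls with a canned, deterministic response.
  nock('https://api.openai.com')
    .persist()
    .post(/\/v1\/.*/)
    .reply(200, { id: 'mock-response', status: 'completed' }); // simplified placeholder body
});

afterAll(() => {
  if (!useRealApis) nock.cleanAll();
});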


Q4: Test Retry Logic

Question: Should we automatically retry flaky tests in CI?

Considerations:

  • Pros: Reduces false negatives from timing issues
  • Cons: Masks real problems, slower CI
  • Alternative: Fix flaky tests immediately

Recommendation: Limited retries - Retry network-dependent tests (max 2 attempts), no retries for unit tests. Track retry rate as metric.
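
Vitest supports per-test retries, which keeps the limited-retry policy above local to network-dependent tests instead of applying it globally. The sketch below reuses the Dash0 endpoint and environment variables shown earlier; the query helper is a simplified stand-in, not the real E2E helper.

typescript
import { describe, it, expect } from 'vitest';

// Minimal stand-in for a Dash0 query; the real E2E helper would filter spans by workflow_id.
async function queryDash0Spans(): Promise<Response> {
  return fetch('https://api.us-west-2.aws.dash0.com/api/spans', {
    method: 'POST',
    headers: {
      authorization: `Bearer ${process.env.DASH0_AUTH_TOKEN}`,
      'content-type': 'application/json'
    },
    body: JSON.stringify({ dataset: process.env.DASH0_DATASET })
  });
}

describe('Dash0 span query (network-dependent)', () => {
  // Allow up to 2 retries for this network call; unit tests keep the default of 0 retries.
  it('should reach the Dash0 spans endpoint', { retry: 2 }, async () => {
    const res = await queryDash0Spans();
    expect(res.ok).toBe(true);
  });
});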


Q5: Parallel Test Execution

Question: Should we invest in parallel test execution now or later?

Current State:

  • Sequential execution: ~35 minutes
  • Parallel potential: ~20 minutes (estimate)
  • Cost: 1-2 days implementation

Recommendation: Later - Optimize during Phase 4 if CI time exceeds 35 minutes. Focus on coverage first, speed second.


Q6: Self-Hosted Runners

Question: Should we use GitHub self-hosted runners instead of GitHub-hosted?

Considerations:

GitHub-Hosted (Current):

  • ✅ Zero maintenance
  • ✅ Always available
  • ❌ Slower startup
  • ❌ Monthly cost

Self-Hosted:

  • ✅ Faster execution
  • ✅ Lower cost at scale
  • ❌ Maintenance overhead
  • ❌ Security concerns

Recommendation: GitHub-hosted initially - Monitor costs, switch to self-hosted if costs exceed $100/month or CI time exceeds 40 minutes.



Implementation Checklist

Phase 1: Foundation (Week 1)

  • [ ] Create test job in .github/workflows/release.yml
  • [ ] Configure Redis service container
  • [ ] Add pnpm caching
  • [ ] Run existing unit tests (turbo run test)
  • [ ] Add typecheck step
  • [ ] Add lint step
  • [ ] Add build verification step
  • [ ] Test on staging branch (v1.2.3-staging)
  • [ ] Document CI setup in README
  • [ ] Update ADR index with this ADR

Phase 2: Integration (Week 2)

  • [ ] Add Redis function integration tests to CI
  • [ ] Add API integration tests to CI
  • [ ] Configure test environment secrets
  • [ ] Add coverage reporting (optional)
  • [ ] Document integration test patterns
  • [ ] Fix any flaky integration tests

Phase 3: E2E (Week 3)

  • [ ] Add job submission E2E test
  • [ ] Add telemetry pipeline E2E test
  • [ ] Add webhook delivery E2E test
  • [ ] Configure Dash0 test environment
  • [ ] Document E2E test patterns
  • [ ] Fix timing-sensitive test issues

Phase 4: Optimization (Week 4)

  • [ ] Parallelize independent test suites
  • [ ] Add test result caching
  • [ ] Document test writing guidelines
  • [ ] Create troubleshooting guide
  • [ ] Monitor and optimize CI time
  • [ ] Collect baseline metrics

Post-Implementation

  • [ ] Review success metrics after 1 month
  • [ ] Survey developer satisfaction
  • [ ] Identify coverage gaps
  • [ ] Plan next iteration improvements

Approval and Next Steps

Approval Required From:

  • [ ] Engineering Team Lead
  • [ ] DevOps/Infrastructure Lead
  • [ ] QA Lead (if applicable)

Next Steps After Approval:

  1. Create GitHub issue tracking implementation
  2. Assign Phase 1 tasks to engineer(s)
  3. Schedule kickoff meeting
  4. Begin Phase 1 implementation
  5. Review and iterate based on feedback

Questions or Feedback: Contact Architecture Team or post in #architecture Slack channel.


Document Version: 1.0 Last Updated: 2025-10-08 Author: Claude Code (AI Agent) Reviewers: (to be added after team review)
