
Environment Management System

Overview

The emp-job-queue system manages configuration through a component-based environment system built on the @emp/env-management package. It provides type-safe, hierarchical environment configuration for development, staging, and production, and supports multiple deployment targets.

Key Design Goals:

  • Separation of Concerns: Component files define capabilities, service interfaces define requirements
  • Type Safety: Service interfaces enforce required variables at build time
  • Flexibility: Profile-based composition allows mixing components for different scenarios
  • Security: Automatic separation of public and secret variables
  • Multi-environment: Single codebase supports local dev, staging, and production

Architecture Overview

System Components

1. Component Files (config/environments/components/)

Component files define what configuration is available for each component (API, Redis, Machine, Worker, etc.). They use INI format with environment-specific sections and namespace prefixing.

Structure:

ini
NAMESPACE=COMPONENT_NAME

[default]
# Values available in all environments
COMMON_VAR=value

[local]
# Local development overrides
LOCAL_VAR=value

[staging]
# Staging environment
STAGING_VAR=value

[production]
# Production environment
PROD_VAR=value

Example: redis.env

ini
NAMESPACE=REDIS

[default]
DB=0
MAX_CONNECTIONS=200

[local]
INGRESS=:6379
URL=redis://host.docker.internal:6379

[staging]
INGRESS=redis://default:${STAGING_REDIS_PASSWORD}@switchback.proxy.rlwy.net:48889
URL=redis://default:${STAGING_REDIS_PASSWORD}@switchback.proxy.rlwy.net:48889

[production]
INGRESS=redis://default:${PRODUCTION_REDIS_PASSWORD}@ballast.proxy.rlwy.net:30645
URL=redis://default:${PRODUCTION_REDIS_PASSWORD}@ballast.proxy.rlwy.net:30645

Key Features:

  • Namespace Prefixing: NAMESPACE=REDIS → all variables prefixed as REDIS_*
  • Environment Sections: [local], [staging], [production] for environment-specific values
  • Variable Substitution: ${PRODUCTION_REDIS_PASSWORD} resolves from secrets
  • Layering: When a profile lists several sections for a component (for example ["default", "staging"]), later sections override earlier ones
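
The namespace prefixing and layering described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the parser in @emp/env-management: the names `parseComponent` and `layer` are hypothetical, and the sketch ignores variable substitution entirely.

```typescript
// Hypothetical helpers for illustration; the real parser may differ.
type Sections = Record<string, Record<string, string>>;

interface ParsedComponent {
  namespace: string;
  sections: Sections;
}

// Parse the INI-style component format: a NAMESPACE line, then [section] blocks.
function parseComponent(text: string): ParsedComponent {
  let namespace = "";
  let current = "";
  const sections: Sections = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const header = /^\[(.+)\]$/.exec(line);
    if (header) {
      current = header[1];
      sections[current] = sections[current] ?? {};
      continue;
    }
    const eq = line.indexOf("=");
    if (eq < 0) continue; // ignore malformed lines in this sketch
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (current === "" && key === "NAMESPACE") namespace = value;
    else if (current !== "") sections[current][key] = value;
  }
  return { namespace, sections };
}

// Merge the requested sections in order and apply the namespace prefix.
function layer(component: ParsedComponent, order: string[]): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of order) {
    for (const [key, value] of Object.entries(component.sections[name] ?? {})) {
      out[`${component.namespace}_${key}`] = value; // later sections overwrite earlier ones
    }
  }
  return out;
}

const redis = parseComponent(`NAMESPACE=REDIS
[default]
DB=0
MAX_CONNECTIONS=200
[local]
INGRESS=:6379
URL=redis://host.docker.internal:6379`);

console.log(layer(redis, ["default", "local"]));
```

Calling `layer` with `["default", "local"]` yields keys such as `REDIS_DB` and `REDIS_URL`, which are the names that service interfaces refer to.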

Current Components:

  • api.env - Job queue API configuration
  • redis.env - Redis connection and settings
  • database.env - PostgreSQL configuration
  • machine.env - Machine/container orchestration
  • worker.env - Worker process configuration
  • monitor.env - Monitoring UI settings
  • comfyui.env - ComfyUI integration
  • ollama.env - Ollama LLM service
  • storage-provider.env - Cloud storage (S3/Azure/GCP)
  • telemetry.env - Observability configuration
  • emprops.env - EmProps platform API
  • Plus: api-tokens.env, openai.env, gemini.env, ngrok.env, simulation.env, webhook-service.env, telemetry-collector.env

2. Service Interfaces (config/environments/services/)

Service interfaces define what configuration is required for each service to run. They enforce type-safe environment validation and map component variables to service-specific names.

Structure:

typescript
export const ServiceNameEnvInterface = {
  name: "service-name",
  location: "apps/service-name",  // Where .env files are generated

  required: {
    "APP_VAR_NAME": "COMPONENT_VAR_NAME",  // Maps app var → component var
  },

  secret: {
    "SECRET_VAR": "COMPONENT_SECRET_VAR",  // Separated into .env.secret
  },

  optional: {
    "OPTIONAL_VAR": "COMPONENT_OPTIONAL_VAR",
  },

  defaults: {
    "VAR_WITH_DEFAULT": "default_value",
  }
};

Example: machine.interface.ts

typescript
export const MachineEnvInterface = {
  name: "machine",
  location: "apps/machine",

  required: {
    "HUB_REDIS_URL": "REDIS_URL",
    "EMPROPS_API_URL": "EMPROPS_JOB_QUEUE_API_URL",
    "GPU_MODE": "MACHINE_GPU_MODE",

    // Telemetry
    "OTEL_COLLECTOR_ENDPOINT": "${MACHINE_EGRESS_GRPC}${TELEMETRY_COLLECTOR_OTEL_COLLECTOR_ENDPOINT}",
    "TELEMETRY_ENV": "TELEMETRY_DASH0_DATASET",

    // Storage
    "CLOUD_PROVIDER": "STORAGE_PROVIDER_CURRENT_SERVICE",
    "CLOUD_STORAGE_CONTAINER": "STORAGE_PROVIDER_CONTAINER",

    // Ollama configuration (if using ollama workers)
    "OLLAMA_HOST": "OLLAMA_HOST",
    "OLLAMA_PORT": "OLLAMA_PORT",
    "OLLAMA_DEFAULT_MODELS": "OLLAMA_DEFAULT_MODELS",
  },

  secret: {
    "AWS_ACCESS_KEY_ID": "STORAGE_PROVIDER_AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY_ENCODED": "STORAGE_PROVIDER_AWS_SECRET_ACCESS_KEY_ENCODED",
    "OPENAI_API_KEY": "OPENAI_API_KEY",
    "AUTH_TOKEN": "API_AUTH_TOKEN",
  },

  optional: {
    "MACHINE_HEALTH_PORT": "MACHINE_HEALTH_PORT",
    "MACHINE_LOG_LEVEL": "MACHINE_LOG_LEVEL",
  },

  defaults: {
    "WORKER_MAX_CONCURRENT_JOBS": "1",
    "SERVICE_TYPE": "machine",
    "LOG_TO_FILE": "true"
  }
};

Key Features:

  • Variable Mapping: Maps friendly service variable names to namespaced component names
  • Template Substitution: Supports ${VAR} syntax for dynamic composition
  • Categorization: required, secret, optional, defaults enforce correct handling
  • Build-time Validation: Fails early if required variables are missing
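
To make the build-time validation concrete, here is a rough sketch of how missing variables could be detected. The `ServiceEnvInterface` type is inferred from the examples above and is an assumption, not the package's published type; `referencedVars` and `missingVariables` are hypothetical helper names.

```typescript
// Assumed shape, inferred from the interface examples; not an official type.
interface ServiceEnvInterface {
  name: string;
  location: string;
  required?: Record<string, string>;
  secret?: Record<string, string>;
  optional?: Record<string, string>;
  defaults?: Record<string, string>;
}

// A mapping value is either a plain component variable name or a ${VAR} template.
function referencedVars(source: string): string[] {
  const refs = source.match(/\$\{[A-Za-z0-9_]+\}/g);
  return refs ? refs.map((r) => r.slice(2, -1)) : [source];
}

// Return the component variables that `required` and `secret` need but the pool lacks.
function missingVariables(iface: ServiceEnvInterface, pool: Record<string, string>): string[] {
  const needed = { ...iface.required, ...iface.secret };
  const missing = new Set<string>();
  for (const source of Object.values(needed)) {
    for (const name of referencedVars(source)) {
      if (!(name in pool)) missing.add(name);
    }
  }
  return Array.from(missing);
}

const machine: ServiceEnvInterface = {
  name: "machine",
  location: "apps/machine",
  required: { HUB_REDIS_URL: "REDIS_URL", GPU_MODE: "MACHINE_GPU_MODE" },
  secret: { AUTH_TOKEN: "API_AUTH_TOKEN" },
};

const missing = missingVariables(machine, { REDIS_URL: "redis://localhost:6379" });
if (missing.length > 0) {
  console.error(`Service '${machine.name}' is missing required variables: ${missing.join(", ")}`);
}
```

Only `required` and `secret` entries are checked here, matching the validation step described in the build process; `optional` entries are allowed to be absent.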

Current Service Interfaces:

  • api.interface.ts - Job queue API service
  • machine.interface.ts - Machine container orchestration
  • emprops-api.interface.ts - EmProps platform API
  • database.interface.ts - PostgreSQL database
  • monitor.interface.ts - Monitoring UI
  • webhook-service.interface.ts - Webhook delivery service
  • job-evaluator.interface.ts - Job evaluation service
  • telemetry-collector.interface.ts - Telemetry collection service

3. Environment Profiles (config/environments/profiles/)

Profiles compose components together for specific use cases. They define which component environments to activate.

Structure:

json
{
  "name": "Profile Name",
  "description": "What this profile is for",
  "components": {
    "component-name": ["default", "environment"],
    "another-component": "single-environment"
  },
  "services": {}
}

Example: staging.json

json
{
  "name": "Production Testing",
  "description": "Production-like environment for testing with local Redis",
  "components": {
    "api": ["default", "staging"],
    "redis": ["default", "staging"],
    "machine": ["default", "staging"],
    "worker": ["default", "staging"],
    "database": ["default", "staging"],
    "storage-provider": ["default", "production"],
    "telemetry": ["default", "staging"]
  }
}

Available Profiles:

  • local-dev.json: Local development with production API + local Redis
  • staging.json: Production-like testing environment
  • production.json: Full production configuration
  • remote-dev.json: Remote development with ngrok tunneling
  • testrunner.json: Automated testing environment
  • local-prod.json: Local production simulation

Profile Composition Rules:

  1. Array syntax ["default", "staging"] merges environments in order (later overrides earlier)
  2. String syntax "staging" loads only that environment
  3. All components load from their namespaced .env files
  4. Secrets are automatically loaded from .env.secrets.local (gitignored)
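
Rules 1 and 2 amount to normalizing each profile entry into an ordered list of sections. A minimal sketch, assuming a `Profile` type modeled on the JSON examples (the type, the `sectionsFor` helper, and the single-string `ngrok` entry are illustrative, not taken from the package or the real profiles):

```typescript
// Assumed profile shape, based on the JSON examples above.
interface Profile {
  name: string;
  description?: string;
  components: Record<string, string | string[]>;
}

// Normalize a profile entry to an ordered list of sections.
function sectionsFor(profile: Profile, component: string): string[] {
  const entry = profile.components[component];
  if (entry === undefined) return []; // component is not part of this profile
  return Array.isArray(entry) ? entry : [entry];
}

const stagingProfile: Profile = {
  name: "Production Testing",
  components: {
    redis: ["default", "staging"],
    "storage-provider": ["default", "production"],
    ngrok: "local", // hypothetical single-environment entry
  },
};

console.log(sectionsFor(stagingProfile, "redis"));
console.log(sectionsFor(stagingProfile, "ngrok"));
```

The resulting list is the order in which sections are merged, so the last element wins when two sections define the same key.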

4. Secrets Management (config/environments/secrets/)

Location: config/environments/secrets/.env.secrets.local (gitignored)

Secrets are stored in a single .env file with plain key-value pairs. No namespace prefix is applied, so each name is used exactly as written. The build system automatically:

  1. Loads all secrets into the variable pool
  2. Identifies which variables are secrets based on service interfaces
  3. Writes secrets to .env.secret.<profile> files per service
  4. Keeps public variables in .env.<profile> files

Example: .env.secrets.local

bash
# Database
DATABASE_URL=postgresql://user:pass@localhost:5432/db

# Redis
PRODUCTION_REDIS_PASSWORD=secret_password_here

# Cloud Storage
STORAGE_PROVIDER_AWS_ACCESS_KEY_ID=AKIAXXXXXXXX
STORAGE_PROVIDER_AWS_SECRET_ACCESS_KEY_ENCODED=base64_encoded_secret

# API Keys
OPENAI_API_KEY=sk-xxxxxxxx
API_AUTH_TOKEN=bearer_token_here

# Telemetry
TELEMETRY_DASH0_AUTH_TOKEN=dash0_token_here

Security Features:

  • Gitignored: Never committed to repository
  • Auto-detected: Build system knows which vars are secrets via service interfaces
  • Per-service: Each service gets only its required secrets
  • Docker-friendly: .env.secret files mount at runtime, not baked into images

Build Process

Environment Builder (@emp/env-management)

The EnvironmentBuilder class orchestrates the entire build process:

Key Steps:

  1. Profile Loading: Reads profile JSON to determine which components and environments to load

  2. Component Pool Creation: Loads all specified component files and merges their environments

    typescript
    // Example: api component with ["default", "staging"]
    // Loads: api.env[default] + api.env[staging]
    // Result: API_NODE_ENV=production, API_PORT=3333, API_LOG_LEVEL=info
  3. Secret Loading: Automatically loads .env.secrets.local (no profile declaration needed)

  4. Variable Resolution: Multi-pass resolution of ${VAR} substitutions

    typescript
    // Pass 1: REDIS_URL=redis://default:${PRODUCTION_REDIS_PASSWORD}@host:6379
    // Pass 2: REDIS_URL=redis://default:secret123@host:6379
  5. Service Validation: For each service interface, validates all required and secret variables exist

  6. File Generation: For each service:

    • Maps component variables → service variables via interface
    • Splits into public (.env) and secret (.env.secret) based on interface categorization
    • Writes files to service location (apps/service-name/.env.<profile>)
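
The file generation step can be approximated as follows. This is a sketch under several assumptions: validation has already passed, mappings are plain variable names (templates are not handled), and mapped values take precedence over `defaults`. The `generateFiles` function and the `GeneratedFiles` shape are hypothetical, and nothing is written to disk.

```typescript
// Assumed interface shape, inferred from the examples in this document.
interface ServiceEnvInterface {
  name: string;
  location: string;
  required?: Record<string, string>;
  secret?: Record<string, string>;
  optional?: Record<string, string>;
  defaults?: Record<string, string>;
}

interface GeneratedFiles {
  publicPath: string;
  secretPath: string;
  publicBody: string;
  secretBody: string;
}

// Serialize variables as KEY=value lines.
function toEnvBody(vars: Record<string, string>): string {
  return Object.entries(vars).map(([key, value]) => `${key}=${value}`).join("\n") + "\n";
}

function generateFiles(
  iface: ServiceEnvInterface,
  pool: Record<string, string>,
  profile: string,
): GeneratedFiles {
  // Map app variable names to values from the component pool, skipping absent ones.
  const pick = (mapping: Record<string, string> = {}): Record<string, string> => {
    const out: Record<string, string> = {};
    for (const [appVar, componentVar] of Object.entries(mapping)) {
      const value = pool[componentVar];
      if (value !== undefined) out[appVar] = value;
    }
    return out;
  };
  const publicVars = { ...iface.defaults, ...pick(iface.optional), ...pick(iface.required) };
  const secretVars = pick(iface.secret);
  return {
    publicPath: `${iface.location}/.env.${profile}`,
    secretPath: `${iface.location}/.env.secret.${profile}`,
    publicBody: toEnvBody(publicVars),
    secretBody: toEnvBody(secretVars),
  };
}

const files = generateFiles(
  {
    name: "my-service",
    location: "apps/my-service",
    required: { SERVICE_PORT: "MY_SERVICE_PORT", REDIS_URL: "REDIS_URL" },
    secret: { API_KEY: "MY_SERVICE_API_KEY" },
    defaults: { LOG_LEVEL: "info" },
  },
  { MY_SERVICE_PORT: "8080", REDIS_URL: "redis://localhost:6379", MY_SERVICE_API_KEY: "example" },
  "staging",
);
console.log(files.publicPath, files.secretPath);
```

Because secrets are picked only from the `secret` mapping, a variable ends up in the secret file only when the interface categorizes it that way.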

Example Output:

bash
# After: pnpm env:build staging

# Generated files:
apps/api/.env.staging            # Public API vars (baked into Docker image)
apps/api/.env.secret.staging     # Secret API vars (Docker runtime inject)
apps/machine/.env.staging        # Public machine vars
apps/machine/.env.secret.staging # Secret machine vars (API keys, storage creds)
apps/worker/.env.staging         # Public worker vars
apps/worker/.env.secret.staging  # Secret worker vars
# ... etc for each service

Variable Resolution

The system supports multi-pass variable substitution to handle nested references:

Resolution Algorithm:

typescript
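// Sketch of the algorithm: resolve ${VAR} references in at most 10 passes.
function resolveVariables(pool: Record<string, string>, maxPasses = 10): Record<string, string> {
  const resolved = { ...pool };
  for (let pass = 0; pass < maxPasses; pass++) {
    let changed = false;
    for (const [key, value] of Object.entries(resolved)) {
      // Lookup order: 1. resolved pool, 2. process.env, 3. keep ${VAR} unchanged (warned about at the end)
      const next = value.replace(/\$\{(\w+)\}/g, (ref, name) => resolved[name] ?? process.env[name] ?? ref);
      if (next !== value) {
        resolved[key] = next;
        changed = true;
      }
    }
    if (!changed) break; // nothing left to substitute
  }
  return resolved;
}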

Example Multi-layer Resolution:

bash
# Secrets file:
STAGING_REDIS_PASSWORD=secret123

# redis.env [staging] (NAMESPACE=REDIS, so URL is exposed as REDIS_URL):
URL=redis://default:${STAGING_REDIS_PASSWORD}@switchback.proxy.rlwy.net:48889

# machine.interface.ts:
"HUB_REDIS_URL": "REDIS_URL"

# Final apps/machine/.env.staging:
HUB_REDIS_URL=redis://default:secret123@switchback.proxy.rlwy.net:48889

Template Composition:

typescript
// machine.interface.ts uses template composition:
"OTEL_COLLECTOR_ENDPOINT": "${MACHINE_EGRESS_GRPC}${TELEMETRY_COLLECTOR_OTEL_COLLECTOR_ENDPOINT}"

// Resolves from:
//   MACHINE_EGRESS_GRPC=myhost.com                      (from machine.env)
//   TELEMETRY_COLLECTOR_OTEL_COLLECTOR_ENDPOINT=:4317   (from telemetry-collector.env)

// Final result:
//   OTEL_COLLECTOR_ENDPOINT=myhost.com:4317
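
Template composition means a mapping value can be either a plain component variable name or a string containing `${VAR}` references. One way this distinction might be handled is sketched below; `resolveMapping` is a hypothetical helper, and returning `undefined` for incomplete templates is a design choice of this sketch rather than documented behavior.

```typescript
// Resolve one interface mapping value against the component pool.
function resolveMapping(source: string, pool: Record<string, string>): string | undefined {
  if (!source.includes("${")) return pool[source]; // plain variable name
  let complete = true;
  const value = source.replace(/\$\{([A-Za-z0-9_]+)\}/g, (_match, name: string) => {
    const found = pool[name];
    if (found === undefined) {
      complete = false; // remember that at least one reference is missing
      return "";
    }
    return found;
  });
  return complete ? value : undefined;
}

const examplePool: Record<string, string> = {
  MACHINE_EGRESS_GRPC: "myhost.com",
  TELEMETRY_COLLECTOR_OTEL_COLLECTOR_ENDPOINT: ":4317",
  REDIS_URL: "redis://localhost:6379",
};

console.log(
  resolveMapping("${MACHINE_EGRESS_GRPC}${TELEMETRY_COLLECTOR_OTEL_COLLECTOR_ENDPOINT}", examplePool),
);
console.log(resolveMapping("REDIS_URL", examplePool));
```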

Usage Patterns

Building Environments

CLI Commands:

bash
# Build staging environment (most common)
pnpm env:build staging

# Build production environment
pnpm env:build production

# Build local development environment
pnpm env:build local-dev

# List available profiles
pnpm env:list

What Happens:

  1. Loads profile from config/environments/profiles/<profile>.json
  2. Loads all component .env files specified in profile
  3. Loads secrets from config/environments/secrets/.env.secrets.local
  4. Validates all service interfaces have required variables
  5. Generates .env.<profile> and .env.secret.<profile> for each service
  6. Reports validation results and any missing variables

Development Workflow

Local Development:

bash
# 1. Create your secrets file (one-time setup)
cp config/environments/secrets/.env.secrets.local.example \
   config/environments/secrets/.env.secrets.local

# 2. Edit secrets with your credentials
vim config/environments/secrets/.env.secrets.local

# 3. Build local-dev environment
pnpm env:build local-dev

# 4. Start services with Docker Compose
docker-compose up

# Services automatically load apps/*/.env.local-dev files

Staging Deployment:

bash
# 1. Ensure secrets file has staging credentials
# 2. Build staging environment
pnpm env:build staging

# 3. Deploy to staging (exact process varies by deployment target)
# For Docker:
docker-compose -f docker-compose.staging.yml up

# For Kubernetes:
kubectl apply -f k8s/staging/

Production Deployment:

bash
# 1. Build production environment
pnpm env:build production

# 2. Verify no missing variables
# 3. Deploy (secrets injected via secure CI/CD or secret management service)

Adding a New Service

Step 1: Create Service Interface

File: config/environments/services/my-service.interface.ts

typescript
export const MyServiceEnvInterface = {
  name: "my-service",
  location: "apps/my-service",

  required: {
    "SERVICE_PORT": "MY_SERVICE_PORT",
    "REDIS_URL": "REDIS_URL",
  },

  secret: {
    "API_KEY": "MY_SERVICE_API_KEY",
  },

  defaults: {
    "LOG_LEVEL": "info",
  }
};

Step 2: Add Component Configuration (if needed)

File: config/environments/components/my-service.env

ini
NAMESPACE=MY_SERVICE

[default]
PORT=8080

[local]
PORT=8080

[production]
PORT=80

Step 3: Update Profiles

File: config/environments/profiles/staging.json

json
{
  "components": {
    "my-service": ["default", "staging"],
    ...
  }
}

Step 4: Add Secrets (if needed)

File: config/environments/secrets/.env.secrets.local

bash
MY_SERVICE_API_KEY=secret_key_here

Step 5: Rebuild Environment

bash
pnpm env:build staging

Result: apps/my-service/.env.staging and apps/my-service/.env.secret.staging created
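
Inside the service, the generated variables still need to be read and checked at startup. The following is one possible way to do that, not something the build system provides: `loadConfig` and `MyServiceConfig` are hypothetical names, and the variable names come from the interface in Step 1.

```typescript
interface MyServiceConfig {
  port: number;
  redisUrl: string;
  apiKey: string;
  logLevel: string;
}

// Read the variables defined by MyServiceEnvInterface from an environment map.
function loadConfig(env: Record<string, string | undefined>): MyServiceConfig {
  const need = (key: string): string => {
    const value = env[key];
    if (value === undefined || value === "") {
      throw new Error(`Missing environment variable: ${key}`);
    }
    return value;
  };
  const port = Number(need("SERVICE_PORT"));
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error("SERVICE_PORT must be a positive integer");
  }
  return {
    port,
    redisUrl: need("REDIS_URL"),
    apiKey: need("API_KEY"),
    logLevel: env["LOG_LEVEL"] ?? "info", // mirrors the interface default
  };
}

// Example values only; in a real service the map would be the process environment.
const config = loadConfig({
  SERVICE_PORT: "8080",
  REDIS_URL: "redis://localhost:6379",
  API_KEY: "example-key",
});
console.log(config.port, config.logLevel);
```

In a Node.js service the call would typically be `loadConfig(process.env)` after the generated .env files have been loaded.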

Troubleshooting

Missing Variables Error:

❌ Service 'machine' is missing required variables: REDIS_URL, MACHINE_GPU_MODE

Fix:

  1. Check which profile you're building
  2. Verify component is included in profile
  3. Check component file has correct environment section
  4. Verify namespace matches service interface expectation

Variable Not Resolving:

⚠️ Unresolved variables in HUB_REDIS_URL: ${PRODUCTION_REDIS_PASSWORD}

Fix:

  1. Check .env.secrets.local has the variable
  2. Verify variable name matches exactly (case-sensitive)
  3. Check for circular dependencies in variable references
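
Circular references are hard to spot by eye in a large variable pool. A small depth-first search can find them; this is an illustrative debugging aid with a hypothetical `findCycle` helper, not a feature of the build tool.

```typescript
// Return one reference cycle (for example ["A", "B", "A"]) or null if none exists.
function findCycle(pool: Record<string, string>): string[] | null {
  const refsOf = (name: string): string[] =>
    (pool[name].match(/\$\{[A-Za-z0-9_]+\}/g) ?? []).map((r) => r.slice(2, -1));
  const state = new Map<string, "visiting" | "done">();
  const path: string[] = [];

  const visit = (name: string): string[] | null => {
    // Variables outside the pool (secrets, process.env) cannot be part of a cycle here.
    if (state.get(name) === "done" || !(name in pool)) return null;
    if (state.get(name) === "visiting") return [...path.slice(path.indexOf(name)), name];
    state.set(name, "visiting");
    path.push(name);
    for (const ref of refsOf(name)) {
      const cycle = visit(ref);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(name, "done");
    return null;
  };

  for (const name of Object.keys(pool)) {
    const cycle = visit(name);
    if (cycle) return cycle;
  }
  return null;
}

const cycle = findCycle({ A: "${B}", B: "prefix-${A}", C: "plain" });
console.log(cycle ? cycle.join(" -> ") : "no cycle");
```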

Service Not Getting Variables:

Error: Cannot find .env.staging file

Fix:

  1. Verify service interface has correct location field
  2. Ensure profile includes required components
  3. Re-run pnpm env:build <profile>

Best Practices

Variable Naming Conventions

  1. Component Variables: Use namespace prefix

    bash
    # Good
    REDIS_URL=...
    API_PORT=...
    
    # Bad (no namespace)
    URL=...
    PORT=...
  2. Secret Variables: Use descriptive names indicating sensitivity

    bash
    # Good
    STORAGE_PROVIDER_AWS_SECRET_ACCESS_KEY_ENCODED=...
    API_AUTH_TOKEN=...
    
    # Bad
    KEY=...
    TOKEN=...
  3. Template Variables: Use clear composition

    typescript
    // Good
    "${SERVICE_EGRESS}${API_INGRESS}"
    
    // Bad (unclear what's being composed)
    "${A}${B}"

Component Organization

  1. Group Related Settings: Keep all Redis settings in redis.env, not scattered across files

  2. Use Environment Sections Appropriately:

    • [default]: Common to all environments
    • [local]: Local development overrides
    • [staging]: Staging-specific (production-like but separate resources)
    • [production]: Production values
  3. Don't Duplicate: Use variable references instead of duplicating values

    ini
    # Good
    [staging]
    URL=${REDIS_URL}
    
    # Bad
    [staging]
    URL=redis://default:${PRODUCTION_REDIS_PASSWORD}@switchback.proxy.rlwy.net:48889
    HOST=switchback.proxy.rlwy.net

Security Practices

  1. Never Commit Secrets: Always use .env.secrets.local (gitignored)

  2. Use Secret Categorization: Mark all sensitive variables in service interfaces

    typescript
    secret: {
      "DATABASE_URL": "DATABASE_URL",  // ✅ Correct
    }
    
    // NOT:
    required: {
      "DATABASE_URL": "DATABASE_URL",  // ❌ Wrong - should be secret
    }
  3. Rotate Credentials: Update .env.secrets.local and rebuild when rotating credentials

  4. Per-service Secrets: Service interfaces ensure each service only gets its required secrets
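
A simple lint can catch sensitive variables that were placed under `required` or `optional` by mistake. The sketch below is a heuristic with assumed names: the `SENSITIVE` pattern list and the `misclassified` helper are illustrative and would need tuning for a real codebase.

```typescript
// Assumed interface shape, inferred from the examples in this document.
interface ServiceEnvInterface {
  name: string;
  location: string;
  required?: Record<string, string>;
  secret?: Record<string, string>;
  optional?: Record<string, string>;
  defaults?: Record<string, string>;
}

// Illustrative name patterns that usually indicate a secret.
const SENSITIVE = /(PASSWORD|SECRET|TOKEN|API_KEY|ACCESS_KEY|DATABASE_URL)/;

// List app variables that look sensitive but are not categorized as `secret`.
function misclassified(iface: ServiceEnvInterface): string[] {
  const publicMappings = { ...iface.required, ...iface.optional };
  return Object.entries(publicMappings)
    .filter(([appVar, componentVar]) => SENSITIVE.test(appVar) || SENSITIVE.test(componentVar))
    .map(([appVar]) => appVar);
}

const suspicious = misclassified({
  name: "my-service",
  location: "apps/my-service",
  required: { DATABASE_URL: "DATABASE_URL", SERVICE_PORT: "MY_SERVICE_PORT" },
  secret: { API_KEY: "MY_SERVICE_API_KEY" },
});
console.log(suspicious);
```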

Profile Design

  1. Composition Over Duplication: Use component layering

    json
    {
      "components": {
        "redis": ["default", "staging"]  // ✅ Merges default + staging
      }
    }
  2. Meaningful Names: Profile names should describe their purpose clearly

    • local-dev - Local development
    • staging - Staging environment
    • production - Production deployment
  3. Document Profiles: Add clear descriptions in profile JSON

Advanced Topics

Custom Environment Variables

Sometimes you need environment-specific values not in component files:

Option 1: Add to component file

ini
# components/custom.env
NAMESPACE=CUSTOM

[local]
SPECIAL_VALUE=local_value

[production]
SPECIAL_VALUE=prod_value

Option 2: Use process.env passthrough

bash
# Set in shell before build
export SPECIAL_VALUE=runtime_value
pnpm env:build staging

# Variable available for ${SPECIAL_VALUE} substitution

Template Composition Patterns

Egress/Ingress Pattern: Used for services that need to know how to reach other services:

typescript
// Service A needs to reach Service B
"SERVICE_B_URL": "${SERVICE_A_EGRESS}${SERVICE_B_INGRESS}"

// Local:
//   SERVICE_A_EGRESS=http://host.docker.internal   (how A reaches outside)
//   SERVICE_B_INGRESS=:3000                        (where B listens)
//   Result: http://host.docker.internal:3000

// Production:
//   SERVICE_A_EGRESS=https://api.example.com       (public endpoint)
//   SERVICE_B_INGRESS=/service-b                   (path prefix)
//   Result: https://api.example.com/service-b

Docker Compose Integration

The builder can generate Docker Compose files from profiles:

json
{
  "docker": {
    "services": {
      "redis": {
        "image": "redis:7",
        "ports": ["6379:6379"],
        "condition": "components.redis includes 'local'"
      }
    }
  }
}
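
The `condition` string in the example could be evaluated against the profile's component map roughly as follows. The condition grammar is not specified in this document, so the sketch supports only the single form shown, `components.<name> includes '<section>'`; `conditionHolds` is a hypothetical helper.

```typescript
type ProfileComponents = Record<string, string | string[]>;

// Evaluate conditions of the one form shown above; anything else is rejected.
function conditionHolds(condition: string, components: ProfileComponents): boolean {
  const match = /^components\.([\w-]+) includes '([^']+)'$/.exec(condition.trim());
  if (!match) throw new Error(`Unsupported condition: ${condition}`);
  const [, component, section] = match;
  const entry = components[component];
  const sections = entry === undefined ? [] : Array.isArray(entry) ? entry : [entry];
  return sections.includes(section);
}

// Redis would be included for a profile that activates the local section.
console.log(conditionHolds("components.redis includes 'local'", { redis: ["default", "local"] }));
console.log(conditionHolds("components.redis includes 'local'", { redis: ["default", "staging"] }));
```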

Features:

  • Conditional Services: Only include if component is active
  • Variable Substitution: Use ${VAR} in Docker Compose config
  • Auto-platform: Adds platform: linux/amd64 for cross-architecture builds

Related Documentation

  • Machine/Worker Architecture: See machine-worker-system.md for how environments are used in deployment
  • Testing Procedures: See TESTING_PROCEDURES.md for environment-specific testing
  • CLAUDE.md: See project root for development workflow and conventions

Key Takeaways

  1. Component files define what's available, service interfaces define what's required
  2. Profiles compose components for different use cases
  3. Secrets are automatic - just add to .env.secrets.local
  4. Build fails fast - missing variables caught at build time, not runtime
  5. Multi-pass resolution - complex variable chains resolve automatically
  6. Per-service outputs - each service gets exactly what it needs, nothing more

Released under the MIT License.