
Queue-Based Autoscaling for Railway Services

Status: Proposed
Date: 2025-12-03
Author: System Architecture
Scope: Implementing queue-aware autoscaling for emprops-api and job-api services on Railway


Context

Railway does not provide native queue-based autoscaling. While Railway supports manual replica configuration and CPU/memory-based scaling, our job queue system needs scaling based on queue depth and processing latency, not CPU utilization.

Current Deployment Model

┌─────────────────────────────────────────────────────────────────┐
│  Railway Infrastructure                                          │
│  ┌───────────────────────┐    ┌────────────────────────────────┐│
│  │  job-api (1 replica)  │←──→│  emprops-api (1 replica)       ││
│  │  - gRPC server        │    │  - gRPC client                 ││
│  │  - Redis orchestrator │    │  - User-facing API             ││
│  └───────────────────────┘    └────────────────────────────────┘│
│             ↓                                                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Redis (Railway managed)                                   │  │
│  │  - Job queues (pending, processing)                        │  │
│  │  - Event pub/sub                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

The Problem

  1. No Native Queue-Based Scaling: Railway scales on CPU/memory, not queue depth
  2. Slow Container Startup: Current containers take 30-60+ seconds to become healthy
  3. Queue Pressure Undetected: Jobs pile up without triggering scaling
  4. Manual Intervention: Currently requires human action to scale replicas

Container Startup Time Analysis

Based on the Dockerfile and entrypoint analysis, current startup involves:

| Phase | Time Estimate | Bottleneck |
|---|---|---|
| Container pull | 5-10s | Image size (~1GB) |
| OTEL Collector startup | 15-30s | Health check wait loop |
| Prisma Client generation | 10-20s | Schema parsing + generation |
| Node.js application start | 5-10s | Module loading |
| Total | 35-70s | Multiple sequential waits |

Key Bottlenecks Identified:

  1. Prisma generate at runtime (emprops-api) - lines 64-75 of entrypoint
  2. OTEL Collector health check loop - 30 attempts × 0.5s = 15s max
  3. Sequential operations - No parallelism in startup

Decision

Implement a custom queue-based autoscaler using Railway's API combined with Redis queue metrics.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Autoscaler Service (new - runs on cron or dedicated service)   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  1. Query Redis queue depth (LLEN pending queues)         │  │
│  │  2. Query current replica count (Railway API)             │  │
│  │  3. Calculate desired replicas based on thresholds        │  │
│  │  4. Scale via Railway API if needed                       │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
              ↓                                    ↓
┌─────────────────────────────────────────────────────────────────┐
│  Redis                        Railway API                        │
│  - queue:pending:*            - GET /deployments/{id}            │
│  - queue:processing:*         - PATCH /deployments/{id}/scale    │
│  - job wait times             - GET /services/{id}/instances     │
└─────────────────────────────────────────────────────────────────┘

Scaling Thresholds

| Metric | Scale Up | Scale Down | Cooldown |
|---|---|---|---|
| Queue Depth | > 10 pending jobs | < 2 pending jobs | 5 min |
| Wait Time | > 30s average wait | < 5s average wait | 5 min |
| Processing Time | > 60s for simple jobs | N/A | 10 min |

Replica Limits

| Service | Min Replicas | Max Replicas | Default |
|---|---|---|---|
| job-api | 1 | 3 | 1 |
| emprops-api | 1 | 5 | 1 |
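
For illustration, the thresholds and limits above can be expressed as per-service configs using the ScalingConfig interface defined in Phase 3. This is a sketch: the constant names and the use of the service-ID environment variables (listed under Environment Variables below) are assumptions, not existing code.

typescript
// Hypothetical per-service configs restating the tables above.
const empropsApiConfig: ScalingConfig = {
  serviceId: process.env.RAILWAY_EMPROPS_API_SERVICE_ID!,
  minReplicas: 1,
  maxReplicas: 5,
  scaleUpThreshold: 10,   // > 10 pending jobs
  scaleDownThreshold: 2,  // < 2 pending jobs
  cooldownMinutes: 5,
};

const jobApiConfig: ScalingConfig = {
  serviceId: process.env.RAILWAY_JOB_API_SERVICE_ID!,
  minReplicas: 1,
  maxReplicas: 3,
  scaleUpThreshold: 10,
  scaleDownThreshold: 2,
  cooldownMinutes: 5,
};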

Implementation Plan

Phase 1: Startup Optimization (Immediate - Reduces Scaling Lag)

Goal: Reduce container startup time from 35-70s to <15s

1.1 Pre-generate Prisma Client at Build Time

dockerfile
# In apps/emprops-api/Dockerfile - ADD after pnpm install
RUN pnpm prisma generate --schema=".workspace-packages/database/prisma/schema.prisma"

Remove runtime generation from entrypoint (lines 64-75).

1.2 Parallelize OTEL Collector Startup

Current: Sequential wait for OTEL Collector readiness before Node.js starts.
Proposed: Start the OTEL Collector in the background and proceed with Node.js immediately.

bash
# In entrypoint - Start OTEL without blocking
start_otel_collector &
OTEL_PID=$!

# Start Node.js immediately
exec node --trace-warnings "$app_file"

1.3 Reduce Health Check Timeout

dockerfile
# From 30s start period to 10s (after optimizations)
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

Phase 2: Queue Metrics Collection

File: packages/core/src/redis-functions/queue-metrics.ts

typescript
import type { Redis } from 'ioredis'; // assumes ioredis as the Redis client

interface QueueMetrics {
  pendingJobs: number;
  processingJobs: number;
  averageWaitTimeMs: number;
  oldestJobAgeMs: number;
  jobsPerMinute: number;
}

export async function getQueueMetrics(redis: Redis): Promise<QueueMetrics> {
  const [pending, processing] = await Promise.all([
    redis.llen('queue:pending:default'),
    redis.llen('queue:processing'),
  ]);

  // The oldest job sits at the tail of the pending list. This assumes job
  // payloads are JSON with a created_at epoch-ms field; adjust to the real schema.
  const [oldestJob] = await redis.lrange('queue:pending:default', -1, -1);
  const oldestJobAgeMs = oldestJob
    ? Date.now() - (JSON.parse(oldestJob).created_at ?? Date.now())
    : 0;

  return {
    pendingJobs: pending,
    processingJobs: processing,
    oldestJobAgeMs,
    // Average wait time and throughput need per-job timestamps tracked in Redis;
    // left as placeholders until that instrumentation exists.
    averageWaitTimeMs: 0,
    jobsPerMinute: 0,
  };
}

Phase 3: Autoscaler Service

File: apps/autoscaler/src/index.ts (new service)

typescript
import { RailwayClient } from './railway-client';
import { getQueueMetrics } from '@emp/core';
import type { Redis } from 'ioredis'; // assumes ioredis as the Redis client

// Assumed to be initialized elsewhere in this service
// (e.g. from REDIS_URL and RAILWAY_API_TOKEN).
declare const redis: Redis;
declare const railway: RailwayClient;

interface ScalingConfig {
  serviceId: string;
  minReplicas: number;
  maxReplicas: number;
  scaleUpThreshold: number;    // queue depth
  scaleDownThreshold: number;  // queue depth
  cooldownMinutes: number;
}

async function evaluateScaling(config: ScalingConfig): Promise<void> {
  const metrics = await getQueueMetrics(redis);
  const currentReplicas = await railway.getReplicaCount(config.serviceId);

  let desiredReplicas = currentReplicas;

  if (metrics.pendingJobs > config.scaleUpThreshold) {
    desiredReplicas = Math.min(currentReplicas + 1, config.maxReplicas);
  } else if (metrics.pendingJobs < config.scaleDownThreshold) {
    desiredReplicas = Math.max(currentReplicas - 1, config.minReplicas);
  }

  // Cooldown helpers are sketched after this block.
  if (desiredReplicas !== currentReplicas && !isInCooldown(config)) {
    await railway.scaleService(config.serviceId, desiredReplicas);
    recordScalingEvent(config);
  }
}

// Run every 30 seconds (empropsApiConfig as sketched under Replica Limits)
setInterval(() => evaluateScaling(empropsApiConfig).catch(console.error), 30_000);
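
The isInCooldown and recordScalingEvent helpers are referenced above but not specified elsewhere in this ADR. A minimal in-memory sketch, assuming cooldown state is tracked per service ID:

typescript
// Hypothetical cooldown tracking: remember the last scaling action per service
// and suppress further changes until the configured cooldown has elapsed.
const lastScaleEventAt = new Map<string, number>();

function isInCooldown(config: ScalingConfig): boolean {
  const last = lastScaleEventAt.get(config.serviceId);
  if (last === undefined) return false;
  return Date.now() - last < config.cooldownMinutes * 60_000;
}

function recordScalingEvent(config: ScalingConfig): void {
  lastScaleEventAt.set(config.serviceId, Date.now());
}

Because this state lives in memory, cooldowns reset when the autoscaler restarts; persisting the timestamps in Redis would make them survive restarts.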

Phase 4: Railway API Integration

File: apps/autoscaler/src/railway-client.ts

typescript
import { GraphQLClient, gql } from 'graphql-request';

export class RailwayClient {
  private client: GraphQLClient;

  constructor(token: string) {
    this.client = new GraphQLClient('https://backboard.railway.app/graphql/v2', {
      headers: { Authorization: `Bearer ${token}` }
    });
  }

  async getReplicaCount(serviceId: string): Promise<number> {
    const query = gql`
      query GetService($serviceId: String!) {
        service(id: $serviceId) {
          deployments(first: 1) {
            edges {
              node {
                staticUrl
                replicas
              }
            }
          }
        }
      }
    `;
    const data = await this.client.request(query, { serviceId });
    return data.service.deployments.edges[0]?.node.replicas ?? 1;
  }

  async scaleService(serviceId: string, replicas: number): Promise<void> {
    const mutation = gql`
      mutation ScaleService($serviceId: String!, $replicas: Int!) {
        serviceInstanceUpdate(
          serviceId: $serviceId
          input: { numReplicas: $replicas }
        ) {
          numReplicas
        }
      }
    `;
    await this.client.request(mutation, { serviceId, replicas });
  }
}

Startup Time Optimization Details

Current Bottleneck: Prisma Generate at Runtime

Location: apps/emprops-api/entrypoint-emprops-api.sh:64-75

bash
# CURRENT (slow) - Generates Prisma Client at every container start
perform_health_check() {
    log_info "Generating Prisma Client from database package schema..."
    if ! "$SERVICE_DIR/node_modules/.bin/prisma" generate \
        --schema="$SERVICE_DIR/.workspace-packages/database/prisma/schema.prisma"; then
        log_error "Failed to generate Prisma Client"
        return 1
    fi
}

Why This Is Slow:

  1. Prisma parses the schema file
  2. Generates TypeScript types
  3. Writes to node_modules/.prisma/client
  4. This happens on every container start

Fix: Generate at Docker build time:

dockerfile
# In Dockerfile - after pnpm install
COPY .workspace-packages/database/prisma/schema.prisma ./schema.prisma
RUN pnpm prisma generate --schema=./schema.prisma

Current Bottleneck: OTEL Collector Sequential Wait

Location: entrypoint-emprops-api.sh:117-155

The OTEL Collector startup blocks Node.js application start with:

  • 30 attempts × 0.5s sleep = up to 15 seconds
  • Both health endpoint AND gRPC port must be ready

Fix: Non-blocking OTEL startup:

bash
start_otel_collector() {
    # Start in background without waiting
    otelcol-contrib --config="$CONFIG" > "${SERVICE_DIR}/logs/otel-collector.log" 2>&1 &
    echo $! > /tmp/otel-collector.pid
    log_info "🚀 OTEL Collector starting in background (PID: $(cat /tmp/otel-collector.pid))"
    # Don't wait - proceed immediately
}

The Node.js TelemetryClient already handles OTEL unavailability gracefully.


Consequences

Benefits

  1. Automatic Scaling: Queue pressure triggers scaling without manual intervention
  2. Faster Startup: 35-70s → <15s with optimizations
  3. Cost Efficiency: Scale down when queue is empty
  4. Visibility: Queue metrics feed into monitoring dashboard

Risks

  1. Railway API Rate Limits: May hit API limits with frequent polling
    • Mitigation: 30-second polling interval plus exponential backoff on Railway API errors (see the sketch after this list)
  2. Scaling Lag: Still 10-15s for new containers to become healthy
    • Mitigation: Predictive scaling based on queue growth rate
  3. Additional Service: Autoscaler needs its own hosting
    • Option: Run as Railway cron job or dedicated service
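
As a sketch of the rate-limit mitigation above (assuming retries wrap the RailwayClient calls from Phase 4; attempt counts and delays are illustrative):

typescript
// Retry a Railway API call with exponential backoff; give up after maxAttempts.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 1_000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait 1s, 2s, 4s, ... between attempts
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Usage: await withBackoff(() => railway.scaleService(config.serviceId, desiredReplicas));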

Not Changing

  • Worker scaling (handled by SALAD/vast.ai)
  • Redis-based job matching logic
  • gRPC communication between services

Metrics to Track

| Metric | Source | Alert Threshold |
|---|---|---|
| Queue depth | Redis LLEN | > 50 jobs |
| Job wait time | Redis job timestamps | > 60s |
| Container startup time | Railway deploy logs | > 30s |
| Scale events/hour | Autoscaler logs | > 10 |
| Failed scale attempts | Autoscaler logs | > 0 |

Alternatives Considered

External Autoscaler Services

  • KEDA: Kubernetes-native, doesn't work with Railway
  • AWS Application Auto Scaling: AWS-only
  • Custom Lambda: Could work, but adds AWS dependency

Railway's simplicity is a strength; keeping the autoscaler in our codebase maintains that simplicity while adding queue awareness.


Dependencies

  • Railway API token with service modification permissions
  • graphql-request for Railway GraphQL API
  • Redis access for queue metrics
  • New apps/autoscaler service or cron job

Environment Variables

bash
# Autoscaler service
RAILWAY_API_TOKEN=xxx              # Railway API token
RAILWAY_EMPROPS_API_SERVICE_ID=xxx # Service ID to scale
RAILWAY_JOB_API_SERVICE_ID=xxx     # Service ID to scale
REDIS_URL=xxx                      # Queue metrics source

# Scaling config
SCALE_UP_THRESHOLD=10              # Jobs to trigger scale up
SCALE_DOWN_THRESHOLD=2             # Jobs to trigger scale down
MIN_REPLICAS=1
MAX_REPLICAS=5
COOLDOWN_MINUTES=5

End of ADR
