
Queue-Based Autoscaling for Railway Services

Status: Proposed
Date: 2025-12-03
Author: System Architecture
Scope: Implementing queue-aware autoscaling for emprops-api and job-api services on Railway


Context

Railway does not provide native queue-based autoscaling. While Railway supports manual replica configuration and CPU/memory-based scaling, our job queue system needs scaling based on queue depth and processing latency, not CPU utilization.

Current Deployment Model

┌─────────────────────────────────────────────────────────────────┐
│  Railway Infrastructure                                          │
│  ┌───────────────────────┐    ┌────────────────────────────────┐│
│  │  job-api (1 replica)  │←──→│  emprops-api (1 replica)       ││
│  │  - gRPC server        │    │  - gRPC client                 ││
│  │  - Redis orchestrator │    │  - User-facing API             ││
│  └───────────────────────┘    └────────────────────────────────┘│
│             ↓                                                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Redis (Railway managed)                                   │  │
│  │  - Job queues (pending, processing)                        │  │
│  │  - Event pub/sub                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

The Problem

  1. No Native Queue-Based Scaling: Railway scales on CPU/memory, not queue depth
  2. Slow Container Startup: Current containers take 30-60+ seconds to become healthy
  3. Queue Pressure Undetected: Jobs pile up without triggering scaling
  4. Manual Intervention: Currently requires human action to scale replicas

Container Startup Time Analysis

Based on the Dockerfile and entrypoint analysis, current startup involves:

| Phase | Time Estimate | Bottleneck |
|---|---|---|
| Container pull | 5-10s | Image size (~1GB) |
| OTEL Collector startup | 15-30s | Health check wait loop |
| Prisma Client generation | 10-20s | Schema parsing + generation |
| Node.js application start | 5-10s | Module loading |
| Total | 35-70s | Multiple sequential waits |

Key Bottlenecks Identified:

  1. Prisma generate at runtime (emprops-api) - lines 64-75 of entrypoint
  2. OTEL Collector health check loop - 30 attempts × 0.5s = 15s max
  3. Sequential operations - No parallelism in startup

Decision

Implement a custom queue-based autoscaler using Railway's API combined with Redis queue metrics.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Autoscaler Service (new - runs on cron or dedicated service)   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  1. Query Redis queue depth (LLEN pending queues)         │  │
│  │  2. Query current replica count (Railway API)             │  │
│  │  3. Calculate desired replicas based on thresholds        │  │
│  │  4. Scale via Railway API if needed                       │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
              ↓                                    ↓
┌─────────────────────────────────────────────────────────────────┐
│  Redis                        Railway API                        │
│  - queue:pending:*            - GET /deployments/{id}            │
│  - queue:processing:*         - PATCH /deployments/{id}/scale    │
│  - job wait times             - GET /services/{id}/instances     │
└─────────────────────────────────────────────────────────────────┘

Scaling Thresholds

| Metric | Scale Up | Scale Down | Cooldown |
|---|---|---|---|
| Queue Depth | > 10 pending jobs | < 2 pending jobs | 5 min |
| Wait Time | > 30s average wait | < 5s average wait | 5 min |
| Processing Time | > 60s for simple jobs | N/A | 10 min |

Replica Limits

| Service | Min Replicas | Max Replicas | Default |
|---|---|---|---|
| job-api | 1 | 3 | 1 |
| emprops-api | 1 | 5 | 1 |
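
For illustration, the thresholds and limits above can be expressed as per-service configs using the ScalingConfig interface defined in Phase 3. This is a sketch: the constant names and the use of the service-ID environment variables (listed under Environment Variables below) are assumptions, not existing code.

typescript
// Hypothetical per-service configs restating the tables above.
const empropsApiConfig: ScalingConfig = {
  serviceId: process.env.RAILWAY_EMPROPS_API_SERVICE_ID!,
  minReplicas: 1,
  maxReplicas: 5,
  scaleUpThreshold: 10,   // > 10 pending jobs
  scaleDownThreshold: 2,  // < 2 pending jobs
  cooldownMinutes: 5,
};

const jobApiConfig: ScalingConfig = {
  serviceId: process.env.RAILWAY_JOB_API_SERVICE_ID!,
  minReplicas: 1,
  maxReplicas: 3,
  scaleUpThreshold: 10,
  scaleDownThreshold: 2,
  cooldownMinutes: 5,
};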

Implementation Plan

Phase 1: Startup Optimization (Immediate - Reduces Scaling Lag)

Goal: Reduce container startup time from 35-70s to <15s

1.1 Pre-generate Prisma Client at Build Time

dockerfile
# In apps/emprops-api/Dockerfile - ADD after pnpm install
RUN pnpm prisma generate --schema=".workspace-packages/database/prisma/schema.prisma"

Remove runtime generation from entrypoint (lines 64-75).

1.2 Parallelize OTEL Collector Startup

Current: Sequential wait for OTEL Collector readiness before Node.js starts.
Proposed: Start the OTEL Collector in the background and proceed with Node.js immediately.

bash
# In entrypoint - Start OTEL without blocking
start_otel_collector &
OTEL_PID=$!

# Start Node.js immediately
exec node --trace-warnings "$app_file"

1.3 Reduce Health Check Timeout

dockerfile
# From 30s start period to 10s (after optimizations)
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

Phase 2: Queue Metrics Collection

File: packages/core/src/redis-functions/queue-metrics.ts

typescript
import type { Redis } from 'ioredis'; // assumes ioredis as the Redis client

interface QueueMetrics {
  pendingJobs: number;
  processingJobs: number;
  averageWaitTimeMs: number;
  oldestJobAgeMs: number;
  jobsPerMinute: number;
}

export async function getQueueMetrics(redis: Redis): Promise<QueueMetrics> {
  const [pending, processing] = await Promise.all([
    redis.llen('queue:pending:default'),
    redis.llen('queue:processing'),
  ]);

  // The oldest job sits at the tail of the pending list. This assumes job
  // payloads are JSON with a created_at epoch-ms field; adjust to the real schema.
  const [oldestJob] = await redis.lrange('queue:pending:default', -1, -1);
  const oldestJobAgeMs = oldestJob
    ? Date.now() - (JSON.parse(oldestJob).created_at ?? Date.now())
    : 0;

  return {
    pendingJobs: pending,
    processingJobs: processing,
    oldestJobAgeMs,
    // Average wait time and throughput need per-job timestamps tracked in Redis;
    // left as placeholders until that instrumentation exists.
    averageWaitTimeMs: 0,
    jobsPerMinute: 0,
  };
}

Phase 3: Autoscaler Service

File: apps/autoscaler/src/index.ts (new service)

typescript
import { RailwayClient } from './railway-client';
import { getQueueMetrics } from '@emp/core';
import type { Redis } from 'ioredis'; // assumes ioredis as the Redis client

// Assumed to be initialized elsewhere in this service
// (e.g. from REDIS_URL and RAILWAY_API_TOKEN).
declare const redis: Redis;
declare const railway: RailwayClient;

interface ScalingConfig {
  serviceId: string;
  minReplicas: number;
  maxReplicas: number;
  scaleUpThreshold: number;    // queue depth
  scaleDownThreshold: number;  // queue depth
  cooldownMinutes: number;
}

async function evaluateScaling(config: ScalingConfig): Promise<void> {
  const metrics = await getQueueMetrics(redis);
  const currentReplicas = await railway.getReplicaCount(config.serviceId);

  let desiredReplicas = currentReplicas;

  if (metrics.pendingJobs > config.scaleUpThreshold) {
    desiredReplicas = Math.min(currentReplicas + 1, config.maxReplicas);
  } else if (metrics.pendingJobs < config.scaleDownThreshold) {
    desiredReplicas = Math.max(currentReplicas - 1, config.minReplicas);
  }

  // Cooldown helpers are sketched after this block.
  if (desiredReplicas !== currentReplicas && !isInCooldown(config)) {
    await railway.scaleService(config.serviceId, desiredReplicas);
    recordScalingEvent(config);
  }
}

// Run every 30 seconds (empropsApiConfig as sketched under Replica Limits)
setInterval(() => evaluateScaling(empropsApiConfig).catch(console.error), 30_000);
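
The isInCooldown and recordScalingEvent helpers are referenced above but not specified elsewhere in this ADR. A minimal in-memory sketch, assuming cooldown state is tracked per service ID:

typescript
// Hypothetical cooldown tracking: remember the last scaling action per service
// and suppress further changes until the configured cooldown has elapsed.
const lastScaleEventAt = new Map<string, number>();

function isInCooldown(config: ScalingConfig): boolean {
  const last = lastScaleEventAt.get(config.serviceId);
  if (last === undefined) return false;
  return Date.now() - last < config.cooldownMinutes * 60_000;
}

function recordScalingEvent(config: ScalingConfig): void {
  lastScaleEventAt.set(config.serviceId, Date.now());
}

Because this state lives in memory, cooldowns reset when the autoscaler restarts; persisting the timestamps in Redis would make them survive restarts.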

Phase 4: Railway API Integration

File: apps/autoscaler/src/railway-client.ts

typescript
import { GraphQLClient, gql } from 'graphql-request';

export class RailwayClient {
  private client: GraphQLClient;

  constructor(token: string) {
    this.client = new GraphQLClient('https://backboard.railway.app/graphql/v2', {
      headers: { Authorization: `Bearer ${token}` }
    });
  }

  async getReplicaCount(serviceId: string): Promise<number> {
    const query = gql`
      query GetService($serviceId: String!) {
        service(id: $serviceId) {
          deployments(first: 1) {
            edges {
              node {
                staticUrl
                replicas
              }
            }
          }
        }
      }
    `;
    const data = await this.client.request(query, { serviceId });
    return data.service.deployments.edges[0]?.node.replicas ?? 1;
  }

  async scaleService(serviceId: string, replicas: number): Promise<void> {
    const mutation = gql`
      mutation ScaleService($serviceId: String!, $replicas: Int!) {
        serviceInstanceUpdate(
          serviceId: $serviceId
          input: { numReplicas: $replicas }
        ) {
          numReplicas
        }
      }
    `;
    await this.client.request(mutation, { serviceId, replicas });
  }
}

Startup Time Optimization Details

Current Bottleneck: Prisma Generate at Runtime

Location: apps/emprops-api/entrypoint-emprops-api.sh:64-75

bash
# CURRENT (slow) - Generates Prisma Client at every container start
perform_health_check() {
    log_info "Generating Prisma Client from database package schema..."
    if ! "$SERVICE_DIR/node_modules/.bin/prisma" generate \
        --schema="$SERVICE_DIR/.workspace-packages/database/prisma/schema.prisma"; then
        log_error "Failed to generate Prisma Client"
        return 1
    fi
}

Why This Is Slow:

  1. Prisma parses the schema file
  2. Generates TypeScript types
  3. Writes to node_modules/.prisma/client
  4. This happens on every container start

Fix: Generate at Docker build time:

dockerfile
# In Dockerfile - after pnpm install
COPY .workspace-packages/database/prisma/schema.prisma ./schema.prisma
RUN pnpm prisma generate --schema=./schema.prisma

Current Bottleneck: OTEL Collector Sequential Wait

Location: entrypoint-emprops-api.sh:117-155

The OTEL Collector startup blocks Node.js application start with:

  • 30 attempts × 0.5s sleep = up to 15 seconds
  • Both health endpoint AND gRPC port must be ready

Fix: Non-blocking OTEL startup:

bash
start_otel_collector() {
    # Start in background without waiting
    otelcol-contrib --config="$CONFIG" > "${SERVICE_DIR}/logs/otel-collector.log" 2>&1 &
    echo $! > /tmp/otel-collector.pid
    log_info "🚀 OTEL Collector starting in background (PID: $(cat /tmp/otel-collector.pid))"
    # Don't wait - proceed immediately
}

The Node.js TelemetryClient already handles OTEL unavailability gracefully.


Consequences

Benefits

  1. Automatic Scaling: Queue pressure triggers scaling without manual intervention
  2. Faster Startup: 35-70s → <15s with optimizations
  3. Cost Efficiency: Scale down when queue is empty
  4. Visibility: Queue metrics feed into monitoring dashboard

Risks

  1. Railway API Rate Limits: May hit API limits with frequent polling
    • Mitigation: 30-second polling interval plus exponential backoff on Railway API errors (see the sketch after this list)
  2. Scaling Lag: Still 10-15s for new containers to become healthy
    • Mitigation: Predictive scaling based on queue growth rate
  3. Additional Service: Autoscaler needs its own hosting
    • Option: Run as Railway cron job or dedicated service
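
As a sketch of the rate-limit mitigation above (assuming retries wrap the RailwayClient calls from Phase 4; attempt counts and delays are illustrative):

typescript
// Retry a Railway API call with exponential backoff; give up after maxAttempts.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 1_000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait 1s, 2s, 4s, ... between attempts
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Usage: await withBackoff(() => railway.scaleService(config.serviceId, desiredReplicas));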

Not Changing

  • Worker scaling (handled by SALAD/vast.ai)
  • Redis-based job matching logic
  • gRPC communication between services

Metrics to Track

| Metric | Source | Alert Threshold |
|---|---|---|
| Queue depth | Redis LLEN | > 50 jobs |
| Job wait time | Redis job timestamps | > 60s |
| Container startup time | Railway deploy logs | > 30s |
| Scale events/hour | Autoscaler logs | > 10 |
| Failed scale attempts | Autoscaler logs | > 0 |

Alternatives Considered

External Autoscaler Services

  • KEDA: Kubernetes-native, doesn't work with Railway
  • AWS Application Auto Scaling: AWS-only
  • Custom Lambda: Could work, but adds AWS dependency

Railway's simplicity is a strength; keeping the autoscaler in our codebase maintains that simplicity while adding queue awareness.


Dependencies

  • Railway API token with service modification permissions
  • graphql-request for Railway GraphQL API
  • Redis access for queue metrics
  • New apps/autoscaler service or cron job

Environment Variables

bash
# Autoscaler service
RAILWAY_API_TOKEN=xxx              # Railway API token
RAILWAY_EMPROPS_API_SERVICE_ID=xxx # Service ID to scale
RAILWAY_JOB_API_SERVICE_ID=xxx     # Service ID to scale
REDIS_URL=xxx                      # Queue metrics source

# Scaling config
SCALE_UP_THRESHOLD=10              # Jobs to trigger scale up
SCALE_DOWN_THRESHOLD=2             # Jobs to trigger scale down
MIN_REPLICAS=1
MAX_REPLICAS=5
COOLDOWN_MINUTES=5

End of ADR
