Queue-Based Autoscaling for Railway Services
Status: Proposed
Date: 2025-12-03
Author: System Architecture
Scope: Implementing queue-aware autoscaling for emprops-api and job-api services on Railway
Context
Railway does not provide native queue-based autoscaling. While Railway supports manual replica configuration and CPU/memory-based scaling, our job queue system needs scaling based on queue depth and processing latency, not CPU utilization.
Current Deployment Model
```
┌─────────────────────────────────────────────────────────────────┐
│ Railway Infrastructure │
│ ┌───────────────────────┐ ┌────────────────────────────────┐│
│ │ job-api (1 replica) │←──→│ emprops-api (1 replica) ││
│ │ - gRPC server │ │ - gRPC client ││
│ │ - Redis orchestrator │ │ - User-facing API ││
│ └───────────────────────┘ └────────────────────────────────┘│
│ ↓ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Redis (Railway managed) │ │
│ │ - Job queues (pending, processing) │ │
│ │ - Event pub/sub │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```

The Problem
- No Native Queue-Based Scaling: Railway scales on CPU/memory, not queue depth
- Slow Container Startup: Current containers take 30-60+ seconds to become healthy
- Queue Pressure Undetected: Jobs pile up without triggering scaling
- Manual Intervention: Currently requires human action to scale replicas
Container Startup Time Analysis
Based on the Dockerfile and entrypoint analysis, current startup involves:
| Phase | Time Estimate | Bottleneck |
|---|---|---|
| Container pull | 5-10s | Image size (~1GB) |
| OTEL Collector startup | 15-30s | Health check wait loop |
| Prisma Client generation | 10-20s | Schema parsing + generation |
| Node.js application start | 5-10s | Module loading |
| Total | 35-70s | Multiple sequential waits |
Key Bottlenecks Identified:
- Prisma generate at runtime (emprops-api) - lines 64-75 of entrypoint
- OTEL Collector health check loop - 30 attempts × 0.5s = 15s max
- Sequential operations - No parallelism in startup
Decision
Implement a custom queue-based autoscaler using Railway's API combined with Redis queue metrics.
Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Autoscaler Service (new - runs on cron or dedicated service) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ 1. Query Redis queue depth (LLEN pending queues) │ │
│ │ 2. Query current replica count (Railway API) │ │
│ │ 3. Calculate desired replicas based on thresholds │ │
│ │ 4. Scale via Railway API if needed │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
↓ ↓
┌─────────────────────────────────────────────────────────────────┐
│ Redis Railway API │
│ - queue:pending:* - GET /deployments/{id} │
│ - queue:processing:* - PATCH /deployments/{id}/scale │
│ - job wait times - GET /services/{id}/instances │
└─────────────────────────────────────────────────────────────────┘
```

Scaling Thresholds
| Metric | Scale Up | Scale Down | Cooldown |
|---|---|---|---|
| Queue Depth | > 10 pending jobs | < 2 pending jobs | 5 min |
| Wait Time | > 30s average wait | < 5s average wait | 5 min |
| Processing Time | > 60s for simple jobs | N/A | 10 min |
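One reasonable way to combine these signals is sketched below, using the QueueMetrics shape defined in Phase 2 (the constants mirror the table above; any scale-up signal wins, while scale-down requires every signal to agree):

```typescript
import type { QueueMetrics } from '@emp/core'; // shape defined in Phase 2

type ScaleDecision = 'up' | 'down' | 'hold';

// Threshold values from the table above
const SCALE_UP = { queueDepth: 10, avgWaitMs: 30_000 };
const SCALE_DOWN = { queueDepth: 2, avgWaitMs: 5_000 };

export function decide(metrics: QueueMetrics): ScaleDecision {
  // Any scale-up signal wins: scaling up late hurts more than scaling up early.
  if (
    metrics.pendingJobs > SCALE_UP.queueDepth ||
    metrics.averageWaitTimeMs > SCALE_UP.avgWaitMs
  ) {
    return 'up';
  }
  // Scale down only when every signal agrees the queue is quiet.
  if (
    metrics.pendingJobs < SCALE_DOWN.queueDepth &&
    metrics.averageWaitTimeMs < SCALE_DOWN.avgWaitMs
  ) {
    return 'down';
  }
  return 'hold';
}
```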
Replica Limits
| Service | Min Replicas | Max Replicas | Default |
|---|---|---|---|
| job-api | 1 | 3 | 1 |
| emprops-api | 1 | 5 | 1 |
Implementation Plan
Phase 1: Startup Optimization (Immediate - Reduces Scaling Lag)
Goal: Reduce container startup time from 35-70s to <15s
1.1 Pre-generate Prisma Client at Build Time
```dockerfile
# In apps/emprops-api/Dockerfile - ADD after pnpm install
RUN pnpm prisma generate --schema=".workspace-packages/database/prisma/schema.prisma"
```

Remove runtime generation from the entrypoint (lines 64-75).
1.2 Parallelize OTEL Collector Startup
Current: Sequential wait for OTEL before Node.js starts
Proposed: Start OTEL in background, proceed with Node.js immediately
```bash
# In entrypoint - Start OTEL without blocking
start_otel_collector &
OTEL_PID=$!
# Start Node.js immediately
exec node --trace-warnings "$app_file"
```

1.3 Reduce Health Check Timeout
```dockerfile
# From 30s start period to 10s (after optimizations)
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1
```

Phase 2: Queue Metrics Collection
File: packages/core/src/redis-functions/queue-metrics.ts
```typescript
import type { Redis } from 'ioredis'; // assumes the ioredis client

export interface QueueMetrics {
  pendingJobs: number;
  processingJobs: number;
  averageWaitTimeMs: number;
  oldestJobAgeMs: number;
  jobsPerMinute: number;
}

export async function getQueueMetrics(redis: Redis): Promise<QueueMetrics> {
  const [pending, processing] = await Promise.all([
    redis.llen('queue:pending:default'),
    redis.llen('queue:processing'),
  ]);

  // The oldest job sits at the tail of the pending list. Assumes jobs are
  // stored as JSON with a createdAt epoch-ms field; adjust to the real payload.
  const [oldestRaw] = await redis.lrange('queue:pending:default', -1, -1);
  const oldestJobAgeMs = oldestRaw
    ? Date.now() - JSON.parse(oldestRaw).createdAt
    : 0;

  return {
    pendingJobs: pending,
    processingJobs: processing,
    averageWaitTimeMs: 0, // TODO: derive from job enqueue/dequeue timestamps
    oldestJobAgeMs,
    jobsPerMinute: 0, // TODO: derive from a completion counter
  };
}
```

Phase 3: Autoscaler Service
File: apps/autoscaler/src/index.ts (new service)
```typescript
import Redis from 'ioredis';
import { getQueueMetrics } from '@emp/core';
import { RailwayClient } from './railway-client';

interface ScalingConfig {
  serviceId: string;
  minReplicas: number;
  maxReplicas: number;
  scaleUpThreshold: number; // queue depth
  scaleDownThreshold: number; // queue depth
  cooldownMinutes: number;
}

const redis = new Redis(process.env.REDIS_URL!);
const railway = new RailwayClient(process.env.RAILWAY_API_TOKEN!);

let lastScaleAt = 0; // timestamp of the most recent scaling action

function isInCooldown(cooldownMinutes: number): boolean {
  return Date.now() - lastScaleAt < cooldownMinutes * 60_000;
}

async function evaluateScaling(config: ScalingConfig): Promise<void> {
  const metrics = await getQueueMetrics(redis);
  const currentReplicas = await railway.getReplicaCount(config.serviceId);
  let desiredReplicas = currentReplicas;
  if (metrics.pendingJobs > config.scaleUpThreshold) {
    desiredReplicas = Math.min(currentReplicas + 1, config.maxReplicas);
  } else if (metrics.pendingJobs < config.scaleDownThreshold) {
    desiredReplicas = Math.max(currentReplicas - 1, config.minReplicas);
  }
  if (desiredReplicas !== currentReplicas && !isInCooldown(config.cooldownMinutes)) {
    await railway.scaleService(config.serviceId, desiredReplicas);
    lastScaleAt = Date.now(); // record the scaling event to enforce the cooldown
  }
}

const empropsApiConfig: ScalingConfig = {
  serviceId: process.env.RAILWAY_EMPROPS_API_SERVICE_ID!,
  minReplicas: 1,
  maxReplicas: 5,
  scaleUpThreshold: 10,
  scaleDownThreshold: 2,
  cooldownMinutes: 5,
};

// Run every 30 seconds
setInterval(() => evaluateScaling(empropsApiConfig), 30_000);
```

Phase 4: Railway API Integration
File: apps/autoscaler/src/railway-client.ts
```typescript
import { GraphQLClient, gql } from 'graphql-request';
export class RailwayClient {
private client: GraphQLClient;
constructor(token: string) {
this.client = new GraphQLClient('https://backboard.railway.app/graphql/v2', {
headers: { Authorization: `Bearer ${token}` }
});
}
async getReplicaCount(serviceId: string): Promise<number> {
const query = gql`
query GetService($serviceId: String!) {
service(id: $serviceId) {
deployments(first: 1) {
edges {
node {
staticUrl
replicas
}
}
}
}
}
`;
const data = await this.client.request(query, { serviceId });
return data.service.deployments.edges[0]?.node.replicas ?? 1;
}
async scaleService(serviceId: string, replicas: number): Promise<void> {
const mutation = gql`
mutation ScaleService($serviceId: String!, $replicas: Int!) {
serviceInstanceUpdate(
serviceId: $serviceId
input: { numReplicas: $replicas }
) {
numReplicas
}
}
`;
await this.client.request(mutation, { serviceId, replicas });
}
}
```

Startup Time Optimization Details
Current Bottleneck: Prisma Generate at Runtime
Location: apps/emprops-api/entrypoint-emprops-api.sh:64-75
```bash
# CURRENT (slow) - Generates Prisma Client at every container start
perform_health_check() {
log_info "Generating Prisma Client from database package schema..."
if ! "$SERVICE_DIR/node_modules/.bin/prisma" generate \
--schema="$SERVICE_DIR/.workspace-packages/database/prisma/schema.prisma"; then
log_error "Failed to generate Prisma Client"
return 1
fi
}
```

Why This Is Slow:
- Prisma parses the schema file
- Generates TypeScript types
- Writes to node_modules/.prisma/client
- This happens on every container start
Fix: Generate at Docker build time:
```dockerfile
# In Dockerfile - after pnpm install
COPY .workspace-packages/database/prisma/schema.prisma ./schema.prisma
RUN pnpm prisma generate --schema=./schema.prisma
```

Current Bottleneck: OTEL Collector Sequential Wait
Location: entrypoint-emprops-api.sh:117-155
The OTEL Collector startup blocks Node.js application start with:
- 30 attempts × 0.5s sleep = up to 15 seconds
- Both health endpoint AND gRPC port must be ready
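For reference, the blocking wait has roughly this shape (a reconstruction for illustration, not the verbatim entrypoint; 13133 and 4317 are the OTEL Collector's default health-check and OTLP gRPC ports):

```bash
# Reconstruction of the blocking wait: both checks must pass before Node.js starts
attempts=0
until curl -sf http://localhost:13133/ && nc -z localhost 4317; do
  attempts=$((attempts + 1))
  if [ "$attempts" -ge 30 ]; then
    echo "OTEL Collector failed to become ready" >&2
    break
  fi
  sleep 0.5
done
```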
Fix: Non-blocking OTEL startup:
```bash
start_otel_collector() {
# Start in background without waiting
otelcol-contrib --config="$CONFIG" > "${SERVICE_DIR}/logs/otel-collector.log" 2>&1 &
echo $! > /tmp/otel-collector.pid
log_info "🚀 OTEL Collector starting in background (PID: $(cat /tmp/otel-collector.pid))"
# Don't wait - proceed immediately
}
```

The Node.js TelemetryClient already handles OTEL unavailability gracefully.
Consequences
Benefits
- Automatic Scaling: Queue pressure triggers scaling without manual intervention
- Faster Startup: 35-70s → <15s with optimizations
- Cost Efficiency: Scale down when queue is empty
- Visibility: Queue metrics feed into monitoring dashboard
Risks
- Railway API Rate Limits: May hit API limits with frequent polling
  - Mitigation: 30-second polling interval, exponential backoff
- Scaling Lag: Still 10-15s for new containers to become healthy
  - Mitigation: Predictive scaling based on queue growth rate (see the sketch after this list)
- Additional Service: Autoscaler needs its own hosting
  - Option: Run as Railway cron job or dedicated service
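The predictive-scaling mitigation could be as simple as tracking queue-depth samples and reacting to the growth rate. A minimal sketch, assuming evaluateScaling records a sample on each 30-second tick (the helper names and the 5-jobs/minute trigger are illustrative):

```typescript
// Sketch: scale up early when the queue grows faster than it drains,
// to absorb the 10-15s container startup lag.
const samples: { at: number; depth: number }[] = [];

export function recordDepthSample(depth: number, windowMs = 5 * 60_000): void {
  const now = Date.now();
  samples.push({ at: now, depth });
  // Keep only the trailing window of samples
  while (samples.length > 0 && samples[0].at < now - windowMs) samples.shift();
}

export function queueGrowthPerMinute(): number {
  if (samples.length < 2) return 0;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const minutes = (last.at - first.at) / 60_000;
  return minutes > 0 ? (last.depth - first.depth) / minutes : 0;
}

// Treat sustained growth as a scale-up signal even below the static threshold.
export function shouldPreScale(pendingJobs: number): boolean {
  return pendingJobs > 0 && queueGrowthPerMinute() > 5;
}
```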
Not Changing
- Worker scaling (handled by SALAD/vast.ai)
- Redis-based job matching logic
- gRPC communication between services
Metrics to Track
| Metric | Source | Alert Threshold |
|---|---|---|
| Queue depth | Redis LLEN | > 50 jobs |
| Job wait time | Redis job timestamps | > 60s |
| Container startup time | Railway deploy logs | > 30s |
| Scale events/hour | Autoscaler logs | > 10 |
| Failed scale attempts | Autoscaler logs | > 0 |
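The first two checks could run inside the autoscaler itself on each tick; a minimal sketch against the thresholds above (the alert transport is left abstract, and checkAlerts is a hypothetical helper):

```typescript
import type { QueueMetrics } from '@emp/core'; // shape defined in Phase 2

// Alert thresholds from the table above
const ALERT_QUEUE_DEPTH = 50;
const ALERT_WAIT_MS = 60_000;

export function checkAlerts(metrics: QueueMetrics): string[] {
  const alerts: string[] = [];
  if (metrics.pendingJobs > ALERT_QUEUE_DEPTH) {
    alerts.push(`Queue depth ${metrics.pendingJobs} exceeds ${ALERT_QUEUE_DEPTH}`);
  }
  if (metrics.averageWaitTimeMs > ALERT_WAIT_MS) {
    alerts.push(`Average wait ${metrics.averageWaitTimeMs}ms exceeds ${ALERT_WAIT_MS}ms`);
  }
  return alerts; // forward to logs or the monitoring dashboard
}
```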
Alternatives Considered
External Autoscaler Services
- KEDA: Kubernetes-native, doesn't work with Railway
- AWS Application Auto Scaling: AWS-only
- Custom Lambda: Could work, but adds AWS dependency
Railway's simplicity is a strength - keeping the autoscaler in our codebase maintains that simplicity while adding queue-awareness.
Dependencies
- Railway API token with service modification permissions
- graphql-request for the Railway GraphQL API
- Redis access for queue metrics
- New apps/autoscaler service or cron job
Environment Variables
```bash
# Autoscaler service
RAILWAY_API_TOKEN=xxx # Railway API token
RAILWAY_EMPROPS_API_SERVICE_ID=xxx # Service ID to scale
RAILWAY_JOB_API_SERVICE_ID=xxx # Service ID to scale
REDIS_URL=xxx # Queue metrics source
# Scaling config
SCALE_UP_THRESHOLD=10 # Jobs to trigger scale up
SCALE_DOWN_THRESHOLD=2 # Jobs to trigger scale down
MIN_REPLICAS=1
MAX_REPLICAS=5
COOLDOWN_MINUTES=5
```

End of ADR
