ADR-010: LoRA User Storage and Affinity Routing
Date: 2025-11-14
Status: 🤔 Proposed
Decision Makers: Engineering Team
Related Systems: Job Broker, ComfyUI Workers, Model Cache, Azure Blob Storage
Executive Summary
Enable users to upload and store custom LoRA models in their personal Azure Blob Storage, with intelligent just-in-time downloading and cache-aware job routing. This ADR combines user storage infrastructure with affinity-based job claiming to minimize model download times and improve job execution performance.
Key Capabilities:
- User-owned LoRA storage in Azure Blob Storage
- Just-in-time LoRA downloads on worker machines
- Intelligent LRU + time-based cache eviction (50GB reserved, 7-day TTL)
- Affinity-based job routing (prefer workers with cached LoRAs)
- Non-blocking fallback to workers without cached models
North Star Alignment:
- ✅ Supports predictive model management (Phase 2 goal)
- ✅ Eliminates first-user wait times for popular LoRAs
- ✅ Advances toward specialized machine pools
- ✅ Improves job execution performance through cache-aware routing
Table of Contents
- Context
- Decision
- Architecture Design
- Implementation Specification
- Consequences
- Alternatives Considered
- Success Metrics
- Implementation Phases
Context
Current State
Existing LoRA Infrastructure:
- `EmProps_Lora_Loader` custom node with Azure/AWS/GCS support
- SQLite `model_cache.db` tracking model usage with LRU eviction
- `is_ignore` flag preventing eviction of system LoRAs
- Azure Blob Storage handlers for cloud downloads
Current Limitations:
- No user-owned LoRA storage capability
- All workers download LoRAs independently (no affinity routing)
- Redis job matching uses FIFO order (ignores cache state)
- First-user wait times for popular LoRAs (2-5 minutes)
Problem Statement
User Storage Problem: Users cannot upload and manage their own LoRA models. Current system only supports shared/system LoRAs baked into containers or downloaded from shared storage.
Performance Problem: When 3 workers can claim a job requiring a LoRA:
- Worker A has the LoRA cached (ready in <1 second)
- Worker B doesn't have it cached (5 minute download)
- Worker C doesn't have it cached (5 minute download)
Current FIFO matching might assign to Worker B or C, causing unnecessary wait times.
Infrastructure Constraints:
- Ephemeral machines with no shared storage
- 50GB reserved for LoRA cache per machine
- Need to balance cache utilization vs. disk space
- Must work with existing `flat_file` table for user assets
Decision
We will implement a two-tier LoRA storage system with affinity-based job routing:
Part 1: User Storage Infrastructure
Storage Architecture:
- Use existing `flat_file` table for user LoRA metadata
- Store LoRA files in Azure Blob Storage (`user-loras` container)
- Tag flat_file entries with `tags=['lora']` for identification
- Leverage existing Azure handlers and model cache database
Cache Management:
- Reserve 50GB per machine for LoRA cache
- LRU eviction when cache fills (existing mechanism)
- Time-based cleanup after 7 days of inactivity
- Preserve system LoRAs via `is_ignore=true` flag
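A minimal sketch of this eviction policy, for illustration only — the field names and constants mirror the design above, not the actual `model_cache.db` schema. Entries flagged `is_ignore` are never candidates; idle entries past the TTL go first, then oldest-first until the cache fits the 50GB budget:

```typescript
// Hypothetical eviction selection: TTL pass, then LRU pass.
interface CachedModel {
  path: string;
  sizeBytes: number;
  lastUsedMs: number; // epoch millis
  isIgnore: boolean;  // system LoRAs are never evicted
}

const CACHE_BUDGET_BYTES = 50 * 1024 ** 3; // 50GB reserved
const TTL_MS = 7 * 24 * 60 * 60 * 1000;    // 7-day TTL

function selectEvictions(models: CachedModel[], nowMs: number): string[] {
  const evictable = models.filter(m => !m.isIgnore);
  const evict = new Set<string>();

  // Pass 1: time-based cleanup (idle longer than the TTL)
  for (const m of evictable) {
    if (nowMs - m.lastUsedMs > TTL_MS) evict.add(m.path);
  }

  // Pass 2: LRU eviction until remaining usage fits the budget
  let used = models.reduce(
    (sum, m) => sum + (evict.has(m.path) ? 0 : m.sizeBytes), 0);
  const byAge = evictable
    .filter(m => !evict.has(m.path))
    .sort((a, b) => a.lastUsedMs - b.lastUsedMs);
  for (const m of byAge) {
    if (used <= CACHE_BUDGET_BYTES) break;
    evict.add(m.path);
    used -= m.sizeBytes;
  }
  return [...evict];
}
```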
Part 2: Affinity-Based Job Routing
Scoring Algorithm:
- Workers report cached LoRAs in capabilities
- Redis function scores each worker-job match
- Higher score = better match (prefer cached models)
- Non-blocking: workers without cache can still claim jobs
Scoring Rules:
-- Scoring weights (configurable)
USER_LORA_MATCH_SCORE = 10 -- User's custom LoRA already cached
SHARED_LORA_MATCH_SCORE = 5 -- Shared LoRA already cached
BASE_SCORE = 0 -- No cache match, download required
Architecture Design
High-Level Architecture
┌─────────────────┐
│ User Uploads │
│ LoRA via API │
└────────┬────────┘
│
v
┌─────────────────────────────────────────────────┐
│ flat_file Table (PostgreSQL) │
│ ┌─────────────────────────────────────────┐ │
│ │ id: uuid │ │
│ │ user_id: uuid │ │
│ │ url: https://emprops.blob... │ │
│ │ tags: ['lora'] │ │
│ │ metadata: {original_name, size_bytes} │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘
│
│ Referenced in workflow
v
┌───────────────────────┐
│ Job Requirements │
│ {loras: [{ │
│ type: 'user', │
│ flat_file_id: uuid │
│ }]} │
└───────────┬───────────┘
│
v
┌─────────────────────────────────────┐
│ Redis: findMatchingJob() │
│ │
│ 1. Extract LoRA requirements │
│ 2. Score each worker: │
│ - User LoRA cached: +10 points │
│ - Shared LoRA cached: +5 points │
│ 3. Return highest scoring match │
└───────────┬─────────────────────────┘
│
v
┌─────────────────────────┐
│ Worker Claims Job │
└───────────┬─────────────┘
│
v
┌─────────────────────────────────────┐
│ EmProps_Lora_Loader Node │
│ │
│ 1. Check local cache │
│ 2. If not cached: │
│ - Get flat_file metadata │
│ - Download from Azure │
│ - Register in model_cache.db │
│ 3. Load LoRA into ComfyUI │
└─────────────────────────────────────┘
│
v
┌─────────────────────────┐
│ model_cache.db │
│ (SQLite) │
│ │
│ - Track usage │
│ - LRU eviction │
│ - 7-day TTL cleanup │
│ - is_ignore protection │
└─────────────────────────┘
Data Flow
Upload Flow
User → API → flat_file table → Azure Blob Storage
│
└─> Tags: ['lora']
Metadata: {original_name, size_bytes}
Job Execution Flow
1. Worker calls Redis: FCALL findMatchingJob({
worker_id: "worker-123",
cached_loras: {
user_loras: ["uuid-1", "uuid-2"], // flat_file IDs
shared_loras: ["model-a.safetensors"] // filenames
}
})
2. Redis scores matches:
Job requires: uuid-1 (user LoRA)
Worker A: Has uuid-1 cached → Score: 10 ✅
Worker B: No uuid-1 → Score: 0
Worker C: No uuid-1 → Score: 0
→ Worker A claims job
3. Worker executes:
- EmProps_Lora_Loader checks cache
- LoRA already present → immediate load
- Job starts in <1 second
Download Flow (Cache Miss)
1. EmProps_Lora_Loader: LoRA not in cache
2. Query flat_file table for metadata
3. Download from Azure Blob Storage
4. Register in model_cache.db (is_ignore=false)
5. Load LoRA into ComfyUI
6. Future jobs on this worker get Score: +10
Implementation Specification
1. Database Schema
No schema changes required - use existing flat_file table:
-- Existing flat_file table (no changes needed)
CREATE TABLE flat_file (
id UUID PRIMARY KEY,
user_id UUID REFERENCES users(id),
url TEXT NOT NULL,
tags TEXT[] DEFAULT '{}',
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Query user LoRAs
SELECT * FROM flat_file
WHERE user_id = $1
AND 'lora' = ANY(tags);
2. API Endpoints
New Endpoints:
// Upload user LoRA
POST /api/user-loras/upload
Body: multipart/form-data { file: File }
Response: {
flat_file_id: string;
url: string;
size_bytes: number;
original_name: string;
}
// List user LoRAs
GET /api/user-loras
Query: { user_id: string }
Response: {
loras: Array<{
id: string;
original_name: string;
size_bytes: number;
created_at: string;
}>
}
// Delete user LoRA
DELETE /api/user-loras/:id
Response: { success: boolean }
3. Worker Changes
File: apps/worker/src/redis-direct-worker-client.ts
import { ModelCacheDB } from '@emp/comfyui-custom-nodes/db/model_cache';
// New helper function
async function getCachedLoRAs(): Promise<{
user_loras: string[];
shared_loras: string[];
}> {
const model_cache_db = new ModelCacheDB();
const models = await model_cache_db.get_all_models();
// Extract cached LoRAs
const user_loras = models
.filter(m => m.model_type === 'user_lora')
.map(m => extractFlatFileIdFromPath(m.path));
const shared_loras = models
.filter(m => m.model_type === 'lora' && m.is_ignore === false)
.map(m => m.filename);
return { user_loras, shared_loras };
}
// Helper to extract flat_file ID from path
function extractFlatFileIdFromPath(path: string): string {
// Path format: /workspace/ComfyUI/models/loras/user/{flat_file_id}.safetensors
const match = path.match(/user\/([a-f0-9-]+)\.safetensors$/);
return match ? match[1] : '';
}
// Modified: callFindMatchingJob (around line 499)
async callFindMatchingJob(capabilities: WorkerCapabilities) {
// Add cached LoRAs to capabilities
const cached_loras = await getCachedLoRAs();
const enhancedCapabilities = {
...capabilities,
cached_loras
};
const result = await this.redis.call(
'FCALL',
'findMatchingJob',
0,
JSON.stringify(enhancedCapabilities),
'100' // max_scan
);
return result;
}
4. Type Changes
File: packages/core/src/types/worker.ts
export interface WorkerCapabilities {
worker_id: string;
job_service_required_map: string[];
hardware?: {
gpu_memory_gb?: number;
cpu_cores?: number;
ram_gb?: number;
};
models?: Record<string, string[]>;
customer_access?: {
isolation?: 'strict' | 'shared';
allowed_customers?: string[];
denied_customers?: string[];
};
workflow_id?: string;
// NEW: Cached LoRA tracking
cached_loras?: {
user_loras?: string[]; // Array of flat_file IDs
shared_loras?: string[]; // Array of filenames
};
}
5. Redis Lua Function Changes
File: packages/core/src/redis-functions/functions/findMatchingJob.lua
-- NEW: Calculate affinity score based on cached LoRAs
local function calculate_affinity_score(worker, job)
local score = 0
-- Parse job requirements to extract LoRA requirements
local requirements = {}
if job.requirements and job.requirements ~= '' then
local success, parsed = pcall(cjson.decode, job.requirements)
if success then
requirements = parsed
end
end
-- Check if job requires any LoRAs
local required_loras = requirements.loras or {}
if #required_loras == 0 then
return 0 -- No LoRAs required, no affinity bonus
end
-- Get worker's cached LoRAs
local cached_loras = worker.cached_loras or {}
local user_loras = cached_loras.user_loras or {}
local shared_loras = cached_loras.shared_loras or {}
-- Score each required LoRA
for _, required_lora in ipairs(required_loras) do
if required_lora.type == 'user' then
-- Check if worker has this user LoRA cached
for _, cached_id in ipairs(user_loras) do
if cached_id == required_lora.flat_file_id then
score = score + 10 -- USER_LORA_MATCH_SCORE
break
end
end
elseif required_lora.type == 'shared' then
-- Check if worker has this shared LoRA cached
for _, cached_name in ipairs(shared_loras) do
if cached_name == required_lora.filename then
score = score + 5 -- SHARED_LORA_MATCH_SCORE
break
end
end
end
end
return score
end
-- MODIFIED: Main function (line 340)
redis.register_function('findMatchingJob', function(keys, args)
local worker_caps_json = args[1]
local max_scan = tonumber(args[2]) or 100
local worker_caps = cjson.decode(worker_caps_json)
redis.log(redis.LOG_NOTICE, 'Worker ' .. worker_caps.worker_id .. ' requesting job')
local pending_jobs = redis.call('ZREVRANGE', 'jobs:pending', '0', tostring(max_scan - 1))
if not pending_jobs or #pending_jobs == 0 then
return nil
end
-- NEW: Track best match
local best_job_id = nil
local best_score = -1
local best_job = nil
-- Check each job for compatibility AND affinity
for i = 1, #pending_jobs do
local job_id = pending_jobs[i]
local job_data = redis.call('HGETALL', 'job:' .. job_id)
if job_data and #job_data > 0 then
local job = hash_to_table(job_data)
-- Check if worker meets requirements
if matches_requirements(worker_caps, job) then
-- Calculate affinity score
local affinity_score = calculate_affinity_score(worker_caps, job)
redis.log(redis.LOG_DEBUG, 'Job ' .. job_id .. ' affinity score: ' .. affinity_score)
-- Track best match
if affinity_score > best_score then
best_score = affinity_score
best_job_id = job_id
best_job = job
end
end
end
end
-- If we found a match, claim the best one
if best_job_id then
local removed = redis.call('ZREM', 'jobs:pending', best_job_id)
if removed == 1 then
-- Update job status (existing code)
redis.call('HMSET', 'job:' .. best_job_id,
'status', 'assigned',
'worker_id', worker_caps.worker_id,
'assigned_at', get_iso_timestamp()
)
-- Add to worker's active jobs
redis.call('HSET', 'jobs:active:' .. worker_caps.worker_id, best_job_id, cjson.encode(best_job))
-- Update worker status
redis.call('HMSET', 'worker:' .. worker_caps.worker_id,
'status', 'busy',
'current_job_id', best_job_id,
'last_status_change', get_iso_timestamp()
)
-- Publish events (existing code)
redis.call('PUBLISH', 'job_claimed', cjson.encode({
job_id = best_job_id,
worker_id = worker_caps.worker_id,
status = 'claimed',
affinity_score = best_score, -- NEW: Include score in event
timestamp = tonumber(redis.call('TIME')[1]) * 1000
}))
redis.log(redis.LOG_NOTICE, 'Worker ' .. worker_caps.worker_id .. ' claimed job ' .. best_job_id .. ' (score: ' .. best_score .. ')')
return cjson.encode({
jobId = best_job_id,
job = best_job
})
end
end
return nil
end)
6. ComfyUI Loader Changes
File: packages/comfyui-custom-nodes/emprops_comfy_nodes/nodes/emprops_lora_loader.py
# NEW: Support for user LoRAs
def download_from_cloud(self, lora_name, provider=None, bucket=None, flat_file_id=None):
"""Download LoRA from cloud storage if not found locally"""
# NEW: Handle user LoRA downloads
if flat_file_id:
# User LoRA - download to user-specific directory
local_path = self._get_user_lora_path(flat_file_id)
# Check cache first
if os.path.exists(local_path):
print(f"[EmProps] User LoRA already cached: {local_path}")
model_cache_db.update_model_usage(local_path)
return local_path
# Fetch flat_file metadata from API
metadata = self._fetch_flat_file_metadata(flat_file_id)
if not metadata:
print(f"[EmProps] Failed to fetch metadata for flat_file: {flat_file_id}")
return None
# Download from Azure Blob Storage
handler = AzureHandler(container_name='user-loras')
success, error = handler.download_file(
blob_path=metadata['blob_path'],
local_path=local_path
)
if not success:
print(f"[EmProps] Failed to download user LoRA: {error}")
return None
# Register in model cache (is_ignore=False for user LoRAs)
model_cache_db.register_model(
path=local_path,
model_type='user_lora',
size_bytes=os.path.getsize(local_path),
is_ignore=False # User LoRAs can be evicted
)
return local_path
# Existing shared LoRA logic
# ... (unchanged)
def _get_user_lora_path(self, flat_file_id):
"""Get local path for user LoRA"""
lora_paths = folder_paths.folder_names_and_paths["loras"][0]
user_dir = os.path.join(lora_paths[0], 'user')
os.makedirs(user_dir, exist_ok=True)
return os.path.join(user_dir, f"{flat_file_id}.safetensors")
def _fetch_flat_file_metadata(self, flat_file_id):
"""Fetch flat_file metadata from API"""
api_url = os.getenv('EMP_API_URL', 'http://localhost:3001')
response = requests.get(f"{api_url}/api/flat-files/{flat_file_id}")
if response.status_code == 200:
return response.json()
return None
Consequences
Positive Consequences ✅
User Empowerment
- Users can upload and manage custom LoRAs
- No dependency on shared model repositories
- Full control over model lifecycle
Performance Optimization
- Jobs route to workers with cached models
- Eliminates redundant downloads across workers
- Reduces job start latency by 2-5 minutes (when cache hits)
Cache Efficiency
- Popular LoRAs naturally accumulate high scores
- LRU + time-based eviction keeps cache fresh
- 50GB dedicated space prevents disk exhaustion
Non-Blocking Design
- Workers without cache can still claim jobs
- System degrades gracefully under load
- No hard dependencies on cache state
North Star Alignment
- Advances Phase 2: Model Intelligence goals
- Supports predictive model placement strategy
- Foundation for pool-specific model baking
Negative Consequences ❌
Complexity Increase
- Redis function logic becomes more sophisticated
- Additional API endpoints for LoRA management
- Worker needs to query cache state before claiming
Storage Costs
- Azure Blob Storage costs for user LoRAs
- 50GB per machine reserved for cache
- Potential for cache thrashing with diverse workloads
Cold Start Problem
- First user of a LoRA still experiences download wait
- Cache warmup period before affinity benefits appear
- May need pre-warming strategies for popular models
Monitoring Complexity
- Need to track cache hit rates per LoRA
- Affinity score distribution monitoring
- Cache eviction analytics required
Alternatives Considered
Alternative 1: Priority Queue Approach
Design:
- Two-pass matching: first try workers with cache, then fallback
- Maintain separate priority queues for jobs
Rejected Because:
- More complex than scoring approach
- Harder to extend for multi-LoRA jobs
- Doesn't handle partial cache matches well
Alternative 2: S3/Shared Storage
Design:
- Use AWS S3 instead of Azure
- Shared storage accessible by all workers
Rejected Because:
- EmProps uses Azure, not AWS
- Vendor lock-in concerns
- Azure Blob Storage already integrated
Alternative 3: Pre-baked Container Images
Design:
- Bake popular LoRAs into container images
- Different images for different LoRA sets
Rejected Because:
- Not feasible for user-owned LoRAs
- Container images become massive (50GB+)
- Deployment time increases dramatically
- Note: May complement this approach in Phase 2
Alternative 4: Centralized Cache Service
Design:
- Single cache service shared across workers
- Workers fetch from cache service, not Azure
Rejected Because:
- Single point of failure
- Network overhead for model transfers
- Incompatible with ephemeral distributed machines
- Violates "no shared storage" constraint
Success Metrics
Performance Metrics
| Metric | Baseline | Target | Measurement |
|---|---|---|---|
| Cache Hit Rate | 0% (no cache) | 60% within 2 weeks | Redis event logs |
| Job Start Latency (cache hit) | 5 min | <10 seconds | OpenTelemetry traces |
| Job Start Latency (cache miss) | 5 min | 5 min (unchanged) | OpenTelemetry traces |
| Affinity Score Distribution | N/A | 70% jobs score >0 | Redis function logs |
| Cache Eviction Rate | N/A | <10% of downloads | model_cache.db analytics |
User Experience Metrics
| Metric | Baseline | Target | Measurement |
|---|---|---|---|
| User LoRA Uploads | 0 | 100+ LoRAs in 4 weeks | flat_file table count |
| Workflows Using User LoRAs | 0 | 50+ workflows | Job requirements analysis |
| User-Reported Wait Times | High complaints | <5 complaints/week | Support tickets |
System Health Metrics
| Metric | Target | Measurement |
|---|---|---|
| Disk Space Utilization | 70-90% of 50GB | Machine metrics |
| Cache Thrashing Rate | <5% | Eviction immediately followed by re-download |
| Azure Blob Egress | <100GB/day | Azure billing dashboard |
Implementation Phases
Phase 1: User Storage Infrastructure (Week 1-2)
Goal: Enable users to upload and store LoRAs
Tasks:
- [ ] Create API endpoints for LoRA upload/list/delete
- [ ] Implement Azure Blob Storage integration for user LoRAs
- [ ] Add `tags=['lora']` filtering to flat_file queries
- [ ] Update EmProps Studio UI for LoRA management
- [ ] Add validation for LoRA file format (.safetensors)
- [ ] Implement quota limits per user (e.g., 10 LoRAs, 10GB total)
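The quota task above might be sketched as follows. The limits (10 LoRAs, 10GB) come from the example in the task, but they remain an open question in this ADR, and the `QuotaState` shape is purely illustrative:

```typescript
// Hypothetical pre-upload quota check. Limits are illustrative
// placeholders pending the open question on quota management.
const MAX_LORAS_PER_USER = 10;
const MAX_TOTAL_BYTES = 10 * 1024 ** 3; // 10GB per user

interface QuotaState {
  loraCount: number;  // user's existing LoRA count (from flat_file)
  totalBytes: number; // user's existing total storage
}

function checkQuota(
  state: QuotaState,
  uploadBytes: number,
): { ok: true } | { ok: false; reason: string } {
  if (state.loraCount >= MAX_LORAS_PER_USER) {
    return { ok: false, reason: `limit of ${MAX_LORAS_PER_USER} LoRAs reached` };
  }
  if (state.totalBytes + uploadBytes > MAX_TOTAL_BYTES) {
    return { ok: false, reason: 'total storage quota exceeded' };
  }
  return { ok: true };
}
```

The upload endpoint would run this before writing the flat_file row, so quota violations fail fast without touching Azure.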
Deliverables:
- Working API endpoints with Azure upload
- UI for uploading and managing LoRAs
- Documentation for LoRA upload process
Testing:
# Upload LoRA
curl -X POST http://localhost:3001/api/user-loras/upload \
-F "file=@my_lora.safetensors" \
-H "Authorization: Bearer $TOKEN"
# List user LoRAs
curl "http://localhost:3001/api/user-loras?user_id=$USER_ID"
Phase 2: Just-in-Time Downloads (Week 2-3)
Goal: Workers download user LoRAs on demand
Tasks:
- [ ] Modify `EmProps_Lora_Loader` to support `flat_file_id` parameter
- [ ] Implement `_fetch_flat_file_metadata()` API call
- [ ] Add user LoRA directory structure (`/models/loras/user/`)
- [ ] Update model registration to distinguish user vs. shared LoRAs
- [ ] Add error handling for failed downloads
- [ ] Implement retry logic with exponential backoff
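The retry task above could take roughly this shape — a generic sketch, not the actual loader code; `download` stands in for the real Azure download call:

```typescript
// Hypothetical retry wrapper: exponential backoff (1s, 2s, 4s, ...)
// with a capped attempt count, rethrowing the last error on exhaustion.
async function downloadWithRetry(
  download: () => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await download();
      return; // success
    } catch (err) {
      if (attempt === maxAttempts) throw err; // retries exhausted
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise(res => setTimeout(res, delay));
    }
  }
}
```

A jittered delay may be worth adding if many workers cold-start against the same blob at once.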
Deliverables:
- Working LoRA download from flat_file references
- Model cache tracking for user LoRAs
- Error recovery for failed downloads
Testing:
# In ComfyUI workflow JSON
{
"inputs": {
"lora_name": "", # Empty for user LoRAs
"flat_file_id": "550e8400-e29b-41d4-a716-446655440000",
"strength_model": 1.0,
"strength_clip": 1.0
}
}
Phase 3: Cache Management (Week 3-4)
Goal: Implement LRU + time-based eviction
Tasks:
- [ ] Configure 50GB reserved space setting in model_cache.db
- [ ] Implement pre-download space check
- [ ] Add LRU eviction when cache fills
- [ ] Create background job for 7-day TTL cleanup
- [ ] Add cache metrics collection
- [ ] Implement graceful handling of eviction during job execution
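The pre-download space check in the task list above amounts to a small calculation: given current cache usage and the incoming file size, how many bytes (if any) must be evicted first. A hypothetical sketch, with the 50GB budget taken from this ADR:

```typescript
// Hypothetical pre-download space check against the reserved budget.
const RESERVED_BYTES = 50 * 1024 ** 3; // 50GB LoRA cache budget

function bytesToFree(usedBytes: number, incomingBytes: number): number {
  if (incomingBytes > RESERVED_BYTES) {
    // A single file larger than the whole cache can never fit.
    throw new Error('file larger than the entire cache budget');
  }
  return Math.max(0, usedBytes + incomingBytes - RESERVED_BYTES);
}
```

The loader would call this before downloading and hand a nonzero result to the LRU eviction pass.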
Deliverables:
- Automated cache eviction system
- Background cleanup job (cron or PM2 scheduled task)
- Cache metrics dashboard in monitoring UI
Testing:
# Fill cache to capacity
for i in {1..50}; do
# Upload and download LoRAs to fill 50GB
done
# Verify LRU eviction
sqlite3 model_cache.db "SELECT * FROM models ORDER BY last_used ASC LIMIT 10"
Phase 4: Affinity Routing (Week 4-5)
Goal: Prefer workers with cached LoRAs
Tasks:
- [ ] Add `cached_loras` field to `WorkerCapabilities` interface
- [ ] Implement `getCachedLoRAs()` in worker client
- [ ] Modify Redis Lua function with scoring algorithm
- [ ] Add affinity score to job claim events
- [ ] Update monitoring UI to show affinity scores
- [ ] Add Redis logs for affinity debugging
Deliverables:
- Working affinity-based job routing
- Affinity score in monitoring dashboard
- Debug tooling for score analysis
Testing:
# Create job requiring specific LoRA
curl -X POST http://localhost:3001/api/jobs \
-d '{
"service_required": "comfyui",
"requirements": {
"loras": [{
"type": "user",
"flat_file_id": "550e8400-e29b-41d4-a716-446655440000"
}]
}
}'
# Verify worker with cached LoRA claims job
# Check Redis logs for affinity score: "score: 10"
Phase 5: Monitoring & Analytics (Week 5-6)
Goal: Observe system behavior and optimize
Tasks:
- [ ] Add cache hit rate metrics to OpenTelemetry
- [ ] Create Dash0 dashboard for LoRA analytics
- [ ] Track affinity score distribution
- [ ] Monitor cache eviction patterns
- [ ] Analyze user LoRA upload trends
- [ ] Identify optimization opportunities
Deliverables:
- Comprehensive LoRA analytics dashboard
- Weekly report on cache performance
- Recommendations for Phase 2+ optimizations
Metrics Dashboard:
- Cache hit rate over time
- Affinity score distribution histogram
- Top 10 most popular LoRAs
- Cache eviction frequency
- Average job start latency by cache state
Phase 6: Optimization & Polish (Week 6+)
Goal: Refine based on production data
Tasks:
- [ ] Tune affinity scoring weights based on metrics
- [ ] Implement pre-warming for popular LoRAs
- [ ] Optimize cache eviction algorithm
- [ ] Add user notifications for evicted LoRAs
- [ ] Document best practices for LoRA usage
- [ ] Create troubleshooting guides
Deliverables:
- Production-ready LoRA support
- User documentation and guides
- Internal runbooks for operations team
Open Questions
LoRA Quota Management
- What limits should we enforce? (10 LoRAs/user? 10GB total?)
- How do we handle quota violations?
Cache Warming Strategy
- Should we pre-download popular LoRAs on machine startup?
- How do we identify "popular" LoRAs for warming?
Multi-LoRA Jobs
- How do we score jobs requiring 3+ LoRAs?
- Should we use sum of scores or weighted average?
Cross-Pool Behavior
- How does affinity routing interact with pool separation (Fast Lane / Standard / Heavy)?
- Should LoRA cache be pool-specific?
Azure Costs
- What is acceptable egress cost per month?
- Should we implement CDN or edge caching?
Related Documentation
- LoRA User Storage Support Report - Detailed investigation and technical analysis
- CLAUDE.md - North star architecture and model management strategy
- Environment Management Guide - Configuration system
- Testing Procedures - Standard testing procedures
Approval
Decision Makers:
- [ ] Engineering Lead
- [ ] Product Manager
- [ ] DevOps Team
Next Steps:
- Review ADR with team
- Gather feedback on open questions
- Approve or request revisions
- Begin Phase 1 implementation
Questions? Post in #architecture or #job-broker Slack channels.
