
ADR-010: LoRA User Storage and Affinity Routing

Date: 2025-11-14
Status: 🤔 Proposed
Decision Makers: Engineering Team
Related Systems: Job Broker, ComfyUI Workers, Model Cache, Azure Blob Storage


Executive Summary

Enable users to upload and store custom LoRA models in their personal Azure Blob Storage, with intelligent just-in-time downloading and cache-aware job routing. This ADR combines user storage infrastructure with affinity-based job claiming to minimize model download times and improve job execution performance.

Key Capabilities:

  • User-owned LoRA storage in Azure Blob Storage
  • Just-in-time LoRA downloads on worker machines
  • Intelligent LRU + time-based cache eviction (50GB reserved, 7-day TTL)
  • Affinity-based job routing (prefer workers with cached LoRAs)
  • Non-blocking fallback to workers without cached models

North Star Alignment:

  • ✅ Supports predictive model management (Phase 2 goal)
  • ✅ Eliminates first-user wait times for popular LoRAs
  • ✅ Advances toward specialized machine pools
  • ✅ Improves job execution performance through cache-aware routing

Table of Contents

  1. Context
  2. Decision
  3. Architecture Design
  4. Implementation Specification
  5. Consequences
  6. Alternatives Considered
  7. Success Metrics
  8. Implementation Phases

Context

Current State

Existing LoRA Infrastructure:

  • EmProps_Lora_Loader custom node with Azure/AWS/GCS support
  • SQLite model_cache.db tracking model usage with LRU eviction
  • is_ignore flag preventing eviction of system LoRAs
  • Azure Blob Storage handlers for cloud downloads

Current Limitations:

  1. No user-owned LoRA storage capability
  2. All workers download LoRAs independently (no affinity routing)
  3. Redis job matching uses FIFO order (ignores cache state)
  4. First-user wait times for popular LoRAs (2-5 minutes)

Problem Statement

User Storage Problem: Users cannot upload and manage their own LoRA models. Current system only supports shared/system LoRAs baked into containers or downloaded from shared storage.

Performance Problem: When 3 workers can claim a job requiring a LoRA:

  • Worker A has the LoRA cached (ready in <1 second)
  • Worker B doesn't have it cached (5 minute download)
  • Worker C doesn't have it cached (5 minute download)

Current FIFO matching might assign to Worker B or C, causing unnecessary wait times.

Infrastructure Constraints:

  • Ephemeral machines with no shared storage
  • 50GB reserved for LoRA cache per machine
  • Need to balance cache utilization vs. disk space
  • Must work with existing flat_file table for user assets

Decision

We will implement a two-tier LoRA storage system with affinity-based job routing:

Part 1: User Storage Infrastructure

Storage Architecture:

  • Use existing flat_file table for user LoRA metadata
  • Store LoRA files in Azure Blob Storage (user-loras container)
  • Tag flat_file entries with tags=['lora'] for identification
  • Leverage existing Azure handlers and model cache database

Cache Management:

  • Reserve 50GB per machine for LoRA cache
  • LRU eviction when cache fills (existing mechanism)
  • Time-based cleanup after 7 days of inactivity
  • Preserve system LoRAs via is_ignore=true flag
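
A minimal sketch of this cache-management policy, written in TypeScript for illustration (the real mechanism lives in the existing model_cache.db logic on the worker); the 50GB limit and 7-day TTL come from the bullets above, and the entry shape is a simplified assumption:

typescript
interface CacheEntry {
  path: string;
  sizeBytes: number;
  lastUsed: number;   // epoch ms
  isIgnore: boolean;  // system LoRAs are never evicted
}

const CACHE_LIMIT_BYTES = 50 * 1024 ** 3;  // 50GB reserved per machine
const TTL_MS = 7 * 24 * 60 * 60 * 1000;    // 7 days of inactivity

// Decide which entries to evict before downloading incomingBytes of new LoRAs
function selectEvictions(entries: CacheEntry[], incomingBytes: number, now = Date.now()): CacheEntry[] {
  const evictable = entries.filter(e => !e.isIgnore);

  // 1. Time-based cleanup: anything unused for 7+ days goes first
  const expired = evictable.filter(e => now - e.lastUsed > TTL_MS);

  // 2. LRU: if space is still short, evict least-recently-used entries
  const used = entries.reduce((sum, e) => sum + e.sizeBytes, 0);
  let freed = expired.reduce((sum, e) => sum + e.sizeBytes, 0);
  const evictions = [...expired];

  const lruOrder = evictable
    .filter(e => !expired.includes(e))
    .sort((a, b) => a.lastUsed - b.lastUsed);

  for (const entry of lruOrder) {
    if (used - freed + incomingBytes <= CACHE_LIMIT_BYTES) break;
    evictions.push(entry);
    freed += entry.sizeBytes;
  }
  return evictions;
}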

Part 2: Affinity-Based Job Routing

Scoring Algorithm:

  • Workers report cached LoRAs in capabilities
  • Redis function scores each worker-job match
  • Higher score = better match (prefer cached models)
  • Non-blocking: workers without cache can still claim jobs

Scoring Rules:

lua
-- Scoring weights (configurable)
USER_LORA_MATCH_SCORE = 10   -- User's custom LoRA already cached
SHARED_LORA_MATCH_SCORE = 5  -- Shared LoRA already cached
BASE_SCORE = 0               -- No cache match, download required
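
The same weights, expressed as a small TypeScript helper that can serve as a unit-testable reference for the Lua scoring in the implementation section; the requirement and capability shapes mirror the types defined later in this ADR:

typescript
type LoraRequirement =
  | { type: 'user'; flat_file_id: string }
  | { type: 'shared'; filename: string };

interface CachedLoras {
  user_loras?: string[];    // flat_file IDs
  shared_loras?: string[];  // filenames
}

const USER_LORA_MATCH_SCORE = 10;
const SHARED_LORA_MATCH_SCORE = 5;

function affinityScore(required: LoraRequirement[], cached: CachedLoras): number {
  let score = 0;
  for (const lora of required) {
    if (lora.type === 'user' && cached.user_loras?.includes(lora.flat_file_id)) {
      score += USER_LORA_MATCH_SCORE;
    } else if (lora.type === 'shared' && cached.shared_loras?.includes(lora.filename)) {
      score += SHARED_LORA_MATCH_SCORE;
    }
  }
  return score;
}

// Example: a job needing one user LoRA and one shared LoRA scores 15 on a
// worker with both cached, 10 with only the user LoRA cached, 0 otherwise.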

Architecture Design

High-Level Architecture

┌─────────────────┐
│   User Uploads  │
│   LoRA via API  │
└────────┬────────┘

         v
┌─────────────────────────────────────────────────┐
│         flat_file Table (PostgreSQL)            │
│  ┌─────────────────────────────────────────┐   │
│  │ id: uuid                                │   │
│  │ user_id: uuid                           │   │
│  │ url: https://emprops.blob...            │   │
│  │ tags: ['lora']                          │   │
│  │ metadata: {original_name, size_bytes}   │   │
│  └─────────────────────────────────────────┘   │
└─────────────────────────────────────────────────┘

                     │ Referenced in workflow
                     v
         ┌───────────────────────┐
         │   Job Requirements    │
         │  {loras: [{           │
         │    type: 'user',      │
         │    flat_file_id: uuid │
         │  }]}                  │
         └───────────┬───────────┘

                     v
         ┌─────────────────────────────────────┐
         │   Redis: findMatchingJob()          │
         │                                     │
         │  1. Extract LoRA requirements       │
         │  2. Score each worker:              │
         │     - User LoRA cached: +10 points  │
         │     - Shared LoRA cached: +5 points │
         │  3. Return highest scoring match    │
         └───────────┬─────────────────────────┘

                     v
         ┌─────────────────────────┐
         │  Worker Claims Job      │
         └───────────┬─────────────┘

                     v
         ┌─────────────────────────────────────┐
         │  EmProps_Lora_Loader Node           │
         │                                     │
         │  1. Check local cache               │
         │  2. If not cached:                  │
         │     - Get flat_file metadata        │
         │     - Download from Azure           │
         │     - Register in model_cache.db    │
         │  3. Load LoRA into ComfyUI          │
         └─────────────────────────────────────┘

                     v
         ┌─────────────────────────┐
         │  model_cache.db         │
         │  (SQLite)               │
         │                         │
         │  - Track usage          │
         │  - LRU eviction         │
         │  - 7-day TTL cleanup    │
         │  - is_ignore protection │
         └─────────────────────────┘

Data Flow

Upload Flow

User → API → flat_file table → Azure Blob Storage

                 └─> Tags: ['lora']
                     Metadata: {original_name, size_bytes}

Job Execution Flow

1. Worker calls Redis: FCALL findMatchingJob({
     worker_id: "worker-123",
     cached_loras: {
       user_loras: ["uuid-1", "uuid-2"],    // flat_file IDs
       shared_loras: ["model-a.safetensors"] // filenames
     }
   })

2. Redis scores matches:
   Job requires: uuid-1 (user LoRA)

   Worker A: Has uuid-1 cached → Score: 10 ✅
   Worker B: No uuid-1 → Score: 0
   Worker C: No uuid-1 → Score: 0

   → Worker A claims job

3. Worker executes:
   - EmProps_Lora_Loader checks cache
   - LoRA already present → immediate load
   - Job starts in <1 second

Download Flow (Cache Miss)

1. EmProps_Lora_Loader: LoRA not in cache
2. Query flat_file table for metadata
3. Download from Azure Blob Storage
4. Register in model_cache.db (is_ignore=false)
5. Load LoRA into ComfyUI
6. Future jobs on this worker get Score: +10

Implementation Specification

1. Database Schema

No schema changes required - use existing flat_file table:

sql
-- Existing flat_file table (no changes needed)
CREATE TABLE flat_file (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  url TEXT NOT NULL,
  tags TEXT[] DEFAULT '{}',
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Query user LoRAs
SELECT * FROM flat_file
WHERE user_id = $1
  AND 'lora' = ANY(tags);

2. API Endpoints

New Endpoints:

typescript
// Upload user LoRA
POST /api/user-loras/upload
Body: multipart/form-data { file: File }
Response: {
  flat_file_id: string;
  url: string;
  size_bytes: number;
  original_name: string;
}

// List user LoRAs
GET /api/user-loras
Query: { user_id: string }
Response: {
  loras: Array<{
    id: string;
    original_name: string;
    size_bytes: number;
    created_at: string;
  }>
}

// Delete user LoRA
DELETE /api/user-loras/:id
Response: { success: boolean }
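
A minimal sketch of the handler behind POST /api/user-loras/upload, assuming Express, multer for multipart parsing, @azure/storage-blob for the user-loras container, and a hypothetical query() helper for the flat_file insert; names and middleware are illustrative, not the final implementation:

typescript
import express from 'express';
import multer from 'multer';
import { randomUUID } from 'crypto';
import { BlobServiceClient } from '@azure/storage-blob';
import { query } from './db'; // hypothetical Postgres helper: query(sql, params)

const app = express();
const upload = multer({ storage: multer.memoryStorage() });
const blobService = BlobServiceClient.fromConnectionString(
  process.env.AZURE_STORAGE_CONNECTION_STRING!
);
const container = blobService.getContainerClient('user-loras');

// POST /api/user-loras/upload: store the file in Azure, register it in flat_file
app.post('/api/user-loras/upload', upload.single('file'), async (req, res) => {
  const file = req.file;
  if (!file || !file.originalname.endsWith('.safetensors')) {
    return res.status(400).json({ error: 'A .safetensors file is required' });
  }

  const userId = (req as any).user.id; // assumes upstream auth middleware populates req.user
  const flatFileId = randomUUID();
  const blobPath = `${userId}/${flatFileId}.safetensors`;

  // Upload the LoRA bytes to the user-loras container
  await container.getBlockBlobClient(blobPath).uploadData(file.buffer);

  // Register metadata in the existing flat_file table, tagged as a LoRA
  const url = `${container.url}/${blobPath}`;
  await query(
    `INSERT INTO flat_file (id, user_id, url, tags, metadata)
     VALUES ($1, $2, $3, ARRAY['lora'], $4)`,
    [flatFileId, userId, url,
     JSON.stringify({ original_name: file.originalname, size_bytes: file.size, blob_path: blobPath })]
  );

  return res.json({
    flat_file_id: flatFileId,
    url,
    size_bytes: file.size,
    original_name: file.originalname,
  });
});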

3. Worker Changes

File: apps/worker/src/redis-direct-worker-client.ts

typescript
import { ModelCacheDB } from '@emp/comfyui-custom-nodes/db/model_cache';

// New helper function
async function getCachedLoRAs(): Promise<{
  user_loras: string[];
  shared_loras: string[];
}> {
  const model_cache_db = new ModelCacheDB();
  const models = await model_cache_db.get_all_models();

  // Extract cached LoRAs
  const user_loras = models
    .filter(m => m.model_type === 'user_lora')
    .map(m => extractFlatFileIdFromPath(m.path));

  const shared_loras = models
    .filter(m => m.model_type === 'lora' && m.is_ignore === false)
    .map(m => m.filename);

  return { user_loras, shared_loras };
}

// Helper to extract flat_file ID from path
function extractFlatFileIdFromPath(path: string): string {
  // Path format: /workspace/ComfyUI/models/loras/user/{flat_file_id}.safetensors
  const match = path.match(/user\/([a-f0-9-]+)\.safetensors$/);
  return match ? match[1] : '';
}

// Modified: callFindMatchingJob (around line 499)
async callFindMatchingJob(capabilities: WorkerCapabilities) {
  // Add cached LoRAs to capabilities
  const cached_loras = await getCachedLoRAs();

  const enhancedCapabilities = {
    ...capabilities,
    cached_loras
  };

  const result = await this.redis.call(
    'FCALL',
    'findMatchingJob',
    0,
    JSON.stringify(enhancedCapabilities),
    '100' // max_scan
  );

  return result;
}

4. Type Changes

File: packages/core/src/types/worker.ts

typescript
export interface WorkerCapabilities {
  worker_id: string;
  job_service_required_map: string[];
  hardware?: {
    gpu_memory_gb?: number;
    cpu_cores?: number;
    ram_gb?: number;
  };
  models?: Record<string, string[]>;
  customer_access?: {
    isolation?: 'strict' | 'shared';
    allowed_customers?: string[];
    denied_customers?: string[];
  };
  workflow_id?: string;

  // NEW: Cached LoRA tracking
  cached_loras?: {
    user_loras?: string[];    // Array of flat_file IDs
    shared_loras?: string[];  // Array of filenames
  };
}

5. Redis Lua Function Changes

File: packages/core/src/redis-functions/functions/findMatchingJob.lua

lua
-- NEW: Calculate affinity score based on cached LoRAs
local function calculate_affinity_score(worker, job)
  local score = 0

  -- Parse job requirements to extract LoRA requirements
  local requirements = {}
  if job.requirements and job.requirements ~= '' then
    local success, parsed = pcall(cjson.decode, job.requirements)
    if success then
      requirements = parsed
    end
  end

  -- Check if job requires any LoRAs
  local required_loras = requirements.loras or {}
  if #required_loras == 0 then
    return 0  -- No LoRAs required, no affinity bonus
  end

  -- Get worker's cached LoRAs
  local cached_loras = worker.cached_loras or {}
  local user_loras = cached_loras.user_loras or {}
  local shared_loras = cached_loras.shared_loras or {}

  -- Score each required LoRA
  for _, required_lora in ipairs(required_loras) do
    if required_lora.type == 'user' then
      -- Check if worker has this user LoRA cached
      for _, cached_id in ipairs(user_loras) do
        if cached_id == required_lora.flat_file_id then
          score = score + 10  -- USER_LORA_MATCH_SCORE
          break
        end
      end
    elseif required_lora.type == 'shared' then
      -- Check if worker has this shared LoRA cached
      for _, cached_name in ipairs(shared_loras) do
        if cached_name == required_lora.filename then
          score = score + 5  -- SHARED_LORA_MATCH_SCORE
          break
        end
      end
    end
  end

  return score
end

-- MODIFIED: Main function (line 340)
redis.register_function('findMatchingJob', function(keys, args)
  local worker_caps_json = args[1]
  local max_scan = tonumber(args[2]) or 100

  local worker_caps = cjson.decode(worker_caps_json)

  redis.log(redis.LOG_NOTICE, 'Worker ' .. worker_caps.worker_id .. ' requesting job')

  local pending_jobs = redis.call('ZREVRANGE', 'jobs:pending', '0', tostring(max_scan - 1))

  if not pending_jobs or #pending_jobs == 0 then
    return nil
  end

  -- NEW: Track best match
  local best_job_id = nil
  local best_score = -1
  local best_job = nil

  -- Check each job for compatibility AND affinity
  for i = 1, #pending_jobs do
    local job_id = pending_jobs[i]
    local job_data = redis.call('HGETALL', 'job:' .. job_id)

    if job_data and #job_data > 0 then
      local job = hash_to_table(job_data)

      -- Check if worker meets requirements
      if matches_requirements(worker_caps, job) then
        -- Calculate affinity score
        local affinity_score = calculate_affinity_score(worker_caps, job)

        redis.log(redis.LOG_DEBUG, 'Job ' .. job_id .. ' affinity score: ' .. affinity_score)

        -- Track best match
        if affinity_score > best_score then
          best_score = affinity_score
          best_job_id = job_id
          best_job = job
        end
      end
    end
  end

  -- If we found a match, claim the best one
  if best_job_id then
    local removed = redis.call('ZREM', 'jobs:pending', best_job_id)
    if removed == 1 then
      -- Update job status (existing code)
      redis.call('HMSET', 'job:' .. best_job_id,
        'status', 'assigned',
        'worker_id', worker_caps.worker_id,
        'assigned_at', get_iso_timestamp()
      )

      -- Add to worker's active jobs
      redis.call('HSET', 'jobs:active:' .. worker_caps.worker_id, best_job_id, cjson.encode(best_job))

      -- Update worker status
      redis.call('HMSET', 'worker:' .. worker_caps.worker_id,
        'status', 'busy',
        'current_job_id', best_job_id,
        'last_status_change', get_iso_timestamp()
      )

      -- Publish events (existing code)
      redis.call('PUBLISH', 'job_claimed', cjson.encode({
        job_id = best_job_id,
        worker_id = worker_caps.worker_id,
        status = 'claimed',
        affinity_score = best_score,  -- NEW: Include score in event
        timestamp = tonumber(redis.call('TIME')[1]) * 1000
      }))

      redis.log(redis.LOG_NOTICE, 'Worker ' .. worker_caps.worker_id .. ' claimed job ' .. best_job_id .. ' (score: ' .. best_score .. ')')

      return cjson.encode({
        jobId = best_job_id,
        job = best_job
      })
    end
  end

  return nil
end)
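
Deploying the modified library means reloading it into Redis with FUNCTION LOAD REPLACE; a minimal sketch using ioredis, where the file path and library shebang are assumptions about this repo's layout:

typescript
import { readFileSync } from 'fs';
import Redis from 'ioredis';

async function reloadJobMatchingFunctions(redisUrl: string): Promise<void> {
  const redis = new Redis(redisUrl);

  // The library source must begin with a shebang such as: #!lua name=job_matching
  const source = readFileSync(
    'packages/core/src/redis-functions/functions/findMatchingJob.lua',
    'utf-8'
  );

  // REPLACE overwrites the previously loaded version of the library
  await redis.call('FUNCTION', 'LOAD', 'REPLACE', source);
  await redis.quit();
}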

6. ComfyUI Loader Changes

File: packages/comfyui-custom-nodes/emprops_comfy_nodes/nodes/emprops_lora_loader.py

python
# NEW: Support for user LoRAs
def download_from_cloud(self, lora_name, provider=None, bucket=None, flat_file_id=None):
    """Download LoRA from cloud storage if not found locally"""

    # NEW: Handle user LoRA downloads
    if flat_file_id:
        # User LoRA - download to user-specific directory
        local_path = self._get_user_lora_path(flat_file_id)

        # Check cache first
        if os.path.exists(local_path):
            print(f"[EmProps] User LoRA already cached: {local_path}")
            model_cache_db.update_model_usage(local_path)
            return local_path

        # Fetch flat_file metadata from API
        metadata = self._fetch_flat_file_metadata(flat_file_id)
        if not metadata:
            print(f"[EmProps] Failed to fetch metadata for flat_file: {flat_file_id}")
            return None

        # Download from Azure Blob Storage
        handler = AzureHandler(container_name='user-loras')
        success, error = handler.download_file(
            blob_path=metadata['blob_path'],
            local_path=local_path
        )

        if not success:
            print(f"[EmProps] Failed to download user LoRA: {error}")
            return None

        # Register in model cache (is_ignore=False for user LoRAs)
        model_cache_db.register_model(
            path=local_path,
            model_type='user_lora',
            size_bytes=os.path.getsize(local_path),
            is_ignore=False  # User LoRAs can be evicted
        )

        return local_path

    # Existing shared LoRA logic
    # ... (unchanged)

def _get_user_lora_path(self, flat_file_id):
    """Get local path for user LoRA"""
    lora_paths = folder_paths.folder_names_and_paths["loras"][0]
    user_dir = os.path.join(lora_paths[0], 'user')
    os.makedirs(user_dir, exist_ok=True)
    return os.path.join(user_dir, f"{flat_file_id}.safetensors")

def _fetch_flat_file_metadata(self, flat_file_id):
    """Fetch flat_file metadata from API"""
    api_url = os.getenv('EMP_API_URL', 'http://localhost:3001')
    try:
        response = requests.get(f"{api_url}/api/flat-files/{flat_file_id}", timeout=30)
    except requests.RequestException as e:
        print(f"[EmProps] Error fetching flat_file metadata: {e}")
        return None
    if response.status_code == 200:
        return response.json()
    return None

Consequences

Positive Consequences ✅

  1. User Empowerment

    • Users can upload and manage custom LoRAs
    • No dependency on shared model repositories
    • Full control over model lifecycle
  2. Performance Optimization

    • Jobs route to workers with cached models
    • Eliminates redundant downloads across workers
    • Reduces job start latency by 2-5 minutes (when cache hits)
  3. Cache Efficiency

    • Popular LoRAs naturally accumulate high scores
    • LRU + time-based eviction keeps cache fresh
    • 50GB dedicated space prevents disk exhaustion
  4. Non-Blocking Design

    • Workers without cache can still claim jobs
    • System degrades gracefully under load
    • No hard dependencies on cache state
  5. North Star Alignment

    • Advances Phase 2: Model Intelligence goals
    • Supports predictive model placement strategy
    • Foundation for pool-specific model baking

Negative Consequences ❌

  1. Complexity Increase

    • Redis function logic becomes more sophisticated
    • Additional API endpoints for LoRA management
    • Worker needs to query cache state before claiming
  2. Storage Costs

    • Azure Blob Storage costs for user LoRAs
    • 50GB per machine reserved for cache
    • Potential for cache thrashing with diverse workloads
  3. Cold Start Problem

    • First user of a LoRA still experiences download wait
    • Cache warmup period before affinity benefits appear
    • May need pre-warming strategies for popular models
  4. Monitoring Complexity

    • Need to track cache hit rates per LoRA
    • Affinity score distribution monitoring
    • Cache eviction analytics required

Alternatives Considered

Alternative 1: Priority Queue Approach

Design:

  • Two-pass matching: first try workers with cache, then fallback
  • Maintain separate priority queues for jobs

Rejected Because:

  • More complex than scoring approach
  • Harder to extend for multi-LoRA jobs
  • Doesn't handle partial cache matches well

Alternative 2: S3/Shared Storage

Design:

  • Use AWS S3 instead of Azure
  • Shared storage accessible by all workers

Rejected Because:

  • EmProps uses Azure, not AWS
  • Vendor lock-in concerns
  • Azure Blob Storage already integrated

Alternative 3: Pre-baked Container Images

Design:

  • Bake popular LoRAs into container images
  • Different images for different LoRA sets

Rejected Because:

  • Not feasible for user-owned LoRAs
  • Container images become massive (50GB+)
  • Deployment time increases dramatically
  • Note: May complement this approach in Phase 2

Alternative 4: Centralized Cache Service

Design:

  • Single cache service shared across workers
  • Workers fetch from cache service, not Azure

Rejected Because:

  • Single point of failure
  • Network overhead for model transfers
  • Incompatible with ephemeral distributed machines
  • Violates "no shared storage" constraint

Success Metrics

Performance Metrics

| Metric | Baseline | Target | Measurement |
| --- | --- | --- | --- |
| Cache Hit Rate | 0% (no cache) | 60% within 2 weeks | Redis event logs |
| Job Start Latency (cache hit) | 5 min | <10 seconds | OpenTelemetry traces |
| Job Start Latency (cache miss) | 5 min | 5 min (unchanged) | OpenTelemetry traces |
| Affinity Score Distribution | N/A | 70% of jobs score >0 | Redis function logs |
| Cache Eviction Rate | N/A | <10% of downloads | model_cache.db analytics |

User Experience Metrics

| Metric | Baseline | Target | Measurement |
| --- | --- | --- | --- |
| User LoRA Uploads | 0 | 100+ LoRAs in 4 weeks | flat_file table count |
| Workflows Using User LoRAs | 0 | 50+ workflows | Job requirements analysis |
| User-Reported Wait Times | High complaints | <5 complaints/week | Support tickets |

System Health Metrics

| Metric | Target | Measurement |
| --- | --- | --- |
| Disk Space Utilization | 70-90% of 50GB | Machine metrics |
| Cache Thrashing Rate | <5% | Eviction immediately followed by re-download |
| Azure Blob Egress | <100GB/day | Azure billing dashboard |

Implementation Phases

Phase 1: User Storage Infrastructure (Week 1-2)

Goal: Enable users to upload and store LoRAs

Tasks:

  • [ ] Create API endpoints for LoRA upload/list/delete
  • [ ] Implement Azure Blob Storage integration for user LoRAs
  • [ ] Add tags=['lora'] filtering to flat_file queries
  • [ ] Update EmProps Studio UI for LoRA management
  • [ ] Add validation for LoRA file format (.safetensors)
  • [ ] Implement quota limits per user (e.g., 10 LoRAs, 10GB total)
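
The last task above (per-user quotas) could be enforced at upload time; a minimal sketch using the example limits of 10 LoRAs and 10GB per user, with a hypothetical query() helper:

typescript
import { query } from './db'; // hypothetical Postgres helper, as in the upload sketch above

const MAX_LORAS_PER_USER = 10;
const MAX_TOTAL_BYTES = 10 * 1024 ** 3; // 10GB per user

export async function checkLoraQuota(
  userId: string,
  incomingBytes: number
): Promise<{ ok: boolean; reason?: string }> {
  // Count existing LoRAs and their total size from flat_file metadata
  const [row] = (await query(
    `SELECT COUNT(*) AS count,
            COALESCE(SUM((metadata->>'size_bytes')::bigint), 0) AS total_bytes
     FROM flat_file
     WHERE user_id = $1 AND 'lora' = ANY(tags)`,
    [userId]
  )) as Array<{ count: string; total_bytes: string }>;

  if (Number(row.count) >= MAX_LORAS_PER_USER) {
    return { ok: false, reason: `limit of ${MAX_LORAS_PER_USER} LoRAs reached` };
  }
  if (Number(row.total_bytes) + incomingBytes > MAX_TOTAL_BYTES) {
    return { ok: false, reason: '10GB per-user LoRA storage quota exceeded' };
  }
  return { ok: true };
}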

Deliverables:

  • Working API endpoints with Azure upload
  • UI for uploading and managing LoRAs
  • Documentation for LoRA upload process

Testing:

bash
# Upload LoRA
curl -X POST http://localhost:3001/api/user-loras/upload \
  -F "file=@my_lora.safetensors" \
  -H "Authorization: Bearer $TOKEN"

# List user LoRAs
curl http://localhost:3001/api/user-loras?user_id=$USER_ID

Phase 2: Just-in-Time Downloads (Week 2-3)

Goal: Workers download user LoRAs on demand

Tasks:

  • [ ] Modify EmProps_Lora_Loader to support flat_file_id parameter
  • [ ] Implement _fetch_flat_file_metadata() API call
  • [ ] Add user LoRA directory structure (/models/loras/user/)
  • [ ] Update model registration to distinguish user vs. shared LoRAs
  • [ ] Add error handling for failed downloads
  • [ ] Implement retry logic with exponential backoff
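
The retry task above lives in the Python loader, but the policy itself is straightforward; a TypeScript sketch of the intended backoff behaviour, where the attempt count and delays are assumptions:

typescript
// Retry a download up to maxAttempts times with exponential backoff (1s, 2s, 4s, ...)
async function withRetry<T>(
  download: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await download();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        const delay = baseDelayMs * 2 ** attempt;
        console.warn(`[EmProps] LoRA download failed (attempt ${attempt + 1}/${maxAttempts}), retrying in ${delay}ms`);
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}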

Deliverables:

  • Working LoRA download from flat_file references
  • Model cache tracking for user LoRAs
  • Error recovery for failed downloads

Testing:

In the ComfyUI workflow JSON, lora_name is left empty for user LoRAs and the LoRA is referenced by flat_file_id instead:

json
{
  "inputs": {
    "lora_name": "",
    "flat_file_id": "550e8400-e29b-41d4-a716-446655440000",
    "strength_model": 1.0,
    "strength_clip": 1.0
  }
}

Phase 3: Cache Management (Week 3-4)

Goal: Implement LRU + time-based eviction

Tasks:

  • [ ] Configure 50GB reserved space setting in model_cache.db
  • [ ] Implement pre-download space check
  • [ ] Add LRU eviction when cache fills
  • [ ] Create background job for 7-day TTL cleanup
  • [ ] Add cache metrics collection
  • [ ] Implement graceful handling of eviction during job execution
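
A minimal sketch of the 7-day TTL cleanup job (run via cron or a PM2 scheduled task, per the deliverables below); it assumes the models table and last_used column shown in the testing step, an is_ignore column matching the flag referenced earlier, and uses better-sqlite3 for illustration:

typescript
import Database from 'better-sqlite3';
import { existsSync, unlinkSync } from 'fs';

const TTL_DAYS = 7;

// Remove user LoRAs unused for 7+ days; system LoRAs (is_ignore=1) are never touched.
// Assumes last_used is stored as epoch seconds.
function cleanupStaleLoras(dbPath = 'model_cache.db'): void {
  const db = new Database(dbPath);
  const cutoff = Math.floor(Date.now() / 1000) - TTL_DAYS * 24 * 60 * 60;

  const stale = db
    .prepare('SELECT path FROM models WHERE is_ignore = 0 AND last_used < ?')
    .all(cutoff) as Array<{ path: string }>;

  const remove = db.prepare('DELETE FROM models WHERE path = ?');
  for (const { path } of stale) {
    if (existsSync(path)) unlinkSync(path);  // delete the cached .safetensors file
    remove.run(path);                        // drop the cache record
  }
  db.close();
}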

Deliverables:

  • Automated cache eviction system
  • Background cleanup job (cron or PM2 scheduled task)
  • Cache metrics dashboard in monitoring UI

Testing:

bash
# Fill cache to capacity
for i in {1..50}; do
  echo "TODO: upload and download LoRA $i to fill the 50GB cache"  # placeholder for real upload/download calls
done

# Verify LRU eviction
sqlite3 model_cache.db "SELECT * FROM models ORDER BY last_used ASC LIMIT 10"

Phase 4: Affinity Routing (Week 4-5)

Goal: Prefer workers with cached LoRAs

Tasks:

  • [ ] Add cached_loras field to WorkerCapabilities interface
  • [ ] Implement getCachedLoRAs() in worker client
  • [ ] Modify Redis Lua function with scoring algorithm
  • [ ] Add affinity score to job claim events (consumed in the sketch after this list)
  • [ ] Update monitoring UI to show affinity scores
  • [ ] Add Redis logs for affinity debugging
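
To consume the affinity score added to job claim events, monitoring code can subscribe to the existing job_claimed channel; a minimal sketch using ioredis, with field names taken from the Lua publish call in the implementation section:

typescript
import Redis from 'ioredis';

// Subscribe to the existing job_claimed channel and log each claim's affinity score
export async function watchAffinityScores(redisUrl: string): Promise<void> {
  const subscriber = new Redis(redisUrl);
  await subscriber.subscribe('job_claimed');

  subscriber.on('message', (_channel, message) => {
    const event = JSON.parse(message) as {
      job_id: string;
      worker_id: string;
      affinity_score: number;
      timestamp: number;
    };
    console.log(
      `job ${event.job_id} claimed by ${event.worker_id} (affinity score: ${event.affinity_score})`
    );
  });
}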

Deliverables:

  • Working affinity-based job routing
  • Affinity score in monitoring dashboard
  • Debug tooling for score analysis

Testing:

bash
# Create job requiring specific LoRA
curl -X POST http://localhost:3001/api/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "service_required": "comfyui",
    "requirements": {
      "loras": [{
        "type": "user",
        "flat_file_id": "550e8400-e29b-41d4-a716-446655440000"
      }]
    }
  }'

# Verify worker with cached LoRA claims job
# Check Redis logs for affinity score: "score: 10"

Phase 5: Monitoring & Analytics (Week 5-6)

Goal: Observe system behavior and optimize

Tasks:

  • [ ] Add cache hit rate metrics to OpenTelemetry (see the sketch after this list)
  • [ ] Create Dash0 dashboard for LoRA analytics
  • [ ] Track affinity score distribution
  • [ ] Monitor cache eviction patterns
  • [ ] Analyze user LoRA upload trends
  • [ ] Identify optimization opportunities
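
For the cache hit rate task flagged above, a minimal sketch using the @opentelemetry/api metrics API; it assumes a meter provider is already configured in the worker, and the metric names are illustrative:

typescript
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('lora-cache');
const cacheHits = meter.createCounter('lora_cache_hits_total', {
  description: 'LoRA requests served from the local cache',
});
const cacheMisses = meter.createCounter('lora_cache_misses_total', {
  description: 'LoRA requests that required a download from Azure',
});

// Call this wherever the worker resolves a LoRA requirement
export function recordLoraCacheLookup(loraType: 'user' | 'shared', hit: boolean): void {
  (hit ? cacheHits : cacheMisses).add(1, { lora_type: loraType });
}

// Cache hit rate = hits / (hits + misses), computed in the Dash0 dashboard query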

Deliverables:

  • Comprehensive LoRA analytics dashboard
  • Weekly report on cache performance
  • Recommendations for Phase 2+ optimizations

Metrics Dashboard:

  • Cache hit rate over time
  • Affinity score distribution histogram
  • Top 10 most popular LoRAs
  • Cache eviction frequency
  • Average job start latency by cache state

Phase 6: Optimization & Polish (Week 6+)

Goal: Refine based on production data

Tasks:

  • [ ] Tune affinity scoring weights based on metrics
  • [ ] Implement pre-warming for popular LoRAs
  • [ ] Optimize cache eviction algorithm
  • [ ] Add user notifications for evicted LoRAs
  • [ ] Document best practices for LoRA usage
  • [ ] Create troubleshooting guides

Deliverables:

  • Production-ready LoRA support
  • User documentation and guides
  • Internal runbooks for operations team

Open Questions

  1. LoRA Quota Management

    • What limits should we enforce? (10 LoRAs/user? 10GB total?)
    • How do we handle quota violations?
  2. Cache Warming Strategy

    • Should we pre-download popular LoRAs on machine startup?
    • How do we identify "popular" LoRAs for warming?
  3. Multi-LoRA Jobs

    • How do we score jobs requiring 3+ LoRAs?
    • Should we use sum of scores or weighted average?
  4. Cross-Pool Behavior

    • How does affinity routing interact with pool separation (Fast Lane / Standard / Heavy)?
    • Should LoRA cache be pool-specific?
  5. Azure Costs

    • What is acceptable egress cost per month?
    • Should we implement CDN or edge caching?


Approval

Decision Makers:

  • [ ] Engineering Lead
  • [ ] Product Manager
  • [ ] DevOps Team

Next Steps:

  1. Review ADR with team
  2. Gather feedback on open questions
  3. Approve or request revisions
  4. Begin Phase 1 implementation

Questions? Post in #architecture or #job-broker Slack channels.
