Tool Intermediate medium · 8 min concept

Online vs offline feature serving: the two-store pattern

What you will learn

Use separate storage backends for batch feature computation (offline) and low-latency inference (online) to balance speed, cost, and freshness.

Why this matters

Production ML systems fail when features take 500ms to fetch but your inference SLA is 100ms. The two-store pattern separates batch-computed features (S3/DVC) from real-time served features (Redis/PostgreSQL), preventing latency bottlenecks that degrade user experience and increase infrastructure costs.

Skip if: (1) your features can be computed on demand in under 50ms, (2) they are so large that caching them is infeasible (for example, TB-scale embeddings), or (3) your use case tolerates 5+ minutes of feature staleness. A single-store architecture is enough for non-production experiments and low-QPS services.

Explanation

The two-store pattern separates feature computation from feature serving. Offline stores (S3, DVC, data warehouses) hold historical features computed in batch jobs: cheap and effectively unlimited in scale, but slow to query. Online stores (Redis, DynamoDB, PostgreSQL) hold only the latest values of the subset of features needed for real-time inference: fast but expensive at scale, and only as fresh as the last sync. Your inference service queries the online store (milliseconds); your training pipeline reads from the offline store (minutes). DVC tracks offline feature versions; a feature serving layer (Feast, Tecton) copies the features needed at inference time to Redis on a schedule. This decoupling lets you compute 10,000 features offline overnight but serve only the 50 most critical features at inference time. Without separation, you either overpay for always-hot storage or watch your API time out waiting for feature joins.
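The two read paths can be sketched in memory. This is a minimal illustration, not Feast's or Tecton's API: OfflineStore, OnlineStore and materialize are hypothetical names, and plain dicts stand in for S3/Parquet and Redis. The offline store keeps every timestamped row so training can do point-in-time lookups; the online store keeps only the newest value per entity, restricted to the served subset.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OfflineStore:
    """Append-only history (S3/Parquet in production): entity_id -> [(timestamp, features), ...]."""
    history: dict = field(default_factory=dict)

    def append(self, entity_id: str, ts: int, features: dict) -> None:
        rows = self.history.setdefault(entity_id, [])
        rows.append((ts, features))
        rows.sort(key=lambda row: row[0])  # keep each entity's history ordered by time

    def as_of(self, entity_id: str, ts: int) -> Optional[dict]:
        """Point-in-time lookup for training: the latest row written at or before ts."""
        rows = self.history.get(entity_id, [])
        idx = bisect_right([row[0] for row in rows], ts)
        return rows[idx - 1][1] if idx else None


@dataclass
class OnlineStore:
    """Latest value per entity only (Redis in production), read at inference time."""
    latest: dict = field(default_factory=dict)

    def get(self, entity_id: str) -> Optional[dict]:
        return self.latest.get(entity_id)


def materialize(offline: OfflineStore, online: OnlineStore, served: set[str]) -> None:
    """Sync step: copy each entity's newest row, keeping only the features served online."""
    for entity_id, rows in offline.history.items():
        _, features = rows[-1]
        online.latest[entity_id] = {k: v for k, v in features.items() if k in served}


if __name__ == "__main__":
    offline, online = OfflineStore(), OnlineStore()
    offline.append("u1", 100, {"clicks_7d": 3, "avg_basket": 20.0, "ltv_90d": 310.0})
    offline.append("u1", 200, {"clicks_7d": 5, "avg_basket": 22.5, "ltv_90d": 320.0})
    materialize(offline, online, served={"clicks_7d", "avg_basket"})
    print(offline.as_of("u1", 150))  # training sees the value as of t=150
    print(online.get("u1"))          # inference sees only the latest served subset
```

The design choice to illustrate: the online store deliberately forgets history and drops unserved columns (ltv_90d here), which is what keeps it small enough to hold in memory.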

Configuration

yaml

version: '3.8'
services:
  # Offline store: MinIO, an S3-compatible object store holding the DVC remote locally
  # In production, this is S3 + DVC remote
  offline_store:
    image: minio/minio:latest
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    command: server /data --console-address ":9001"
    volumes:
      - minio_data:/data

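  # Online store: Redis for low-latency key lookups (typically sub-millisecond server-side)
  online_store:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  # PostgreSQL as an alternative online store (primary-key lookups on an indexed table)
  postgres_online:
    image: postgres:15-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: features
      POSTGRES_USER: mlops
      POSTGRES_PASSWORD: password123  # local development only
    volumes:
      - postgres_data:/var/lib/postgresql/data

  # Feature serving layer (syncs from offline to online)
  feature_server:
    build:
      context: .
      dockerfile: Dockerfile.featureserver
    ports:
      - "8000:8000"
    environment:
      OFFLINE_STORE_PATH: s3://ml-features/
      # Assumes the server uses an AWS SDK that honors AWS_ENDPOINT_URL; points S3 calls at MinIO
      AWS_ENDPOINT_URL: http://offline_store:9000
      AWS_ACCESS_KEY_ID: minioadmin
      AWS_SECRET_ACCESS_KEY: minioadmin
      ONLINE_STORE_REDIS: redis://online_store:6379
      FEATURE_SYNC_INTERVAL: "300"  # seconds between syncs
    depends_on:
      offline_store:
        condition: service_started
      online_store:
        condition: service_healthy  # wait until redis-cli ping succeeds
    volumes:
      - ./features:/app/features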

volumes:
  minio_data:
  redis_data:
  postgres_data:

Why this order?

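Compose starts offline_store (MinIO) and online_store (Redis) before feature_server because feature_server lists both under depends_on; the two stores do not depend on each other, so they start in parallel. postgres_online has no dependents and starts independently; the order in which services are declared in the file does not affect startup order. Plain depends_on waits only for a container to start, not for the service inside it to accept connections; to wait for readiness, give the dependency a healthcheck and use condition: service_healthy.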

Wrong vs Right

Wrong way

yaml

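# Wrong: one PostgreSQL database serves both batch feature computation and
# real-time inference, so heavy batch queries compete with latency-critical reads
version: '3.8'
services:
  monolithic_store:
    image: postgres:15-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: everything
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
    volumes:
      - ./data:/var/lib/postgresql/data

  ml_service:
    image: python:3.11-slim
    ports:
      - "8000:8000"
    command: python inference_server.py
    depends_on:
      - monolithic_store
    environment:
      # Inference joins features at request time against the same tables batch jobs write to
      DATABASE_URL: postgresql://user:pass@monolithic_store:5432/everything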

Right way

yaml

# dvc.yaml: defines offline feature computation
# (params.yaml must define sync.batch_size and sync.ttl_seconds)
stages:
  fetch_raw_data:
    cmd: python scripts/fetch_data.py
    deps:
      - scripts/fetch_data.py
    outs:
      - data/raw.parquet:
          cache: true

  compute_features_offline:
    cmd: python scripts/compute_features.py --input data/raw.parquet --output data/features.parquet
    deps:
      - scripts/compute_features.py
      - data/raw.parquet
    outs:
      - data/features.parquet:
          cache: true

  sync_to_online:
    # TTL is read from params.yaml via DVC templating instead of being hardcoded
    cmd: python scripts/sync_to_redis.py --source data/features.parquet --redis redis://localhost:6379 --ttl ${sync.ttl_seconds}
    deps:
      - scripts/sync_to_redis.py
      - data/features.parquet
    params:
      - sync.batch_size
      - sync.ttl_seconds

python

# inference_server.py: queries the online store only
import json

import redis
from fastapi import FastAPI, HTTPException

app = FastAPI()
online_store = redis.Redis(host="localhost", port=6379, decode_responses=True)


@app.post("/predict")
def predict(user_id: str):
    # Plain `def`: FastAPI runs it in a threadpool, so the blocking Redis call
    # does not stall the event loop (use redis.asyncio for an async endpoint)
    features_json = online_store.get(f"user:{user_id}:features")  # one GET, no joins
    if features_json is None:
        raise HTTPException(status_code=404, detail="user not found in online store")
    features = json.loads(features_json)
    # ... model prediction using `features` ...
    return {"prediction": 0.95}
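The dvc.yaml stage calls scripts/sync_to_redis.py without showing it. A minimal sketch of its core loop, assuming the rows are already loaded as dicts with JSON-serializable values (for example via pandas.read_parquet(path).to_dict("records")) and using the user:{user_id}:features key layout that inference_server.py reads. It relies on redis-py's pipeline() and set(name, value, ex=ttl) so each batch costs one round trip; sync_features and InMemoryRedis are hypothetical names, and InMemoryRedis only exists so the sketch runs without a server.

```python
from __future__ import annotations

import json
from typing import Any, Iterable, Mapping


def sync_features(client: Any, rows: Iterable[Mapping[str, Any]],
                  ttl_seconds: int, batch_size: int = 500) -> int:
    """Write each row to user:{user_id}:features with a TTL and return how many keys were written.

    `client` is assumed to behave like redis.Redis: pipeline(), then set(name, value, ex=...)
    and execute() on the returned pipeline.
    """
    pipe = client.pipeline()
    pending = written = 0
    for row in rows:
        features = {k: v for k, v in row.items() if k != "user_id"}
        pipe.set(f"user:{row['user_id']}:features", json.dumps(features), ex=ttl_seconds)
        pending += 1
        if pending == batch_size:  # one network round trip per full batch
            pipe.execute()
            written, pending = written + pending, 0
    if pending:  # flush the final partial batch
        pipe.execute()
        written += pending
    return written


class InMemoryRedis:
    """Hypothetical stand-in for redis.Redis that records values, TTLs and round trips."""

    def __init__(self) -> None:
        self.data: dict[str, tuple[str, int | None]] = {}
        self.round_trips = 0

    def pipeline(self) -> _Pipeline:
        return _Pipeline(self)


class _Pipeline:
    def __init__(self, owner: InMemoryRedis) -> None:
        self.owner = owner
        self.queued: list[tuple[str, str, int | None]] = []

    def set(self, name: str, value: str, ex: int | None = None) -> _Pipeline:
        self.queued.append((name, value, ex))
        return self

    def execute(self) -> list[bool]:
        for name, value, ex in self.queued:
            self.owner.data[name] = (value, ex)
        self.owner.round_trips += 1
        results = [True] * len(self.queued)
        self.queued = []  # like redis-py, the pipeline is reset after execute()
        return results
```

In the real script you would pass redis.Redis.from_url(url) instead of the stand-in, and take batch_size from the sync.batch_size parameter the stage already declares.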

Tool vitals

Primary command

bash

dvc dag                # visualize the feature pipeline
docker compose up -d   # run the feature-serving infrastructure

Config files: dvc.yaml (defines feature computation stages), docker-compose.yml (defines online/offline storage)

Verify

bash

dvc dag                 # check the stage graph
dvc repro && dvc push   # run the pipeline (sync_to_online writes to Redis), then upload outputs to the DVC remote

Integration notes

DVC manages offline feature storage and versioning (dvc remote add s3_features s3://bucket). MLflow logs feature schemas and pipeline run metadata. Redis or DynamoDB serves as the online store. Feast or a custom Python service (using this docker-compose.yml) syncs features between stores on a schedule. Your inference service (BentoML, vLLM) queries only the online store via Redis GET or PostgreSQL SELECT.

Migration path

To migrate incrementally: (1) Start with a single hot PostgreSQL table (offline + online combined). (2) Once latency becomes unacceptable, add Redis in front of it as a read-through cache. (3) Move batch computation into jobs that write to an offline S3 store, and sync hot features to Redis. (4) Adopt a feature platform (open-source Feast or managed Tecton) once your feature count grows beyond roughly 1,000. The pattern lets you scale incrementally without rewriting your inference service.
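A read-through cache, as in step (2), checks Redis first and only falls back to PostgreSQL on a miss, writing the result back with a TTL. A minimal sketch, assuming a Redis-like client with get and set(..., ex=...) and a hypothetical load_from_postgres callable that runs the SELECT; DictCache is a hypothetical in-memory stand-in that ignores TTLs.

```python
from __future__ import annotations

import json
from typing import Any, Callable, Optional


def get_features(cache: Any, user_id: str,
                 load_from_postgres: Callable[[str], Optional[dict]],
                 ttl_seconds: int = 3600) -> Optional[dict]:
    """Read-through lookup: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}:features"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # fast path: served from Redis
    features = load_from_postgres(user_id)  # slow path: query the source of truth
    if features is not None:
        cache.set(key, json.dumps(features), ex=ttl_seconds)  # populate for next time
    return features


class DictCache:
    """Hypothetical stand-in for redis.Redis(decode_responses=True); TTLs are ignored."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def set(self, key: str, value: str, ex: Optional[int] = None) -> bool:
        self.data[key] = value
        return True
```

Misses for unknown users are not cached here, so they reach PostgreSQL every time; caching a short-lived sentinel value is one way to avoid that if it becomes a load problem.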

Cost model

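Offline store (S3): $0.023/GB-month + $0.0004 per 1,000 GET requests. Online store (Redis): $0.015/GB-hour for in-memory data (expensive at scale), or $0.25/hour for a managed Redis node (AWS ElastiCache; varies by node type). PostgreSQL online: a few cents per hour for an on-demand db.t3.small, plus storage. MinIO in docker-compose: free, but production requires S3. Total for 100GB of offline features with only a small hot subset in Redis and 1M daily inference requests: ~$150-400/month, vs. $8,000+/month for a single database sized to serve all 100GB hot. These are rough list prices that vary by region and instance type. In this example the two-store pattern is more than an order of magnitude cheaper, because only the hot subset pays in-memory prices.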
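The estimate is easier to reason about as explicit arithmetic. A minimal sketch that takes the S3 storage rate ($0.023/GB-month) and Redis memory rate ($0.015/GB-hour) quoted above as default inputs, plus S3 Standard's GET price of $0.0004 per 1,000 requests, and assumes a 730-hour month. Inference requests read Redis, not S3, so offline_gets counts only batch reads (training, sync jobs); the 5 GB hot subset and 1M monthly GETs in the example are hypothetical.

```python
from __future__ import annotations

HOURS_PER_MONTH = 730  # common billing convention (365 * 24 / 12, rounded)


def monthly_cost(offline_gb: float, offline_gets: int, hot_gb: float,
                 s3_gb_month: float = 0.023,
                 s3_per_1k_gets: float = 0.0004,
                 redis_gb_hour: float = 0.015) -> dict[str, float]:
    """Split a two-store bill into offline storage, offline requests and online memory (USD)."""
    costs = {
        "offline_storage": offline_gb * s3_gb_month,
        "offline_requests": offline_gets / 1000 * s3_per_1k_gets,
        "online_memory": hot_gb * redis_gb_hour * HOURS_PER_MONTH,  # scales with the hot subset only
    }
    costs["total"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    # 100 GB offline, a hypothetical 5 GB hot subset in Redis, 1M batch GETs per month
    for name, usd in monthly_cost(offline_gb=100, offline_gets=1_000_000, hot_gb=5).items():
        print(f"{name:>16}: ${usd:,.2f}")
```

At the same Redis rate, keeping all 100 GB hot would cost 100 × 0.015 × 730 ≈ $1,095/month for memory alone, which is why only the served subset belongs in the online store.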

Common gotcha

The two-store pattern silently breaks when the online-store TTL is shorter than the interval between syncs: keys expire before the next sync refreshes them. For example, a batch job writes fresh features to the offline store at 2am UTC and the sync job runs at 3am, but the features from the previous sync expire from the online store at 2:30am, so every inference request between 2:30am and 3am finds no features at all. Set FEATURE_SYNC_INTERVAL to at most half your TTL, or use event-driven sync (Kafka triggers) instead of cron. Without this, you'll see 'feature not found' errors that only reproduce at specific times, which makes them hard to debug in staging.
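The "sync interval at most half the TTL" rule can be enforced at startup instead of remembered. A minimal sketch, assuming a hypothetical FEATURE_TTL_SECONDS environment variable alongside the FEATURE_SYNC_INTERVAL used in docker-compose.yml (defaults 3600 and 300 mirror the values elsewhere in this item); the service refuses to start when keys could expire between syncs.

```python
from __future__ import annotations

import os
from typing import Mapping


def check_sync_config(ttl_s: int, sync_interval_s: int, sync_duration_s: int = 0) -> None:
    """Raise ValueError if features can expire before the next sync finishes writing them.

    A key written at time t must survive until the following sync completes at
    t + sync_interval_s + sync_duration_s; requiring TTL >= 2x the interval adds
    headroom for a late run.
    """
    worst_case_gap = sync_interval_s + sync_duration_s
    if ttl_s < worst_case_gap:
        raise ValueError(f"TTL {ttl_s}s < sync interval + duration ({worst_case_gap}s): keys expire between syncs")
    if ttl_s < 2 * sync_interval_s:
        raise ValueError(f"TTL {ttl_s}s < 2 x sync interval ({2 * sync_interval_s}s): no headroom for a late sync")


def load_sync_config(env: Mapping[str, str] = os.environ) -> tuple[int, int]:
    """Read TTL and sync interval (hypothetical variable names) and validate them together."""
    ttl = int(env.get("FEATURE_TTL_SECONDS", "3600"))
    interval = int(env.get("FEATURE_SYNC_INTERVAL", "300"))
    check_sync_config(ttl, interval)
    return ttl, interval
```

Validating both values in one place also addresses the "TTL=24h, skip sync" failure mode: a misconfigured deployment fails loudly instead of serving stale or missing features.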

Team adoption

Week 1: Run docker-compose locally with MinIO + Redis. One engineer writes the dvc.yaml pipeline (fetch → compute → sync). Week 2: Point the DVC remote at S3 and run dvc push, then update docker-compose to use real S3 + managed Redis. Week 3: Deploy feature_server as a K8s CronJob (see separate K8s item). By week 4, all inference services query only the online store: no more 500ms feature joins. Document the TTL and sync interval prominently, and make both config parameters rather than hardcoded values; otherwise new team members will set TTL=24h and skip sync, causing silent staleness.

Experienced dev note

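Consider Redis sorted sets (ZSET) scored by write timestamp instead of plain strings. On each sync, ZADD user:123:features <T> <payload> adds a version scored by its write time T; at read time, ZRANGEBYSCORE user:123:features <now - TTL> +inf returns only versions written inside the freshness window, so stale features are filtered without application-level staleness checks. Sorted-set commands operate on one key at a time (no user:* wildcards), and old members are not expired automatically: trim them with ZREMRANGEBYSCORE. Because the filter simply returns nothing when a sync fails, it hides sync failures rather than surfacing them, so alert on the sync job separately. Also: deploy feature_server as a Kubernetes CronJob that syncs from S3 to Redis every 5 minutes, not a long-running container. It's cheaper, survives node failures, and simplifies monitoring (a failed job is visible; a hanging service isn't).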
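A minimal sketch of the timestamp-scored sorted-set layout, one key per user, assuming redis-py's zadd(name, mapping), zrangebyscore(name, min, max) and zremrangebyscore(name, min, max). Each member is a JSON payload scored by its write time; write_features, read_fresh and keep_seconds are hypothetical names, and SortedSetStub is an in-memory stand-in so the sketch runs without a server.

```python
from __future__ import annotations

import json
import time
from typing import Any, Optional


def write_features(client: Any, user_id: str, features: dict,
                   ts: Optional[float] = None, keep_seconds: float = 86_400) -> None:
    """Add a timestamped version, then trim versions older than keep_seconds."""
    ts = time.time() if ts is None else ts
    key = f"user:{user_id}:features"
    client.zadd(key, {json.dumps(features, sort_keys=True): ts})
    client.zremrangebyscore(key, "-inf", ts - keep_seconds)  # sorted-set members never expire on their own


def read_fresh(client: Any, user_id: str, max_age: float,
               now: Optional[float] = None) -> Optional[dict]:
    """Return the newest version written within max_age seconds, or None if all are stale."""
    now = time.time() if now is None else now
    members = client.zrangebyscore(f"user:{user_id}:features", now - max_age, "+inf")
    return json.loads(members[-1]) if members else None  # ascending by score, so last is newest


class SortedSetStub:
    """Hypothetical in-memory stand-in for the three redis-py sorted-set calls used above."""

    def __init__(self) -> None:
        self.sets: dict[str, dict[str, float]] = {}

    def zadd(self, name: str, mapping: dict[str, float]) -> int:
        zset = self.sets.setdefault(name, {})
        added = sum(1 for member in mapping if member not in zset)
        zset.update(mapping)
        return added

    def zrangebyscore(self, name: str, min_score: Any, max_score: Any) -> list[str]:
        lo, hi = float(min_score), float(max_score)
        items = sorted(self.sets.get(name, {}).items(), key=lambda kv: kv[1])
        return [member for member, score in items if lo <= score <= hi]

    def zremrangebyscore(self, name: str, min_score: Any, max_score: Any) -> int:
        lo, hi = float(min_score), float(max_score)
        zset = self.sets.get(name, {})
        doomed = [member for member, score in zset.items() if lo <= score <= hi]
        for member in doomed:
            del zset[member]
        return len(doomed)
```

Because read_fresh returns None when every version is stale, a failed sync looks the same as an unknown user; pairing it with an alert on sync-job failures keeps that outage visible.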

Check your understanding

Your offline feature computation takes 4 hours. Your inference SLA is 100ms. Your Redis online store has a TTL of 2 hours. You sync from offline to online every 6 hours. What happens during the 6-hour interval between the last sync (at hour 0) and the next sync (at hour 6), and how do you fix it?

Answer hint

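Features synced at hour 0 expire from Redis at hour 2, but new features don't arrive until hour 6: a 4-hour window with no features, repeated every cycle. Solutions: (1) sync at least every TTL/2 (hourly here), re-pushing the latest computed features to refresh their TTL; (2) raise the TTL to at least twice the sync interval (12 hours here) if that much staleness is acceptable; or (3) use event-driven sync, where the offline job triggers a sync immediately upon completion. Computing offline more often improves freshness, but with a 4-hour run it cannot by itself close a gap that opens 2 hours after each sync. Cron-based sync with long computation time creates blind spots.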
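The timeline in this exercise can be checked mechanically. A minimal sketch, assuming each sync refreshes every key at the listed hour and keys live exactly ttl hours afterwards; coverage_gaps is a hypothetical helper that returns the intervals in which a lookup would miss.

```python
from __future__ import annotations


def coverage_gaps(sync_hours: list[float], ttl: float, horizon: float) -> list[tuple[float, float]]:
    """Return (start, end) intervals within [0, horizon) where no synced value is alive."""
    gaps = []
    covered_until = 0.0
    for start in sorted(sync_hours):
        if start > covered_until:  # previous values expired before this sync landed
            gaps.append((covered_until, min(start, horizon)))
        covered_until = max(covered_until, start + ttl)
    if covered_until < horizon:  # values from the last sync expire before the horizon
        gaps.append((covered_until, horizon))
    return [(a, b) for a, b in gaps if a < b]
```

With syncs at hours 0 and 6 and a 2-hour TTL over a 12-hour horizon, this reports misses from hour 2 to 6 and again from hour 8 to 12; syncing hourly or raising the TTL to 12 hours returns no gaps.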
