Online vs offline feature serving: the two-store pattern
Why this matters
Production ML systems fail when features take 500ms to fetch but your inference SLA is 100ms. The two-store pattern separates batch-computed features (S3/DVC) from real-time served features (Redis/PostgreSQL), preventing latency bottlenecks that degrade user experience and increase infrastructure costs.
Explanation
The two-store pattern separates feature computation from feature serving. Offline stores (S3, DVC, data warehouses) hold historical features computed in batch jobs: cheap and effectively unlimited in scale, but slow to query. Online stores (Redis, DynamoDB, PostgreSQL) hold only the latest value of the subset of features needed for real-time inference: fast, but expensive at scale, and only as fresh as the last sync. Your inference service queries the online store (milliseconds); your training pipeline reads from the offline store (minutes). DVC tracks offline feature versions; a feature serving layer (Feast, Tecton) copies the latest values of the serving features into Redis on a schedule (Feast calls this materialization). This decoupling lets you compute 10,000 features offline overnight but serve only the 50 most-critical features at inference time. Without separation, you either overpay for always-hot storage or watch your API time out waiting for feature joins.
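The split can be seen in a minimal in-memory sketch. It assumes one entity type (users), with plain Python lists and dicts standing in for S3 and Redis. The feature names and the `materialize` helper are hypothetical. The offline side keeps every historical row and every column. The online side keeps only the newest value of a few serving features per user, keyed as `user:{id}:features`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRow:
    """One offline row: every computed feature for an entity at a point in time."""
    entity_id: str
    event_time: int   # epoch seconds
    features: dict


def materialize(offline_rows, serving_features):
    """Build online key-value pairs from offline history (hypothetical helper).

    Keeps only the newest row per entity and only the columns listed in
    `serving_features`; everything else stays offline-only.
    """
    latest = {}
    for row in offline_rows:
        seen = latest.get(row.entity_id)
        if seen is None or row.event_time > seen.event_time:
            latest[row.entity_id] = row
    return {
        f"user:{entity_id}:features": {name: row.features[name] for name in serving_features}
        for entity_id, row in latest.items()
    }


# Offline store: full history, all columns (stands in for parquet on S3).
offline_rows = [
    FeatureRow("123", 100, {"clicks_7d": 4, "avg_basket": 31.0, "history_len": 900}),
    FeatureRow("123", 200, {"clicks_7d": 6, "avg_basket": 29.5, "history_len": 950}),
    FeatureRow("456", 150, {"clicks_7d": 1, "avg_basket": 12.0, "history_len": 40}),
]

# Online store: latest values of two serving features (stands in for Redis).
online_store = materialize(offline_rows, serving_features=["clicks_7d", "avg_basket"])
print(online_store["user:123:features"])  # {'clicks_7d': 6, 'avg_basket': 29.5}
```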
Configuration
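```yaml
services:
  # Offline store: MinIO, an S3-compatible object store standing in for S3.
  # DVC pushes versioned parquet files here; in production this is S3 + a DVC remote.
  offline_store:
    image: minio/minio:latest
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # web console
    environment:
      MINIO_ROOT_USER: minioadmin        # local development only
      MINIO_ROOT_PASSWORD: minioadmin
    command: server /data --console-address ":9001"
    volumes:
      - minio_data:/data
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]
      interval: 5s
      timeout: 5s
      retries: 5

  # Online store: Redis for sub-millisecond key lookups
  online_store:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    command: redis-server --appendonly yes   # persist to disk so a restart keeps features
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  # PostgreSQL as an alternative online store (primary-key lookups on a features table)
  postgres_online:
    image: postgres:15-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: features
      POSTGRES_USER: mlops
      POSTGRES_PASSWORD: password123     # local development only
    volumes:
      - postgres_data:/var/lib/postgresql/data

  # Feature serving layer (syncs from offline to online); Dockerfile.featureserver is your own image
  feature_server:
    build:
      context: .
      dockerfile: Dockerfile.featureserver
    ports:
      - "8000:8000"
    environment:
      OFFLINE_STORE_PATH: s3://ml-features/
      # Assumes the server uses an AWS SDK that honors AWS_ENDPOINT_URL (recent boto3 does),
      # so s3:// paths resolve to MinIO instead of AWS
      AWS_ENDPOINT_URL: http://offline_store:9000
      AWS_ACCESS_KEY_ID: minioadmin
      AWS_SECRET_ACCESS_KEY: minioadmin
      ONLINE_STORE_REDIS: redis://online_store:6379
      FEATURE_SYNC_INTERVAL: "300"   # seconds
    depends_on:
      offline_store:
        condition: service_healthy
      online_store:
        condition: service_healthy
    volumes:
      - ./features:/app/features

volumes:
  minio_data:
  redis_data:
  postgres_data:
```
Why this order?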
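Compose does not start services in declaration order. offline_store (MinIO), online_store (Redis), and postgres_online have no dependencies, so they start in parallel. feature_server starts only after the services listed under its depends_on. Short-form depends_on (a plain list of service names) only orders container startup: it does not wait for MinIO or Redis to accept connections. Waiting for readiness requires a healthcheck on each store plus the long form with condition: service_healthy. postgres_online is optional, and feature_server does not depend on it.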
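Compose healthchecks are not always available, for example when the same image also runs outside Compose. In that case the feature server can wait for its stores itself. Below is a minimal standard-library sketch; `wait_for_tcp` is a hypothetical helper, and the service names and ports come from the config above. An open port only proves the process is listening, not that Redis has finished loading its append-only file, so a Redis PING is a stronger check.

```python
import socket
import time


def wait_for_tcp(host, port, timeout_s=30.0, interval_s=0.5):
    """Poll until host:port accepts a TCP connection; return False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) until the port is open
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Hypothetical startup check inside the feature server container:
# for host, port in [("offline_store", 9000), ("online_store", 6379)]:
#     if not wait_for_tcp(host, port):
#         raise SystemExit(f"{host}:{port} not reachable")
```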
Wrong vs Right
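Wrong: a single PostgreSQL database stores every feature and serves both batch training reads and per-request inference joins, so the two workloads compete for the same instance.
```yaml
services:
  monolithic_store:
    image: postgres:15-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: everything
      POSTGRES_USER: user        # must match DATABASE_URL below
      POSTGRES_PASSWORD: pass    # the postgres image refuses to start without a password
    volumes:
      - ./data:/var/lib/postgresql/data
  ml_service:
    image: python:3.11-slim
    ports:
      - "8000:8000"
    command: python inference_server.py
    depends_on:
      - monolithic_store
    environment:
      # Every prediction joins features against the same database that batch jobs scan
      DATABASE_URL: postgresql://user:pass@monolithic_store:5432/everything
```
Right: DVC computes features offline, a sync stage loads the serving subset into Redis, and the inference server reads only Redis.
```yaml
# dvc.yaml: defines offline feature computation
stages:
  fetch_raw_data:
    cmd: python scripts/fetch_data.py
    deps:
      - scripts/fetch_data.py
    outs:
      - data/raw.parquet:
          cache: true
  compute_features_offline:
    cmd: python scripts/compute_features.py --input data/raw.parquet --output data/features.parquet
    deps:
      - scripts/compute_features.py
      - data/raw.parquet
    outs:
      - data/features.parquet:
          cache: true
  sync_to_online:
    # ${sync.*} is templated from params.yaml, so the TTL lives in one place.
    # No outs: Redis is external state, so DVC reruns this stage only when deps or params change.
    # --batch-size is an assumed flag of your own sync script.
    cmd: python scripts/sync_to_redis.py --source data/features.parquet --redis redis://localhost:6379 --batch-size ${sync.batch_size} --ttl ${sync.ttl_seconds}
    deps:
      - scripts/sync_to_redis.py
      - data/features.parquet
    params:
      - sync.batch_size
      - sync.ttl_seconds
```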
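The sync_to_online stage calls scripts/sync_to_redis.py, which is not shown here. Below is a minimal sketch of its core loop, under these assumptions:

- The parquet file has already been read into (user_id, features_dict) pairs (for example with pandas, not shown).
- `client` behaves like a redis-py `redis.Redis`.
- The function names and the `InMemoryRedis` stand-in are hypothetical.

Batching writes through a non-transactional pipeline sends each batch in one round trip, and `ex=` sets the TTL on every key.

```python
import json
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items (itertools.batched needs Python 3.12+)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def sync_rows(client, rows, ttl_seconds, batch_size=500):
    """Write (user_id, features) pairs as JSON strings with a TTL; return keys written."""
    written = 0
    for chunk in batched(rows, batch_size):
        pipe = client.pipeline(transaction=False)  # one round trip per batch, no MULTI/EXEC
        for user_id, features in chunk:
            pipe.set(f"user:{user_id}:features", json.dumps(features), ex=ttl_seconds)
        pipe.execute()
        written += len(chunk)
    return written


class InMemoryRedis:
    """Hypothetical stand-in for redis.Redis so the sketch runs without a server.

    Only mimics pipeline()/set()/execute(); TTLs are recorded, not enforced.
    """

    def __init__(self):
        self.data, self.ttls, self.round_trips = {}, {}, 0

    def pipeline(self, transaction=True):
        return _InMemoryPipeline(self)


class _InMemoryPipeline:
    def __init__(self, db):
        self.db, self.queued = db, []

    def set(self, name, value, ex=None):
        self.queued.append((name, value, ex))
        return self

    def execute(self):
        self.db.round_trips += 1
        for name, value, ex in self.queued:
            self.db.data[name], self.db.ttls[name] = value, ex
        results, self.queued = [True] * len(self.queued), []
        return results


# With a real server: client = redis.Redis.from_url("redis://localhost:6379")
client = InMemoryRedis()
rows = [(str(i), {"clicks_7d": i}) for i in range(1200)]
written = sync_rows(client, rows, ttl_seconds=3600, batch_size=500)
print(written, client.round_trips)  # 1200 3
```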
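```python
# inference_server.py: queries online store only
import json
import os

import redis
from fastapi import FastAPI, HTTPException

app = FastAPI()
# Inside docker-compose the host is online_store, not localhost; reusing the
# ONLINE_STORE_REDIS variable name here is an assumption.
online_store = redis.Redis.from_url(
    os.environ.get("ONLINE_STORE_REDIS", "redis://localhost:6379"), decode_responses=True
)


@app.post("/predict")
def predict(user_id: str):
    # Plain `def`: the redis client is synchronous, so FastAPI runs this in a
    # worker thread instead of blocking the event loop.
    # One GET by key; Redis answers in well under a millisecond server-side.
    features_json = online_store.get(f"user:{user_id}:features")
    if features_json is None:
        raise HTTPException(status_code=404, detail="user not found in online store")
    features = json.loads(features_json)
    # ... model prediction using `features` ...
    return {"prediction": 0.95}  # placeholder score
```
Tool vitals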
Tools: dvc dag (visualize the feature pipeline) and docker-compose (run the feature serving infrastructure)
Config files: dvc.yaml (defines feature computation stages), docker-compose.yml (defines online/offline storage)
Key command: dvc dag && dvc repro (show the pipeline, then execute it and sync features to the stores)
Integration notes
DVC manages offline feature storage and versioning (dvc remote add s3_features s3://bucket; for local MinIO, also run dvc remote modify s3_features endpointurl http://localhost:9000). MLflow logs feature schemas and pipeline run metadata. Redis or DynamoDB serves as the online store. Feast or a custom Python service (such as the feature_server in this docker-compose.yml) syncs features between stores on a schedule. Your inference service (BentoML, or a FastAPI app like inference_server.py) queries only the online store via Redis GET or PostgreSQL SELECT.
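For batch scoring, one call may need features for many users at once. In that case a single Redis lookup per user can become one MGET round trip. The sketch below assumes the `user:{id}:features` JSON layout and a redis-py-compatible `client`. `get_features_batch` and the `InMemoryRedis` stand-in are hypothetical. Misses come back as None, so the caller decides between a default and an error.

```python
import json


def get_features_batch(client, user_ids):
    """Fetch features for many users with one MGET; missing users map to None."""
    keys = [f"user:{uid}:features" for uid in user_ids]
    values = client.mget(keys)  # redis-py returns a list aligned with keys, None for misses
    return {
        uid: json.loads(raw) if raw is not None else None
        for uid, raw in zip(user_ids, values)
    }


class InMemoryRedis:
    """Hypothetical stand-in exposing only mget(), so the sketch runs without a server."""

    def __init__(self, data):
        self.data = dict(data)

    def mget(self, keys):
        return [self.data.get(k) for k in keys]


client = InMemoryRedis({"user:123:features": '{"clicks_7d": 6}'})
print(get_features_batch(client, ["123", "999"]))  # {'123': {'clicks_7d': 6}, '999': None}
```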
Migration path
To adopt the pattern incrementally: (1) Start with a single hot PostgreSQL table (offline + online combined). (2) Once latency becomes unacceptable, add Redis as a read-through cache in front of it. (3) Split out batch jobs that populate the offline store on S3, and sync hot features to Redis. (4) Graduate to a dedicated feature platform (Feast, Tecton) once your feature count grows beyond roughly 1,000. The pattern lets you scale incrementally without rewriting your inference service.
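Step (2) can be sketched as a read-through cache: serve from the fast store, and on a miss load from PostgreSQL and populate the cache with a TTL. In this minimal sketch, a dict stands in for Redis and a hypothetical `load_from_postgres` function stands in for the SQL query. The injectable clock exists only to make expiry testable.

```python
import time


class ReadThroughCache:
    """Serve hits from memory; on a miss or expiry, call `loader` and cache the result."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader, self.ttl_seconds, self.clock = loader, ttl_seconds, clock
        self._entries = {}  # key -> (expires_at, value); stands in for Redis SET ... EX
        self.misses = 0

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        self.misses += 1
        value = self.loader(key)  # e.g. SELECT ... FROM features WHERE user_id = %s
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


db_calls = []


def load_from_postgres(user_id):
    """Hypothetical loader standing in for a primary-key query on the hot table."""
    db_calls.append(user_id)
    return {"clicks_7d": 6}


fake_now = [0.0]
cache = ReadThroughCache(load_from_postgres, ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("123")   # miss: loads from the database
cache.get("123")   # hit: served from the cache
fake_now[0] = 61.0
cache.get("123")   # expired: loads again
print(len(db_calls))  # 2
```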
Cost model
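Offline store (S3 Standard): $0.023/GB-month plus $0.0004 per 1,000 GET requests. Online store (Redis): $0.015/GB-hour for in-memory capacity (expensive at scale), or about $0.25/hour per node for managed Redis (AWS ElastiCache). PostgreSQL online: well under $0.10/hour for a db.t3.small RDS instance, plus storage. MinIO in docker-compose is free, but production requires S3. Total for 100 GB of offline features and 1M daily inference requests, with only the hot subset in Redis: ~$150-400/month, versus $8,000+/month for a single database sized to keep everything hot. Inference requests hit Redis rather than S3, so they add no S3 request charges. Treat all figures as illustrative list prices and check current regional pricing. The savings come from keeping only the hot subset in memory.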
Common gotcha
The two-store pattern silently breaks when your sync interval is longer than the TTL on online-store keys. Suppose a developer writes new features to the offline store at 2am UTC and the sync job runs at 3am, but the features already in the online store expire at 2:30am. Every inference request between 2:30 and 3:00 then finds nothing. Set FEATURE_SYNC_INTERVAL to at most half your TTL, or use event-driven sync (Kafka triggers) instead of cron. Without this, you'll see 'feature not found' errors that only reproduce at specific times, which makes them hard to debug in staging.
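The rule can be enforced at startup, so a misconfigured deployment fails fast instead of producing time-dependent misses. This is a minimal sketch. FEATURE_SYNC_INTERVAL matches the docker-compose.yml above; FEATURE_TTL_SECONDS and the defaults are hypothetical. Requiring the interval to be at most half the TTL, rather than merely shorter, leaves room for a late or slow sync run.

```python
import os


def load_sync_config(env=None):
    """Read the sync interval and online TTL (seconds) and enforce interval <= TTL / 2."""
    env = os.environ if env is None else env
    interval = int(env.get("FEATURE_SYNC_INTERVAL", "300"))
    ttl = int(env.get("FEATURE_TTL_SECONDS", "3600"))  # hypothetical variable name
    if 2 * interval > ttl:
        raise ValueError(
            f"FEATURE_SYNC_INTERVAL={interval}s exceeds half of FEATURE_TTL_SECONDS={ttl}s; "
            "online keys could expire before the next sync refreshes them"
        )
    return interval, ttl


print(load_sync_config({"FEATURE_SYNC_INTERVAL": "300", "FEATURE_TTL_SECONDS": "3600"}))  # (300, 3600)
```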
Team adoption
Week 1: Run docker-compose locally with MinIO + Redis. One engineer writes the dvc.yaml pipeline (fetch → compute → sync). Week 2: Point the DVC remote at S3 and dvc push, then update docker-compose to use real S3 and managed Redis. Week 3: Deploy feature_server as a K8s CronJob (see separate K8s item). By week 4, all inference services query only the online store: no more 500ms feature joins. Document the TTL and sync interval prominently, and make both config parameters rather than hardcoded values. Otherwise new team members will set TTL=24h and skip the sync, causing silent staleness.
Experienced dev note
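Use a Redis sorted set (ZSET) per entity with the sync timestamp as the score, not a plain string. At sync time T, run ZADD user:123:features T <features-json>. At read time, ZRANGEBYSCORE user:123:features <now - TTL> +inf returns only snapshots written inside the freshness window, and ZREMRANGEBYSCORE trims older ones. Redis 6.2+ also offers ZRANGE ... BYSCORE for the read. These commands work on one key at a time, so there is no user:*:features wildcard read. Staleness is enforced by the query instead of by application-level timestamp checks. This also surfaces sync failures instead of hiding them: if the sync job stops, reads return nothing rather than stale values, and your inference path sees an explicit miss it can alert on. Also: deploy feature_server as a Kubernetes CronJob that syncs from S3 to Redis every 5 minutes, not as a long-running container. It's cheaper, survives node failures, and simplifies monitoring (a failed job is visible; a hanging service isn't).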
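The sorted-set layout can be sketched with redis-py's `zadd`, `zremrangebyscore`, and `zrangebyscore`. The helper names and the `InMemoryZSetRedis` stand-in are hypothetical; the stand-in mimics only those three calls so the example runs without a server. Re-adding an identical JSON snapshot just moves its score forward, so an unchanged feature set is refreshed rather than duplicated.

```python
import json
import time


def write_snapshot(client, user_id, features, ttl_seconds, now=None):
    """ZADD the snapshot scored by sync time, then trim snapshots outside the window."""
    now = time.time() if now is None else now
    key = f"user:{user_id}:features"
    client.zadd(key, {json.dumps(features, sort_keys=True): now})
    client.zremrangebyscore(key, "-inf", now - ttl_seconds)
    return key


def read_fresh(client, user_id, ttl_seconds, now=None):
    """Return the newest snapshot written within ttl_seconds, or None (explicit miss)."""
    now = time.time() if now is None else now
    members = client.zrangebyscore(f"user:{user_id}:features", now - ttl_seconds, "+inf")
    return json.loads(members[-1]) if members else None  # members are ascending by score


class InMemoryZSetRedis:
    """Hypothetical stand-in for the three redis.Redis sorted-set calls used above."""

    def __init__(self):
        self.zsets = {}  # key -> {member: score}

    def zadd(self, name, mapping):
        zset = self.zsets.setdefault(name, {})
        added = sum(member not in zset for member in mapping)
        zset.update(mapping)
        return added

    def zremrangebyscore(self, name, lo, hi):
        zset = self.zsets.get(name, {})
        doomed = [m for m, s in zset.items() if float(lo) <= s <= float(hi)]
        for member in doomed:
            del zset[member]
        return len(doomed)

    def zrangebyscore(self, name, lo, hi):
        zset = self.zsets.get(name, {})
        in_range = [(s, m) for m, s in zset.items() if float(lo) <= s <= float(hi)]
        return [m for s, m in sorted(in_range)]


r = InMemoryZSetRedis()
write_snapshot(r, "123", {"clicks_7d": 6}, ttl_seconds=3600, now=1_000.0)
print(read_fresh(r, "123", ttl_seconds=3600, now=2_000.0))  # {'clicks_7d': 6}
print(read_fresh(r, "123", ttl_seconds=3600, now=5_000.0))  # None: no sync in the last hour
```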
Check your understanding
Your offline feature computation takes 4 hours. Your inference SLA is 100ms. Your Redis online store has a TTL of 2 hours. You sync from offline to online every 6 hours. What happens between the last sync (at hour 0) and the next sync (at hour 6), and how do you fix it?
Answer hint
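Features expire from Redis at hour 2, but new features don't arrive until hour 6: a 4-hour window with no features. Solutions:

(1) Shorten or incrementalize the 4-hour offline job so fresh features land more often.
(2) Sync at least every TTL/2 (hourly here), which re-publishes the latest computed features before they expire. Alternatively, raise the TTL to at least twice the sync interval (12 hours) if 6-hour-old values are acceptable.
(3) Use event-driven sync, where the offline job triggers a sync immediately upon completion.

Cron-based sync with long computation time creates blind spots.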
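The timeline in this question can be checked mechanically. This small sketch assumes each sync at time t keeps features live until t + TTL and that a later sync replaces them. `blind_windows` is a hypothetical helper, and the time unit is hours.

```python
def blind_windows(sync_times, ttl, horizon):
    """Return (start, end) intervals before `horizon` during which no features are live."""
    times = sorted(sync_times)
    gaps = []
    for i, start in enumerate(times):
        expires = start + ttl
        next_sync = times[i + 1] if i + 1 < len(times) else horizon
        if expires < next_sync:
            gaps.append((expires, next_sync))
    return gaps


# The scenario above: sync every 6 hours, TTL of 2 hours, over a 12-hour horizon.
print(blind_windows(range(0, 12, 6), ttl=2, horizon=12))  # [(2, 6), (8, 12)]
# Syncing every hour (TTL / 2) removes the gaps.
print(blind_windows(range(0, 12, 1), ttl=2, horizon=12))  # []
```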