How to Build Production Real-Time ML Feature Pipelines in 2026
Build production real-time ML feature pipelines with Kafka and Flink. Achieve sub-40ms latency, solve training-serving skew, and deploy streaming feature stores at scale.
Real-time ML systems are no longer optional for competitive enterprises. According to Conduktor's 2026 ML Pipeline Report, organizations implementing real-time feature pipelines achieve sub-40 millisecond inference latency while processing millions of predictions per second. Yet 73% of ML teams still struggle with training-serving skew, where features computed differently in training versus production cause model accuracy to drop by 20-30%.
The solution lies in building production-ready real-time feature pipelines that unify feature computation across the entire ML lifecycle. As IBM's 2026 AI Infrastructure Report reveals, enterprises are shifting from batch-oriented MLOps to streaming architectures that process data continuously, with Apache Kafka and Flink becoming the de facto standard for production real-time ML.
This guide covers building production-grade real-time ML feature pipelines, including architecture patterns, streaming feature stores, and deployment strategies used by companies like Wix.com, Uber, and DoorDash for fraud detection, personalization, and supply chain optimization. For more on production ML infrastructure, see our guide on Building Production-Ready LLM Applications.
Why Real-Time Feature Pipelines Matter in 2026
Traditional batch-oriented ML pipelines compute features once daily or hourly, creating staleness that renders models ineffective for time-sensitive use cases. Real-time feature pipelines compute features continuously from streaming data, ensuring models always have the freshest signals for prediction.
The Training-Serving Skew Problem
Training-serving skew occurs when feature computation logic differs between model training (batch) and inference (real-time). According to FeatureForm's analysis of real-time ML systems, this discrepancy causes production model accuracy to degrade by 20-35% compared to offline evaluation metrics.
A streaming feature store solves this by centralizing feature computation logic in a single pipeline that serves both training and inference, guaranteeing consistency across the ML lifecycle.
Critical Use Cases
- Fraud Detection: Financial institutions need sub-second feature updates to detect anomalous transaction patterns before authorization completes
- Personalization: E-commerce platforms compute user intent signals from real-time browsing behavior to serve relevant recommendations
- Supply Chain: Logistics companies process GPS location streams to predict delivery ETAs and optimize routing dynamically
- Cybersecurity: Security operations centers detect threats by analyzing event streams across endpoints in real-time
For enterprise workflow automation patterns, explore our AI Agents for Enterprise Workflow Automation guide.
Real-Time Feature Pipeline Architecture
A production real-time feature pipeline consists of three independent but coordinated layers that form what Hopsworks calls the FTI architecture: Feature Pipelines, Training Pipelines, and Inference Pipelines.
Core Components
The feature pipeline continuously ingests streaming data from Kafka topics, computes features using stateful stream processing (Apache Flink), and writes results to both the online feature store (low-latency serving) and offline feature store (training datasets).
The feature store maintains two storage layers:
- Online Store: Low-latency key-value stores (Redis, DynamoDB, Cassandra) serving features to inference APIs in under 10 milliseconds
- Offline Store: Analytical databases (Snowflake, BigQuery, Delta Lake) providing point-in-time correct training datasets
| Component | Technology Options | Latency Target | Scale |
|---|---|---|---|
| Stream Ingestion | Apache Kafka, AWS Kinesis, Redpanda | <5ms publish latency | Millions events/sec |
| Stream Processing | Apache Flink, Spark Structured Streaming | <100ms processing | Petabyte-scale stateful operations |
| Online Feature Store | Redis, DynamoDB, Cassandra, ScyllaDB | <10ms read latency | Billions of features |
| Offline Feature Store | Delta Lake, Iceberg, Snowflake, BigQuery | Seconds to minutes | Petabytes historical data |
| Model Serving | BentoML, Ray Serve, NVIDIA Triton | <20ms inference | 100K predictions/sec |
For model serving infrastructure details, see our LLM Gateways Production Infrastructure guide.
Streaming vs Batch Features
Most production systems use a hybrid approach, combining 20-30% streaming features for time-sensitive signals with 70-80% batch features for stable, resource-intensive computations. Streaming features are computed continuously from event streams using Apache Flink stateful operators, while batch features are pre-computed on schedules from data warehouses.
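As a sketch of how this hybrid split shows up at inference time, the feature vector is assembled from both sources before being passed to the model. The callables `get_batch_features` and `get_streaming_features` below are hypothetical and stand in for whatever client your feature store provides:

```python
from typing import Callable, Dict


def build_feature_vector(user_id: str,
                         get_streaming_features: Callable[[str], Dict[str, float]],
                         get_batch_features: Callable[[str], Dict[str, float]]) -> Dict[str, float]:
    """Combine fresh streaming features with precomputed batch features.

    Both callables are assumed to read from the online store: batch features
    are refreshed on a schedule, streaming features continuously.
    """
    features: Dict[str, float] = {}
    features.update(get_batch_features(user_id))      # e.g. 30-day spend profile
    features.update(get_streaming_features(user_id))  # e.g. last-hour transaction velocity
    return features
```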
Production Implementation with Kafka and Flink
Here's a production-style real-time feature pipeline for fraud detection that ingests payment transactions from Kafka, computes streaming features with Apache Flink (PyFlink), and serves them from Redis with sub-50ms end-to-end latency:
"""
Production Real-Time ML Feature Pipeline with Kafka, Flink, and Redis
Fraud detection: Compute real-time transaction features
"""
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings
import redis
import json
from typing import Dict
class RealTimeFeaturePipeline:
"""Real-time feature pipeline computing streaming features from Kafka"""
def __init__(self, kafka_brokers: str, kafka_topic: str, redis_host: str):
# Initialize Flink streaming environment
self.env = StreamExecutionEnvironment.get_execution_environment()
self.env.set_parallelism(4)
self.env.enable_checkpointing(60000) # 1 minute checkpoints
# Table environment for SQL-based feature engineering
settings = EnvironmentSettings.in_streaming_mode()
self.t_env = StreamTableEnvironment.create(
self.env, environment_settings=settings
)
# Redis connection for online feature serving
self.redis_client = redis.Redis(
host=redis_host, port=6379, decode_responses=True
)
self.kafka_brokers = kafka_brokers
self.kafka_topic = kafka_topic
def create_kafka_source_table(self):
"""Create Flink source table reading from Kafka transaction stream"""
kafka_ddl = f"""
CREATE TABLE transaction_stream (
transaction_id STRING,
user_id STRING,
merchant_id STRING,
amount DOUBLE,
currency STRING,
transaction_time TIMESTAMP(3),
location STRING,
device_fingerprint STRING,
ip_address STRING,
WATERMARK FOR transaction_time AS
transaction_time - INTERVAL '5' SECOND
) WITH (
'connector' = 'kafka',
'topic' = '{self.kafka_topic}',
'properties.bootstrap.servers' = '{self.kafka_brokers}',
'properties.group.id' = 'feature-pipeline-consumer',
'scan.startup.mode' = 'latest-offset',
'format' = 'json'
)
"""
self.t_env.execute_sql(kafka_ddl)
def compute_streaming_features(self):
"""
Compute real-time fraud detection features:
- Rolling window aggregations (1h, 24h transaction counts/amounts)
- Velocity checks (transactions per minute)
- Merchant risk scores
"""
# User transaction velocity (last 1 hour)
user_velocity_query = """
CREATE VIEW user_hourly_velocity AS
SELECT
user_id,
COUNT(*) as txn_count_1h,
SUM(amount) as txn_amount_1h,
AVG(amount) as avg_txn_amount_1h,
MAX(amount) as max_txn_amount_1h,
TUMBLE_END(transaction_time, INTERVAL '1' HOUR) as window_end
FROM transaction_stream
GROUP BY user_id, TUMBLE(transaction_time, INTERVAL '1' HOUR)
"""
self.t_env.execute_sql(user_velocity_query)
# User daily transaction patterns
user_daily_query = """
CREATE VIEW user_daily_features AS
SELECT
user_id,
COUNT(*) as txn_count_24h,
SUM(amount) as txn_amount_24h,
COUNT(DISTINCT merchant_id) as unique_merchants_24h,
COUNT(DISTINCT location) as unique_locations_24h,
TUMBLE_END(transaction_time, INTERVAL '24' HOUR) as window_end
FROM transaction_stream
GROUP BY user_id, TUMBLE(transaction_time, INTERVAL '24' HOUR)
"""
self.t_env.execute_sql(user_daily_query)
# Merchant risk features
merchant_query = """
CREATE VIEW merchant_features AS
SELECT
merchant_id,
COUNT(*) as merchant_txn_count_1h,
AVG(amount) as merchant_avg_amount,
STDDEV(amount) as merchant_amount_stddev,
COUNT(DISTINCT user_id) as unique_users_1h,
TUMBLE_END(transaction_time, INTERVAL '1' HOUR) as window_end
FROM transaction_stream
GROUP BY merchant_id, TUMBLE(transaction_time, INTERVAL '1' HOUR)
"""
self.t_env.execute_sql(merchant_query)
def serve_features_for_inference(
self, user_id: str, merchant_id: str
) -> Dict[str, float]:
"""
Retrieve pre-computed features from Redis for real-time inference.
Target latency: under 10ms
"""
features = {}
try:
# Fetch user velocity features
user_key = f"feature:user:{user_id}:velocity"
user_data = self.redis_client.get(user_key)
if user_data:
user_features = json.loads(user_data)
features.update({
'txn_count_1h': user_features.get('txn_count_1h', 0),
'txn_amount_1h': user_features.get('txn_amount_1h', 0.0),
'avg_txn_amount_1h': user_features.get('avg_txn_amount_1h', 0.0)
})
# Fetch user daily features
daily_key = f"feature:user:{user_id}:daily"
daily_data = self.redis_client.get(daily_key)
if daily_data:
daily = json.loads(daily_data)
features.update({
'txn_count_24h': daily.get('txn_count_24h', 0),
'unique_merchants_24h': daily.get('unique_merchants_24h', 0)
})
# Fetch merchant features
merchant_key = f"feature:merchant:{merchant_id}:stats"
merchant_data = self.redis_client.get(merchant_key)
if merchant_data:
merchant = json.loads(merchant_data)
features.update({
'merchant_txn_count_1h': merchant.get('merchant_txn_count_1h', 0),
'merchant_avg_amount': merchant.get('merchant_avg_amount', 0.0)
})
return features
except redis.RedisError as e:
print(f"Redis error: {e}")
return self._get_default_features()
def _get_default_features(self) -> Dict[str, float]:
"""Return safe defaults when Redis unavailable"""
return {
'txn_count_1h': 0, 'txn_amount_1h': 0.0,
'avg_txn_amount_1h': 0.0, 'txn_count_24h': 0,
'unique_merchants_24h': 0, 'merchant_txn_count_1h': 0,
'merchant_avg_amount': 0.0
}
def run_pipeline(self):
"""Execute the complete feature pipeline"""
print("Starting real-time feature pipeline...")
self.create_kafka_source_table()
self.compute_streaming_features()
print("Processing streaming transactions...")
self.env.execute("RealTimeFeaturePipeline")
# Production deployment
if __name__ == "__main__":
pipeline = RealTimeFeaturePipeline(
kafka_brokers="kafka-broker-1:9092,kafka-broker-2:9092",
kafka_topic="payment-transactions",
redis_host="redis-cluster.prod.internal"
)
pipeline.run_pipeline()
This implementation defines the continuous feature computation and the low-latency read path; in production, each feature view is also materialized to the online store (Redis) and appended to the offline store. For production deployments, add schema validation, data quality checks, and monitoring as covered in our MLOps Best Practices guide.
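The write path, pushing each closed window's row into Redis, can be wired in as a Flink sink or as a downstream consumer of the view results. Below is a minimal sketch of that materialization step, assuming rows arrive as plain dictionaries and reusing the key layout from `serve_features_for_inference`; the TTL policy is an assumption, not something the pipeline above enforces:

```python
import json
import redis


class RedisFeatureWriter:
    """Hypothetical materialization step: push each closed window's aggregates
    into Redis under the keys read by serve_features_for_inference."""

    def __init__(self, redis_host: str, ttl_seconds: int = 7200):
        self._client = redis.Redis(host=redis_host, port=6379)
        self._ttl = ttl_seconds  # bound online-store memory (assumed policy)

    def write_user_velocity(self, row: dict) -> None:
        """row is one result of the user_hourly_velocity view, as a dict."""
        key = f"feature:user:{row['user_id']}:velocity"
        payload = json.dumps({
            'txn_count_1h': row['txn_count_1h'],
            'txn_amount_1h': row['txn_amount_1h'],
            'avg_txn_amount_1h': row['avg_txn_amount_1h'],
        })
        # Overwrite with the freshest window and refresh the expiry
        self._client.setex(key, self._ttl, payload)
```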
Feature Store Selection
According to Kai Waehner's 2026 data streaming landscape analysis, open-source and managed feature stores have converged around similar architectures with varying trade-offs.
| Feature Store | Best For | Key Trade-off |
|---|---|---|
| Feast (Open Source) | Teams with existing infrastructure, customization needs | Self-managed, limited enterprise features |
| Tecton (Managed) | Enterprise scale, compliance, managed service | Higher cost, vendor lock-in |
| Hopsworks | Organizations standardizing on single ML platform | Platform lock-in, heavier than standalone |
| Featureform | Multi-cloud, avoiding data duplication | Latency depends on underlying stores |
| AWS SageMaker | AWS-native ML, existing SageMaker users | AWS lock-in, higher costs at scale |
For most teams starting with real-time features in 2026, Feast with Redis (online) and Delta Lake (offline) provides the best balance of flexibility, cost, and operational simplicity.
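To illustrate the Feast route, a feature view mirroring the user_hourly_velocity aggregate from the Flink pipeline might be declared as in the sketch below. The field names reuse the earlier example, the offline path is a placeholder, and the Redis online store plus Delta Lake offline store are configured separately in feature_store.yaml:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float64, Int64

# Entity key shared by the streaming and batch pipelines
user = Entity(name="user_id", join_keys=["user_id"])

# Offline source: Parquet files written by the feature pipeline (placeholder path)
user_velocity_source = FileSource(
    path="s3://feature-store/offline/user_hourly_velocity.parquet",
    timestamp_field="window_end",
)

user_velocity_view = FeatureView(
    name="user_hourly_velocity",
    entities=[user],
    ttl=timedelta(hours=2),  # drop stale values from the online store
    schema=[
        Field(name="txn_count_1h", dtype=Int64),
        Field(name="txn_amount_1h", dtype=Float64),
        Field(name="avg_txn_amount_1h", dtype=Float64),
    ],
    source=user_velocity_source,
)
```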
Monitoring and Production Best Practices
Production real-time feature pipelines require comprehensive observability. According to Deloitte's 2026 State of AI report, 68% of ML systems experiencing production incidents trace root causes to data pipeline failures rather than model issues.
Critical Metrics to Monitor
- Pipeline Health: End-to-end latency (target under 100ms), Kafka consumer lag (near zero), Flink checkpoint duration, feature freshness
- Data Quality: Schema violations, null rates, out-of-range values, cardinality drift
- Serving Metrics: Feature retrieval latency P99 (target under 10ms), cache hit rates (target above 95%), and feature-not-found error counts
Configure alerts for Kafka consumer lag exceeding 10,000 messages, feature staleness beyond 5 minutes, Redis memory above 85%, and P99 latency exceeding 50ms. For comprehensive monitoring strategies, see our AI Model Evaluation and Monitoring guide.
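A minimal freshness probe backing the staleness alert might look like the sketch below; it assumes the pipeline stamps each Redis payload with an updated_at epoch field, which the earlier example does not yet do, and leaves the alert delivery (PagerDuty, Slack, etc.) to your monitoring stack:

```python
import json
import time

import redis

FRESHNESS_SLO_SECONDS = 300  # 5-minute staleness threshold from the alerts above


def feature_is_fresh(client: redis.Redis, key: str) -> bool:
    """Return True if the feature payload is recent enough to serve.

    Assumes each payload carries an 'updated_at' epoch timestamp written by
    the feature pipeline; a missing key counts as stale.
    """
    raw = client.get(key)
    if raw is None:
        return False
    payload = json.loads(raw)
    age_seconds = time.time() - payload.get("updated_at", 0)
    return age_seconds <= FRESHNESS_SLO_SECONDS
```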
Production Best Practices
Schema Evolution Management
Use Confluent Schema Registry with Avro serialization for managing schema changes across feature pipelines. Avro provides backward and forward compatibility, allowing producers and consumers to evolve independently without breaking existing features. Define clear versioning policies - maintain backward compatibility for at least 90 days to give downstream consumers time to migrate. Test schema changes in staging environments using production traffic replay before deploying to production.
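A producer-side sketch of this setup with the confluent-kafka client is shown below. The registry URL is a placeholder, the schema is a simplified slice of the transaction record, and the optional field with a default illustrates the kind of change that stays compatible under the registry's compatibility checks:

```python
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# Simplified Avro schema; the nullable field with a default is an additive,
# compatible evolution for existing consumers.
TRANSACTION_SCHEMA = """
{
  "type": "record",
  "name": "Transaction",
  "fields": [
    {"name": "transaction_id", "type": "string"},
    {"name": "user_id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "device_fingerprint", "type": ["null", "string"], "default": null}
  ]
}
"""

schema_registry = SchemaRegistryClient({"url": "http://schema-registry:8081"})  # placeholder URL
avro_serializer = AvroSerializer(schema_registry, TRANSACTION_SCHEMA)
producer = Producer({"bootstrap.servers": "kafka-broker-1:9092"})


def publish(event: dict) -> None:
    """Serialize against the registered schema and publish to the transaction topic."""
    producer.produce(
        topic="payment-transactions",
        key=event["user_id"],
        value=avro_serializer(
            event, SerializationContext("payment-transactions", MessageField.VALUE)
        ),
    )
    producer.flush()  # flush per message for the sketch; batch in production
```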
State Management at Scale
Configure Flink's RocksDB state backend for large state deployments (greater than 10GB). RocksDB stores state on local disk rather than memory, enabling pipelines to maintain billions of feature values without memory constraints. Enable incremental checkpointing to reduce checkpoint overhead from 30-60 seconds to under 5 seconds for large state. Tune RocksDB block cache size to balance between memory usage and read performance - typically allocate 30-40% of available memory.
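A minimal PyFlink sketch of these settings follows; the config keys in the comment belong in flink-conf.yaml, and the checkpoint bucket and cache size are placeholders to tune for your cluster:

```python
from pyflink.datastream import StreamExecutionEnvironment, EmbeddedRocksDBStateBackend

env = StreamExecutionEnvironment.get_execution_environment()

# Keep operator state on local disk via RocksDB and checkpoint only the delta,
# which is what keeps checkpoint times low once state grows beyond a few GB.
env.set_state_backend(EmbeddedRocksDBStateBackend(enable_incremental_checkpointing=True))
env.enable_checkpointing(60000)  # 1-minute checkpoints, as in the pipeline above

# Cluster-level tuning goes in flink-conf.yaml, for example:
#   state.backend.rocksdb.block.cache-size: 2gb        # roughly 30-40% of task manager memory
#   state.checkpoints.dir: s3://ml-checkpoints/fraud-features   # placeholder bucket
```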
Handling Late and Out-of-Order Events
Configure watermarks with 5-10 second allowed lateness for event time processing to handle network delays and clock skew. Implement side output streams to capture late events that arrive after window closure, logging them for analysis and potential feature recomputation. Monitor late event rates - if they exceed 1% of total volume, investigate upstream systems for timing issues or increase watermark lateness bounds.
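As an illustration of the side-output pattern, the sketch below routes events that arrive behind the current watermark to a separate stream for logging and potential backfill. It assumes a recent PyFlink release with OutputTag support, and the tag name and downstream handling are illustrative:

```python
from pyflink.common import Types
from pyflink.datastream import OutputTag, ProcessFunction

# Tag identifying the late-event side output
late_transactions = OutputTag("late-transactions", Types.STRING())


class LateEventRouter(ProcessFunction):
    """Send events that arrive behind the current watermark to a side output."""

    def process_element(self, value, ctx: 'ProcessFunction.Context'):
        watermark = ctx.timer_service().current_watermark()
        if ctx.timestamp() is not None and ctx.timestamp() < watermark:
            # Late: capture for analysis and possible feature recomputation
            yield late_transactions, value
        else:
            # On time: continue down the main stream
            yield value


# Usage sketch:
#   routed = transaction_stream.process(LateEventRouter())
#   late_stream = routed.get_side_output(late_transactions)
```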
Serving Layer Resilience
Implement circuit breakers using libraries like Resilience4j for Redis connections. When Redis latency degrades or error rates spike, fail open with cached default features rather than failing inference requests entirely. Maintain an in-memory cache (Caffeine or Guava) of frequently accessed features with 30-60 second TTL to reduce Redis load and provide fallback during transient failures. Deploy Redis read replicas across multiple availability zones to distribute serving load geographically and improve fault tolerance.
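A compact sketch of this fail-open pattern in Python is shown below, using cachetools for the local TTL cache and a hand-rolled breaker rather than a full resilience library; thresholds and TTLs are illustrative:

```python
import json
import time

import redis
from cachetools import TTLCache


class ResilientFeatureClient:
    """Redis reads with a local TTL-cache fallback and a simple circuit breaker."""

    def __init__(self, redis_client: redis.Redis,
                 failure_threshold: int = 5, reset_after_seconds: float = 30.0):
        self._redis = redis_client
        self._local = TTLCache(maxsize=100_000, ttl=45)  # 30-60s local cache
        self._failures = 0
        self._threshold = failure_threshold
        self._opened_at = 0.0
        self._reset_after = reset_after_seconds

    def get_features(self, key: str, defaults: dict) -> dict:
        if key in self._local:
            return self._local[key]
        if self._circuit_open():
            return defaults  # fail open instead of failing the inference request
        try:
            raw = self._redis.get(key)
            self._failures = 0
            features = json.loads(raw) if raw else defaults
            self._local[key] = features
            return features
        except redis.RedisError:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = time.time()  # trip the breaker
            return defaults

    def _circuit_open(self) -> bool:
        return (self._failures >= self._threshold
                and time.time() - self._opened_at < self._reset_after)
```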
For security patterns in production ML systems, review our LLM Prompt Injection Defense Guide.
Cost Optimization
Real-time pipelines can become expensive at scale, particularly as streaming volumes grow and feature complexity increases. According to industry benchmarks, organizations processing 1 million events per second can spend $50,000-$100,000 monthly on infrastructure alone. Optimize costs with these proven strategies:
Compute Optimization
Use spot instances for Flink workers to achieve 60-70% cost reduction compared to on-demand pricing. Configure proper checkpointing (1-5 minute intervals) to ensure fault tolerance when spot instances are preempted. Right-size parallelism to match Kafka partition counts - over-provisioning wastes resources while under-provisioning creates bottlenecks. Leverage state backend compression in Flink's RocksDB configuration to reduce checkpoint sizes by 40-60%, directly lowering storage costs and checkpoint duration.
Storage Cost Management
Set appropriate TTLs on Redis keys based on feature staleness requirements - typically 1-2 hours for streaming features is sufficient, preventing unbounded memory growth. Use Redis cluster mode with sharding to distribute load horizontally rather than vertically scaling to expensive large instances. Compress offline feature store data with Parquet columnar format and Snappy compression codec, achieving 70% size reduction compared to uncompressed CSV or JSON formats.
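A small sketch of both storage policies follows; it assumes pyarrow and s3fs are installed and uses placeholder keys, values, and bucket paths:

```python
import json

import pandas as pd
import redis

client = redis.Redis(host="redis-cluster.prod.internal", port=6379)

# Online store: expire streaming features after 2 hours so memory stays bounded
client.setex("feature:user:42:velocity", 7200, json.dumps({"txn_count_1h": 3}))

# Offline store: write the same window results as Snappy-compressed Parquet
# (pandas with the pyarrow engine; the path is a placeholder for your lakehouse location)
window_df = pd.DataFrame([
    {"user_id": "42", "txn_count_1h": 3, "window_end": "2026-01-15T10:00:00"},
])
window_df.to_parquet(
    "s3://feature-store/offline/user_hourly_velocity/date=2026-01-15/part-000.parquet",
    compression="snappy",
)
```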
Smart Resource Allocation
Implement autoscaling policies for Flink task managers based on Kafka consumer lag metrics. When lag exceeds thresholds (e.g., 10,000 messages), scale up compute; when lag approaches zero, scale down to minimum capacity. Use separate Kafka clusters for different priority levels - high-priority real-time features on dedicated clusters, lower-priority batch features on shared infrastructure.
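For illustration, consumer lag can be computed directly with the kafka-python client before feeding it to the autoscaler; most teams instead scrape this from a Kafka lag exporter, so treat the helper below as a sketch:

```python
from kafka import KafkaConsumer, TopicPartition


def total_consumer_lag(brokers: str, group_id: str, topic: str) -> int:
    """Sum of (log end offset - committed offset) across the topic's partitions."""
    consumer = KafkaConsumer(
        bootstrap_servers=brokers, group_id=group_id, enable_auto_commit=False
    )
    partitions = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]
    end_offsets = consumer.end_offsets(partitions)

    lag = 0
    for tp in partitions:
        committed = consumer.committed(tp) or 0  # no commit yet counts from zero
        lag += end_offsets[tp] - committed
    consumer.close()
    return lag


# Scale up when lag exceeds the threshold discussed above (e.g. 10,000 messages)
```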
For more cost optimization techniques across ML infrastructure, see our AI Cost Optimization Guide.
Integrating with Modern ML Stacks
Real-time feature pipelines integrate with the broader ML ecosystem. Streaming pipelines increasingly feed agentic systems built on frontier models such as GPT-5.2 (Jan 2026), Claude Opus 4.5, and Gemini 3 Pro, which need fresh contextual signals: customer service AI agents, for example, retrieve user activity features from streaming data to personalize responses. Learn more in our Agentic AI Systems guide.
For model training, pull point-in-time correct datasets from offline stores, join streaming features with batch features at training time, and use feature store metadata to track lineage and dependencies.
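If you standardize on Feast as suggested above, the point-in-time join looks roughly like the sketch below; the feature references reuse the user_hourly_velocity view from the earlier example, and the entity rows are illustrative:

```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # repo containing the feature views defined earlier

# Labeled events to train on, each with its event timestamp; Feast joins every row
# against feature values as of that timestamp, which is what makes the dataset
# point-in-time correct.
entity_df = pd.DataFrame({
    "user_id": ["u_123", "u_456"],
    "event_timestamp": pd.to_datetime(["2026-01-10 14:03:00", "2026-01-11 09:41:00"]),
})

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_hourly_velocity:txn_count_1h",
        "user_hourly_velocity:avg_txn_amount_1h",
    ],
).to_df()
```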
Key Takeaways
- Real-time pipelines achieve sub-40ms inference latency by pre-computing features continuously and serving from low-latency online stores like Redis
- Apache Kafka and Flink are the de facto standard for streaming feature engineering at scale, with PyFlink providing Python-native development
- Feature stores solve training-serving skew by centralizing computation logic used across training and inference
- Hybrid architectures combine fresh streaming features (20-30%) with cost-effective batch features (70-80%) for optimal ROI
- Monitor pipeline health obsessively: latency, consumer lag, feature freshness, and data quality to detect issues before model impact
- Optimize costs: leverage spot instances, implement TTLs, use compression, and right-size parallelism
Real-time feature pipelines are no longer differentiators but requirements for competitive ML systems in fraud detection, personalization, supply chain, and cybersecurity. Start with a focused use case, prove value with measurable latency improvements, then expand iteratively.
For more production ML guides, explore our resources on RAG Systems, Vector Databases, and AI Testing.