Use Case Overview
Real-time feature pipelines with Proton enable:
- ML Feature Computation: Calculate features in real-time as events arrive
- Low-latency Serving: Serve features to ML models with sub-second latency
- Windowed Aggregations: Compute time-based features (last 5 minutes, last hour, etc.)
- Historical Backfill: Support both real-time and batch feature computation
- Feature Store: Maintain current and historical feature values
Architecture
A typical real-time feature pipeline consists of:
Tutorial: Real-time Fraud Detection
This example demonstrates building a real-time fraud detection feature pipeline using Proton.
Prerequisites
Start the fraud detection demo:
Step 1: Create Transaction Stream
Capture incoming payment transactions:
Step 2: Compute Real-time Features
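To illustrate the kind of feature this step computes, the sketch below counts each user's transactions over the last 5 minutes in plain Python. It is not Proton's API: the class name `VelocityCounter`, the use of timestamps in seconds, and the assumption that events arrive in time order per user are all choices made for this example.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60  # "last 5 minutes"


class VelocityCounter:
    """Counts each user's transactions within a sliding time window.

    Hypothetical helper for illustration; assumes in-order timestamps.
    """

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self._times = defaultdict(deque)  # user_id -> timestamps, oldest first

    def add(self, user_id, ts):
        """Record a transaction at time ts (seconds) and return the current count."""
        times = self._times[user_id]
        times.append(ts)
        # Evict timestamps that are a full window or more older than ts.
        while times and times[0] <= ts - self.window_seconds:
            times.popleft()
        return len(times)
```

Each call to `add` returns the up-to-date velocity for that user, which is the value a streaming engine would emit as the feature.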
Create materialized views to compute features as transactions arrive:
Feature 1: Transaction Velocity (count in last 5 minutes)
Step 3: Create Feature Store Table
Maintain current feature values for each user:
Step 4: Serve Features to ML Models
Query features for a specific user in real-time:
Step 5: Create Fraud Detection Logic
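One simple way to turn per-user features into a fraud signal is a rule-based score. The following is a minimal Python sketch rather than Proton SQL; the feature names (`txn_count_5m`, `avg_amount_1h`, `amount`) and all thresholds are hypothetical and would need tuning against real data.

```python
def fraud_score(features):
    """Return a score in [0, 1] from a dict of per-user features.

    Feature names and thresholds are illustrative assumptions.
    """
    score = 0.0
    # Many transactions in a short period is a common fraud signal.
    if features.get("txn_count_5m", 0) > 10:
        score += 0.5
    # A transaction far above the user's recent average is another.
    avg = features.get("avg_amount_1h", 0.0)
    if avg > 0 and features.get("amount", 0.0) > 5 * avg:
        score += 0.5
    return score


def is_suspicious(features, threshold=0.5):
    """Flag the event when the combined score reaches the threshold."""
    return fraud_score(features) >= threshold
```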
Combine features to detect potential fraud:
Advanced Feature Patterns
Session-based Features
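A session is commonly defined as a run of events with no inactivity gap longer than some timeout. The sketch below shows that idea in plain Python; the 30-minute default gap and the function names are assumptions for illustration, not values taken from Proton.

```python
def sessionize(timestamps, gap_seconds=1800):
    """Group sorted timestamps (seconds) into sessions split at inactivity gaps."""
    sessions = []
    for ts in timestamps:
        # Extend the current session if the event is close enough to the last one.
        if sessions and ts - sessions[-1][-1] <= gap_seconds:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def session_features(session):
    """Compute simple per-session features from one session's timestamps."""
    return {
        "event_count": len(session),
        "duration_seconds": session[-1] - session[0],
    }
```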
Compute features within user sessions:
Ratio and Percentage Features
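Ratio features divide one aggregate by another, so the main pitfall is an empty window. The following Python sketch guards against division by zero; the `declined` and `amount` fields and the 1000 high-value cutoff are hypothetical.

```python
def safe_ratio(numerator, denominator, default=0.0):
    """Divide, returning a default when the denominator is zero."""
    return numerator / denominator if denominator else default


def ratio_features(txns):
    """txns: list of dicts with 'amount' and 'declined' keys (assumed shape)."""
    total = len(txns)
    declined = sum(1 for t in txns if t["declined"])
    high_value = sum(1 for t in txns if t["amount"] > 1000)
    return {
        "decline_ratio": safe_ratio(declined, total),
        "high_value_pct": 100.0 * safe_ratio(high_value, total),
    }
```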
Lag Features (change from previous window)
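A lag feature compares each window's value with the value from the previous window. As a rough Python illustration (the function name and output shape are assumptions), given per-window values in time order:

```python
def lag_features(window_values):
    """Return each window's value, the previous window's value, and the change."""
    out = []
    prev = None
    for value in window_values:
        # The first window has no predecessor, so its delta is undefined.
        delta = None if prev is None else value - prev
        out.append({"value": value, "prev": prev, "delta": delta})
        prev = value
    return out
```

A large positive `delta` on a transaction-count window, for example, indicates a sudden burst of activity relative to the preceding window.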
Historical Backfill Support
Proton supports both real-time and historical feature computation:
Batch Feature Computation
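For a backfill, the same windowing logic is applied to a bounded set of historical events instead of a live stream. The Python sketch below computes fixed (tumbling) 5-minute window counts over a list of past events; the `(user_id, ts)` event shape and the function name are assumptions for this example.

```python
from collections import Counter


def tumbling_counts(events, window_seconds=300):
    """events: iterable of (user_id, ts in seconds).

    Returns {(user_id, window_start): count} for fixed, non-overlapping windows.
    """
    counts = Counter()
    for user_id, ts in events:
        # Align each event to the start of the window that contains it.
        window_start = ts - (ts % window_seconds)
        counts[(user_id, window_start)] += 1
    return dict(counts)
```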
Point-in-time Correctness
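Point-in-time correctness means a training example may only use the feature value that existed at or before its prediction timestamp, never a later one. A minimal Python sketch of such an "as of" lookup, assuming the feature history is a list of `(ts, value)` pairs sorted by timestamp:

```python
from bisect import bisect_right


def feature_as_of(history, prediction_ts):
    """Return the latest value with ts <= prediction_ts, or None if none exists.

    history: list of (ts, value) tuples sorted by ts (assumed shape).
    """
    timestamps = [ts for ts, _ in history]
    # Index of the first entry strictly after prediction_ts.
    idx = bisect_right(timestamps, prediction_ts)
    return history[idx - 1][1] if idx else None
```

Using a lookup like this when building training sets avoids leaking future information into the features.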
Ensure features reflect what was known at prediction time:
Integration with ML Frameworks
Python + scikit-learn
Real-time Inference
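At inference time, the freshly computed features have to be arranged in the same order the model was trained on before scoring. The sketch below shows that step in plain Python. `StubModel` is a made-up stand-in for a trained classifier (it is not scikit-learn's interface), and the feature names and weights are hypothetical.

```python
import math

FEATURE_ORDER = ["txn_count_5m", "avg_amount_1h"]  # hypothetical feature names


def to_vector(features, order=FEATURE_ORDER):
    """Convert a feature dict into a fixed-order list; missing values become 0.0."""
    return [float(features.get(name, 0.0)) for name in order]


class StubModel:
    """Stand-in for a trained model: a logistic function with made-up weights."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def fraud_probability(self, vector):
        z = self.bias + sum(w * x for w, x in zip(self.weights, vector))
        return 1.0 / (1.0 + math.exp(-z))


def score_event(model, features):
    """Score one event from its latest feature values."""
    return model.fraud_probability(to_vector(features))
```

In a real deployment the stub would be replaced by the trained model, with `to_vector` kept identical between training and serving.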
Performance Optimization
Use Appropriate Window Sizes
Materialize Common Joins
Use versioned_kv for Latest Values
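Conceptually, a latest-value store keeps one row per key and replaces it when a newer version arrives. The Python sketch below illustrates only that idea, on the assumption that this is the behavior `versioned_kv` is used for here; it does not describe Proton's storage implementation, and `LatestValueStore` is a hypothetical name.

```python
class LatestValueStore:
    """Keeps only the newest value per key (illustrative, in-memory)."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def upsert(self, key, version, value):
        """Store the row unless a strictly newer version is already present."""
        current = self._rows.get(key)
        if current is None or version >= current[0]:
            self._rows[key] = (version, value)

    def get(self, key):
        """Return the latest value for key, or None if the key is unknown."""
        row = self._rows.get(key)
        return None if row is None else row[1]
```

Lookups stay cheap because the store holds one row per key rather than the full history of updates.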
Monitoring Feature Quality
Track Feature Freshness
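Freshness can be tracked as the age of each feature's most recent update compared with an allowed maximum. A minimal Python sketch, where the 60-second default and the `{feature_name: last_update_ts}` input shape are assumptions:

```python
def feature_freshness(last_updated, now, max_age_seconds=60):
    """Return {feature_name: (age_seconds, is_stale)}.

    last_updated: {feature_name: timestamp of last update, in seconds}.
    """
    report = {}
    for name, ts in last_updated.items():
        age = now - ts
        report[name] = (age, age > max_age_seconds)
    return report
```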
Feature Distribution Monitoring
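One basic distribution check compares summary statistics of recent feature values against a baseline period. The Python sketch below measures how far the current mean has moved, in units of the baseline standard deviation; the function names are hypothetical and the alerting threshold is left to the reader.

```python
from statistics import mean, pstdev


def summarize(values):
    """Basic summary statistics for a non-empty list of feature values."""
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def mean_shift(baseline, current):
    """Shift of the current mean from the baseline mean, in baseline std units."""
    base = summarize(baseline)
    if base["std"] == 0:
        # A constant baseline: any change at all is an unbounded shift.
        return 0.0 if mean(current) == base["mean"] else float("inf")
    return abs(mean(current) - base["mean"]) / base["std"]
```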
Next Steps
- Explore Real-time ETL for data integration
- Learn about Telemetry Pipeline for observability
- Implement Change Data Capture for feature backfill