Window Types
Tumbling Windows
Tumbling windows divide the stream into non-overlapping, consecutive time intervals of fixed size.
Parameters:
- stream_or_table: The stream or table name
- timestamp_column: (Optional) Column for event time; if omitted, processing time is used
- window_size: INTERVAL expression (e.g., INTERVAL 5 SECOND, 5s)
- timezone: (Optional) Timezone string (e.g., 'America/New_York')
Characteristics:
- Non-overlapping windows
- Each event belongs to exactly one window
- Fixed, predictable boundaries
- Memory efficient: bounded state per window
Use cases:
- Periodic reporting (hourly, daily metrics)
- Rate limiting and throttling
- Simple aggregations over fixed intervals
- Time-series downsampling
Implementation: src/Processors/Transforms/Streaming/
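The "exactly one window" property follows from simple floor arithmetic on the timestamp. The sketch below is illustrative Python, not the engine's code; the function name tumble_window and the choice to align windows to the Unix epoch in UTC are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: window boundaries are aligned to the Unix epoch (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumble_window(event_time, window_size):
    """Return the (start, end) of the single [start, end) window containing event_time."""
    # Distance from the most recent window boundary at or before the event.
    offset = (event_time - EPOCH) % window_size
    start = event_time - offset
    return start, start + window_size


event = datetime(2024, 1, 1, 12, 0, 7, tzinfo=timezone.utc)
start, end = tumble_window(event, timedelta(seconds=5))
print(start.time(), end.time())
```

Because every timestamp maps to one start value, no event is counted twice and no state has to be shared between windows.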
Hopping Windows
Hopping windows (also called sliding windows) slide forward at regular intervals. They overlap when the window size is larger than the slide interval.
Parameters:
- stream_or_table: The stream or table name
- timestamp_column: (Optional) Column for event time
- slide_interval: How often a new window starts (hop step)
- window_size: Size of each window
- timezone: (Optional) Timezone string
Characteristics:
- Overlapping windows when window_size > slide_interval
- Each event can belong to multiple windows
- More frequent updates than tumbling windows
- Higher memory usage due to overlapping state
Use cases:
- Moving averages and smoothing
- Detecting trends over sliding time periods
- Real-time dashboards with frequent updates
- Anomaly detection with contextual windows
Implementation: src/Processors/Transforms/Streaming/HopWindowAssignmentTransform.cpp, with a greatest common divisor (GCD) optimization to share base windows.
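To make the overlap concrete, the following Python sketch lists every hopping window that contains a given event. It is a model of the semantics only; hop_windows is a hypothetical helper, and aligning window starts to multiples of the slide interval counted from the Unix epoch is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: window starts are multiples of slide_interval since the epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hop_windows(event_time, window_size, slide_interval):
    """Return all [start, end) hopping windows containing event_time, oldest first."""
    # Latest window start at or before the event.
    start = event_time - (event_time - EPOCH) % slide_interval
    windows = []
    # Walk backwards while the window still covers the event.
    while start + window_size > event_time:
        windows.append((start, start + window_size))
        start -= slide_interval
    windows.reverse()
    return windows


event = datetime(2024, 1, 1, 12, 0, 7, tzinfo=timezone.utc)
for w_start, w_end in hop_windows(event, timedelta(seconds=10), timedelta(seconds=5)):
    print(w_start.time(), w_end.time())
```

With a 1-hour window and a 5-minute slide the same function returns 12 windows per event, which is the source of the extra state that hopping windows carry.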
Session Windows
Session windows group events into periods of activity separated by timeout intervals. A new session begins after a period of inactivity.
Parameters:
- stream: The stream name
- timestamp_column: (Optional) Column for event time
- timeout_interval: Inactivity timeout that ends a session
- max_session_size: (Optional) Maximum session duration
- session_range_comparison: (Optional) Additional session boundary conditions
Characteristics:
- Data-driven window boundaries (not time-driven)
- Variable-length windows based on activity
- A session ends after a timeout period of inactivity
- Can span very long or very short durations
Use cases:
- User session analysis
- Click stream analytics
- IoT device activity tracking
- Application usage patterns
- Workflow and process mining
Implementation: src/Processors/Transforms/Streaming/SessionWindowAssignmentTransform.cpp, with special pushdown window assignment.
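The gap-based grouping can be sketched in a few lines of Python. This is a simplified model: sessionize is a hypothetical helper, and the exact boundary rules (a gap strictly greater than the timeout closes a session; a session is cut once it would reach max_session_size) are assumptions, not confirmed engine behavior.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, timeout, max_session_size=None):
    """Group timestamps into sessions; returns a list of lists of timestamps."""
    sessions = []
    current = []
    for ts in sorted(timestamps):
        if current:
            # Inactivity gap since the previous event in this session.
            gap_exceeded = ts - current[-1] > timeout
            # Optional cap on the total session duration.
            too_long = (
                max_session_size is not None
                and ts - current[0] >= max_session_size
            )
            if gap_exceeded or too_long:
                sessions.append(current)
                current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions


t0 = datetime(2024, 1, 1, 12, 0, 0)
clicks = [t0 + timedelta(seconds=s) for s in (0, 10, 20, 100, 105)]
for session in sessionize(clicks, timeout=timedelta(seconds=30)):
    print(session[0].time(), session[-1].time(), len(session))
```

Without max_session_size a steady trickle of events keeps one session open indefinitely, which is why the cap matters for bounding state.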
Window Metadata Functions
window_start
Returns the start timestamp of the current window.
Return type: DateTime or DateTime64, depending on the source timestamp type
window_end
Returns the end timestamp of the current window (exclusive).
Return type: DateTime or DateTime64, depending on the source timestamp type
Note: Window intervals are [window_start, window_end) - inclusive start, exclusive end.
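A half-open interval means an event that lands exactly on a boundary belongs to the later window, never to both. A minimal Python check of that rule, using plain integers as timestamps (in_window is a hypothetical helper for illustration):

```python
def in_window(ts, window_start, window_end):
    """True if ts falls in the half-open interval [window_start, window_end)."""
    return window_start <= ts < window_end


# An event at t=5 sits on the boundary between [0, 5) and [5, 10).
print(in_window(5, 0, 5))   # excluded from the earlier window
print(in_window(5, 5, 10))  # included in the later window
```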
Window Intervals
Window intervals can be specified using interval syntax or shorthand syntax.
Interval Syntax
- Nanosecond: INTERVAL 500 NANOSECOND
- Microsecond: INTERVAL 1000 MICROSECOND
- Millisecond: INTERVAL 100 MILLISECOND
- Second: INTERVAL 30 SECOND
- Minute: INTERVAL 5 MINUTE
- Hour: INTERVAL 1 HOUR
- Day: INTERVAL 7 DAY
- Week: INTERVAL 2 WEEK
- Month: INTERVAL 1 MONTH
- Quarter: INTERVAL 1 QUARTER
- Year: INTERVAL 1 YEAR
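The units up to WEEK have a fixed length, while MONTH, QUARTER, and YEAR depend on the calendar. The Python sketch below converts the fixed-length forms to nanoseconds to show the difference; interval_to_nanos is a hypothetical helper written for this illustration and does not reflect how the engine parses intervals.

```python
# Fixed-length units expressed in nanoseconds.
NANOS_PER_UNIT = {
    "NANOSECOND": 1,
    "MICROSECOND": 1_000,
    "MILLISECOND": 1_000_000,
    "SECOND": 1_000_000_000,
    "MINUTE": 60 * 1_000_000_000,
    "HOUR": 3_600 * 1_000_000_000,
    "DAY": 86_400 * 1_000_000_000,
    "WEEK": 7 * 86_400 * 1_000_000_000,
}

# These units vary in length (28 to 31 days, leap years), so they have no fixed size.
CALENDAR_UNITS = {"MONTH", "QUARTER", "YEAR"}


def interval_to_nanos(text):
    """Convert 'INTERVAL <n> <UNIT>' to nanoseconds for fixed-length units."""
    parts = text.split()
    if len(parts) != 3 or parts[0].upper() != "INTERVAL":
        raise ValueError(f"expected 'INTERVAL <n> <UNIT>', got {text!r}")
    count, unit = int(parts[1]), parts[2].upper()
    if unit in CALENDAR_UNITS:
        raise ValueError(f"{unit} is calendar-dependent and has no fixed length")
    if unit not in NANOS_PER_UNIT:
        raise ValueError(f"unknown unit {unit!r}")
    return count * NANOS_PER_UNIT[unit]


print(interval_to_nanos("INTERVAL 30 SECOND"))
```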
Shorthand Syntax
Event Time vs Processing Time
Event Time
Use an explicit timestamp column for event-time processing.
Advantages:
- Handles out-of-order events correctly
- Deterministic results independent of processing speed
- Replay produces the same results
Considerations:
- Requires a timestamp column in the data
- May need watermarks for late data handling
Processing Time
Omit the timestamp column to use the current system time.
Advantages:
- Simple: no timestamp column needed
- Lower latency: no waiting for watermarks
- Useful for monitoring current activity
Limitations:
- Non-deterministic results
- Cannot replay with the same results
- Affected by processing delays
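The determinism difference can be seen by windowing the same events twice, once by the time they occurred and once by the time they arrived. The Python sketch below uses integer seconds and made-up sample data; count_per_window is a hypothetical helper, not an engine function.

```python
from collections import Counter


def count_per_window(timestamps, window_size):
    """Count events per tumbling window, keyed by window start."""
    return Counter(ts - ts % window_size for ts in timestamps)


# (event_time, arrival_time) in seconds; the last event arrives 8 seconds late.
first_run = [(1, 1), (4, 6), (7, 7), (3, 11)]
# A replay of the same events: identical event times, different arrival times.
replay = [(1, 100), (4, 100), (7, 101), (3, 101)]

for name, events in (("first run", first_run), ("replay", replay)):
    by_event_time = count_per_window([e for e, _ in events], 5)
    by_processing_time = count_per_window([a for _, a in events], 5)
    print(name, dict(by_event_time), dict(by_processing_time))
```

The event-time counts are identical across both runs, while the processing-time counts change with arrival timing.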
Advanced Window Patterns
Cascading Windows
Create hierarchical aggregations.
Multi-Window Analysis
Compare different window sizes.
Window Joins
Join streams in the same window.
Performance Considerations
Memory Management
Tumbling Windows:
- Most memory efficient
- State bounded by window size
- Old windows can be immediately freed
Hopping Windows:
- Memory proportional to window_size / slide_interval
- Uses GCD optimization to share base windows
- Example: 1-hour window, 5-minute slide = 12x memory vs tumbling
Session Windows:
- Variable memory based on session patterns
- Use max_session_size to bound memory
- Can accumulate state for long-running sessions
Computational Efficiency
- Incremental aggregation: Updates computed per event, not per window
- Base window sharing: Hopping windows share computation via GCD
- Parallel processing: Windows can be computed in parallel
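One way to picture incremental aggregation with base window sharing: each event updates a single small bucket whose width is gcd(window_size, slide_interval), and every hopping window is then assembled from those buckets instead of re-reading events. The Python sketch below illustrates the idea with integer timestamps and a sum aggregate; it is an assumption-laden model (hypothetical function name, window starts aligned to multiples of the slide interval), not a description of the engine's internals.

```python
from collections import defaultdict
from math import gcd


def hop_sums_via_base_windows(events, window_size, slide_interval):
    """Sum values per hopping window; events are (timestamp, value) pairs."""
    base = gcd(window_size, slide_interval)

    # Incremental step: each event touches exactly one base bucket.
    base_sums = defaultdict(int)
    for ts, value in events:
        base_sums[ts - ts % base] += value
    if not base_sums:
        return {}

    first, last = min(base_sums), max(base_sums)
    # Earliest window start (multiple of slide_interval) that still covers `first`.
    start_min = ((first - window_size) // slide_interval + 1) * slide_interval

    results = {}
    for start in range(start_min, last + 1, slide_interval):
        # Combine the shared buckets that fall inside [start, start + window_size).
        buckets = [
            base_sums[b]
            for b in range(start, start + window_size, base)
            if b in base_sums
        ]
        if buckets:
            results[(start, start + window_size)] = sum(buckets)
    return results


print(hop_sums_via_base_windows([(1, 10), (6, 20), (12, 5)], 10, 5))
```

Each event is stored once in a bucket even though it contributes to several overlapping windows.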
Best Practices
- Choose appropriate window size: Balance latency vs completeness
- Use tumbling when possible: More efficient than hopping
- Bound session windows: Set max_session_size to prevent unbounded growth
- Consider timezone: Align windows with business hours
- Monitor state size: Use metrics to track window state memory
Watermarks and Late Data
Watermarks track event time progress and determine when to emit window results:
- Watermark generation based on event timestamps
- Late data tolerance configurable
- Out-of-order handling
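A common way to model these three points is a watermark equal to the largest event time seen so far minus an allowed lateness; a window is emitted once the watermark reaches its end, and events for already-emitted windows are treated as late. The Python sketch below follows that generic model with integer timestamps. The class name, the allowed_lateness parameter, and the policy of dropping late events are assumptions for illustration, not the engine's documented behavior.

```python
class TumblingWatermarkCounter:
    """Counts events per tumbling window and emits windows as the watermark advances."""

    def __init__(self, window_size, allowed_lateness):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None
        self.open_windows = {}  # window start -> event count
        self.late_events = 0

    @property
    def watermark(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def add(self, ts):
        """Process one event; return a list of (start, end, count) windows emitted."""
        start = ts - ts % self.window_size
        wm = self.watermark
        if wm is not None and start + self.window_size <= wm:
            # The window for this event was already emitted: count it as late.
            self.late_events += 1
            return []
        self.open_windows[start] = self.open_windows.get(start, 0) + 1
        if self.max_event_time is None or ts > self.max_event_time:
            self.max_event_time = ts
        wm = self.watermark
        emitted = []
        for s in sorted(self.open_windows):
            if s + self.window_size <= wm:
                emitted.append((s, s + self.window_size, self.open_windows.pop(s)))
        return emitted


counter = TumblingWatermarkCounter(window_size=10, allowed_lateness=2)
for ts in (1, 8, 13, 9, 11, 25):
    print(ts, counter.add(ts))
print("late events:", counter.late_events)
```

A larger allowed lateness admits more out-of-order events at the cost of emitting each window later.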
Examples: Common Use Cases
Real-Time Dashboard (Tumbling)
Moving Average (Hopping)
User Session Analysis (Session)
Anomaly Detection (Hopping)
See Also
- Aggregation Functions - Functions to use within windows
- Time Functions - Working with timestamps
- Functions Overview - All available functions