# SAMPLE BY keyword SAMPLE BY SQL keyword reference documentation. `SAMPLE BY` aggregates time-series data into time-based groups (hourly, daily, etc.) and applies aggregate functions to each group. It is QuestDB's primary tool for downsampling and summarizing large datasets. Supports interpolation for missing intervals via `FILL`. Requires a [designated timestamp](/docs/concepts/designated-timestamp/) column. ## Syntax ```questdb-sql SELECT columns, aggregate_functions FROM table [WHERE conditions] SAMPLE BY [FROM timestamp TO timestamp] [FILL(fill_option)] [ALIGN TO CALENDAR [TIME ZONE tz] [WITH OFFSET offset] | ALIGN TO FIRST OBSERVATION] ``` ### Key clauses - **`SAMPLE BY `**: Groups rows into time buckets (e.g., `1h`, `5m`, `1d`) - **`FROM ... TO`**: Defines explicit time boundaries for filling missing intervals - **`FILL(option)`**: Specifies how to handle missing time buckets (`NONE`, `NULL`, `PREV`, `LINEAR`, or constant) - **`ALIGN TO CALENDAR`**: Aligns buckets to calendar boundaries (default). Supports `TIME ZONE` and `WITH OFFSET` - **`ALIGN TO FIRST OBSERVATION`**: Aligns buckets relative to the first row's timestamp ## Sample units The size of sampled groups are specified with the following syntax: ```questdb-sql SAMPLE BY n{units} ``` Where the unit for sampled groups may be one of the following: | unit | description | | ---- | ----------- | | `n` | nanosecond | | `U` | microsecond | | `T` | millisecond | | `s` | second | | `m` | minute | | `h` | hour | | `d` | day | | `w` | week | | `M` | month | | `y` | year | For example, given a table `trades`, the following query returns the number of trades per hour: ```questdb-sql SELECT ts, count() FROM trades SAMPLE BY 1h; ``` ## Fill options `FILL` specifies how to handle time intervals with no data. By default, missing intervals are skipped. Use `FILL` to interpolate or substitute values. | fillOption | Description | | ----------- | ------------------------------------------------------------------------------------------------------------------------- | | `NONE` | No fill applied. If there is no data, the time sample will be skipped in the results. A table could be missing intervals. | | `NULL` | Fills with `NULL` values. | | `PREV` | Fills using the previous value of the same aggregate column. | | `PREV(col)` | Fills using the previous value of another aggregate column referenced by alias. | | `LINEAR` | Fills by linear interpolation of the 2 surrounding points. | | `x` | Fills with a constant value - where `x` is the desired value, for example `FILL(100.05)`. | The following restrictions apply: - `FILL(NONE)` cannot be combined with other fill values; the other strategies may be mixed in a per-aggregate fill list. - `LINEAR` strategy is not supported for keyed queries, i.e. queries that contain non-aggregated columns other than the timestamp in the SELECT clause. - `PREV(col)` references another aggregate column by alias and must match its type. It cannot be broadcast across multiple aggregates and is rejected when source or target is `SYMBOL`. - The `FILL` keyword must precede alignment described in the [sample calculation section](#sample-calculation), i.e.: ```questdb-sql SELECT ts, max(price) max FROM prices SAMPLE BY 1h FILL(LINEAR) ALIGN TO ... ``` Consider an example table named `prices` which has no records during the entire third hour (`2021-01-01T03`): | ts | price | | --------------------------- | ----- | | 2021-01-01T01:00:00.000000Z | p1 | | 2021-01-01T02:00:00.000000Z | p2 | | 2021-01-01T04:00:00.000000Z | p4 | | 2021-01-01T05:00:00.000000Z | p5 | The following query returns the maximum price per hour. As there are missing values, an aggregate cannot be calculated: ```questdb-sql SELECT ts, max(price) max FROM prices SAMPLE BY 1h; ``` A row is missing for the `2021-01-01T03:00:00.000000Z` sample: | ts | max | | --------------------------- | ---- | | 2021-01-01T01:00:00.000000Z | max1 | | 2021-01-01T02:00:00.000000Z | max2 | | 2021-01-01T04:00:00.000000Z | max4 | | 2021-01-01T05:00:00.000000Z | max5 | A `FILL` strategy can be employed which fills with the previous value using `PREV`: ```questdb-sql SELECT ts, max(price) max FROM prices SAMPLE BY 1h FILL(PREV); ``` | ts | max | | ------------------------------- | -------- | | 2021-01-01T01:00:00.000000Z | max1 | | 2021-01-01T02:00:00.000000Z | max2 | | **2021-01-01T03:00:00.000000Z** | **max2** | | 2021-01-01T04:00:00.000000Z | max4 | | 2021-01-01T05:00:00.000000Z | max5 | Linear interpolation is done using the `LINEAR` fill option: ```questdb-sql SELECT ts, max(price) max FROM prices SAMPLE BY 1h FILL(LINEAR); ``` | ts | max | | ------------------------------- | ----------------- | | 2021-01-01T01:00:00.000000Z | max1 | | 2021-01-01T02:00:00.000000Z | max2 | | **2021-01-01T03:00:00.000000Z** | **(max2+max4)/2** | | 2021-01-01T04:00:00.000000Z | max4 | | 2021-01-01T05:00:00.000000Z | max5 | A constant value can be used as a `fillOption`: ```questdb-sql SELECT ts, max(price) max FROM prices SAMPLE BY 1h FILL(100.5); ``` | ts | max | | ------------------------------- | --------- | | 2021-01-01T01:00:00.000000Z | max1 | | 2021-01-01T02:00:00.000000Z | max2 | | **2021-01-01T03:00:00.000000Z** | **100.5** | | 2021-01-01T04:00:00.000000Z | max4 | | 2021-01-01T05:00:00.000000Z | max5 | Finally, `NULL` may be used as a `fillOption`: ```questdb-sql SELECT ts, max(price) max FROM prices SAMPLE BY 1h FILL(NULL); ``` | ts | max | | ------------------------------- | -------- | | 2021-01-01T01:00:00.000000Z | max1 | | 2021-01-01T02:00:00.000000Z | max2 | | **2021-01-01T03:00:00.000000Z** | **null** | | 2021-01-01T04:00:00.000000Z | max4 | | 2021-01-01T05:00:00.000000Z | max5 | ### Multiple fill values `FILL()` accepts a list of values where each value corresponds to a single aggregate column in the SELECT clause order: ```questdb-sql SELECT min(price), max(price), avg(price), ts FROM prices SAMPLE BY 1h FILL(NULL, 10, PREV); ``` In the above query `min(price)` aggregate will get `FILL(NULL)` strategy applied, `max(price)` will get `FILL(10)`, and `avg(price)` will get `FILL(PREV)`. ## FROM-TO `FILL` only fills intervals between existing data points. To fill outside the bounds of existing data, use `FROM-TO` to specify explicit time boundaries. ```questdb-sql title='Pre-filling trade data' demo SELECT timestamp as ts, count() FROM trades SAMPLE BY 1d FROM '2009-01-01' TO '2009-01-10' FILL(NULL); ``` If no rows exist at the start of the range, QuestDB automatically fills in these rows. This is distinct from the `WHERE` clause with a simple rule of thumb - `WHERE` controls what data flows in, `FROM-TO` controls what data flows out. Use both `FROM` and `TO` in isolation to pre-fill or post-fill data. If `FROM` is not provided, then the lower bound is the start of the dataset, aligned to calendar. The opposite is true omitting `TO`. When used without a `TIME ZONE`, boundary timestamps are interpreted as UTC. When a `TIME ZONE` is specified in `ALIGN TO CALENDAR`, the `FROM` and `TO` values are interpreted as local time in that timezone. For example: ```questdb-sql SELECT ts, count() FROM trades SAMPLE BY 1h FROM '2026-01-01T00:00:00' TO '2026-01-02T00:00:00' FILL(NULL) ALIGN TO CALENDAR TIME ZONE 'Europe/Berlin'; ``` Here, `FROM '2026-01-01T00:00:00'` means midnight January 1st in Berlin local time (= `2025-12-31T23:00:00Z` UTC). `FROM-TO` works with both non-keyed and keyed `SAMPLE BY` queries. Keyed queries emit one row per (bucket, key) combination across the full interval. ### `WHERE` clause optimisation If the user does not provide a `WHERE` clause, or the `WHERE` clause does not consider the designated timestamp, QuestDB will add one for you, matching the `FROM-TO` interval. This means that the query will run optimally, and avoid touching data not relevant to the result. Therefore, we compile the prior query into something similar to this: ```questdb-sql title='Pre-filling trade data with WHERE optimisation' demo SELECT timestamp as ts, count() FROM trades WHERE timestamp >= '2009-01-01' AND timestamp < '2009-01-10' SAMPLE BY 1d FROM '2009-01-01' TO '2009-01-10' FILL(NULL); ``` ### Limitations - Not compatible with `FILL(LINEAR)`. - Only works with `ALIGN TO CALENDAR` (default alignment). - Keyed `FROM-TO` requires `FROM`/`TO` to be constants. Bind variables, function calls, and arithmetic expressions are rejected for keyed queries. See [Fill keyed queries with arbitrary intervals](/docs/cookbook/sql/time-series/fill-keyed-arbitrary-interval/) for a workaround. ## Sample calculation The default time calculation of sampled groups is an absolute value, in other words, sampling by one day is a 24 hour range which is not bound to calendar dates. To align sampled groups to calendar dates, the `ALIGN TO` keywords can be used and are described in the [ALIGN TO CALENDAR](#align-to-calendar) section below. :::note Since QuestDB v7.4.0, the default behaviour for `ALIGN TO` has changed. If you do not specify an explicit alignment, `SAMPLE BY` expressions will use `ALIGN TO CALENDAR` behaviour. The prior default behaviour can be retained by specifying `ALIGN TO FIRST OBSERVATION` on a `SAMPLE BY` query. Alternatively, one can set the `cairo.sql.sampleby.default.alignment.calendar` option to `false` in `server.conf`. ::: ## ALIGN TO FIRST OBSERVATION Consider a table `trades` with the following data spanning three calendar days: ```questdb-sql CREATE TABLE trades ( ts TIMESTAMP, price DOUBLE ) TIMESTAMP(ts) PARTITION BY DAY WAL; INSERT INTO trades (ts, price) VALUES ('2021-05-31T23:10:00.000000Z', 100.5), ('2021-06-01T01:10:00.000000Z', 101.2), ('2021-06-01T07:20:00.000000Z', 100.8), ('2021-06-01T13:20:00.000000Z', 101.0), ('2021-06-01T19:20:00.000000Z', 102.1), ('2021-06-02T01:10:00.000000Z', 101.5), ('2021-06-02T07:20:00.000000Z', 100.9); ``` The following query can be used to sample the table by day. ```questdb-sql SELECT ts, count() FROM trades SAMPLE BY 1d ALIGN TO FIRST OBSERVATION; ``` This query will return two rows: | ts | count | | --------------------------- | ----- | | 2021-05-31T23:10:00.000000Z | 5 | | 2021-06-01T23:10:00.000000Z | 2 | The timestamp value for the 24 hour groups start at the first-observed timestamp, and continue in `1d` intervals. ## ALIGN TO CALENDAR The default behaviour for SAMPLE BY, this option aligns data to calendar dates, with two optional parameters: - [TIME ZONE](#time-zone) - [WITH OFFSET](#with-offset) ```questdb-sql SELECT ts, count() FROM trades SAMPLE BY 1d; ``` or: ```questdb-sql SELECT ts, count() FROM trades SAMPLE BY 1d ALIGN TO CALENDAR; ``` Gives the following result: | ts | count | | --------------------------- | ----- | | 2021-05-31T00:00:00.000000Z | 1 | | 2021-06-01T00:00:00.000000Z | 4 | | 2021-06-02T00:00:00.000000Z | 2 | In this case, the timestamps are floored to the nearest UTC day, and grouped. The counts correspond to the number of entries occurring within each UTC day. This is particularly useful for summarising data for charting purposes; see the [candlestick chart](https://dashboard.questdb.io/d-solo/fb13b4ab-b1c9-4a54-a920-b60c5fb0363f/public-dashboard-questdb-io-use-cases-crypto?orgId=1&refresh=750ms&panelId=6) from the example [crypto dashboard](https://questdb.com/dashboards/crypto/). ### TIME ZONE A time zone may be provided for sampling with calendar alignment. Details on the options for specifying time zones with available formats are provided in the guide for [working with timestamps and time zones](/docs/concepts/timestamps-timezones/). ```questdb-sql SELECT ts, count() FROM trades SAMPLE BY 1d ALIGN TO CALENDAR TIME ZONE 'Europe/Berlin'; ``` In this case, the 24 hour samples begin at `2021-05-31T22:00:00.000000Z`: | ts | count | | --------------------------- | ----- | | 2021-05-31T22:00:00.000000Z | 5 | | 2021-06-01T22:00:00.000000Z | 2 | Additionally, an offset may be applied when aligning sample calculation to calendar ```questdb-sql SELECT ts, count() FROM trades SAMPLE BY 1d ALIGN TO CALENDAR TIME ZONE 'Europe/Berlin' WITH OFFSET '00:45'; ``` In this case, the 24 hour samples begin at `2021-05-31T22:45:00.000000Z`: | ts | count | | --------------------------- | ----- | | 2021-05-31T22:45:00.000000Z | 5 | | 2021-06-01T22:45:00.000000Z | 1 | #### Local timezone output The timestamp values output from `SAMPLE BY` queries is in UTC. To have UTC values converted to specific timezones, the [to_timezone() function](/docs/query/functions/date-time/#to_timezone) should be used. ```questdb-sql SELECT to_timezone(ts, 'PST') ts, count FROM ( SELECT ts, count() FROM trades SAMPLE BY 2h ALIGN TO CALENDAR TIME ZONE 'PST' ); ``` #### Time zone transitions Calendar dates may contain historical time zone transitions or may vary in the total number of hours due to daylight savings time. How `SAMPLE BY` handles DST transitions depends on the stride size. ##### Super-day strides (day, week, month, year) Super-day strides adjust bucket length to follow the local calendar. For example, October 31st 2021 in `Europe/London` is a 25-hour day: > - Sunday, 31 October 2021, 02:00:00 clocks are turned backward 1 hour to > - Sunday, 31 October 2021, 01:00:00 local standard time Given one data point per hour, running `SAMPLE BY 1d` aligned to calendar time zone `Europe/London` will produce a count of `25` for this day. ##### Sub-day strides (hours, minutes, seconds) Sub-day strides use the timezone's standard (non-DST) offset to maintain uniform bucket width in UTC. This has the following effects during DST transitions: - **Fall-back** (clocks go back): the repeated local hour produces two separate UTC-based buckets. Both buckets map to the same local time but have distinct UTC timestamps. Output timestamps remain monotonic in UTC. - **Spring-forward** (clocks go forward): the skipped local hour produces no bucket if no UTC timestamps fall in that range. - Bucket boundaries are evenly spaced in UTC, regardless of DST transitions. Consider a table `trades` with one trade per hour spanning five calendar hours around the `Europe/London` fall-back: | ts | price | | --------------------------- | ----- | | 2021-10-31T00:10:00.000000Z | 100.5 | | 2021-10-31T01:10:00.000000Z | 101.2 | | 2021-10-31T02:10:00.000000Z | 100.8 | | 2021-10-31T03:10:00.000000Z | 101.5 | | 2021-10-31T04:10:00.000000Z | 102.0 | The following query will sample by hour with the `Europe/London` time zone: ```questdb-sql SELECT ts, count() FROM trades SAMPLE BY 1h ALIGN TO CALENDAR TIME ZONE 'Europe/London'; ``` Since `Europe/London` has a standard offset of UTC+0, sub-day buckets align with UTC hour boundaries. Each data point falls into its own bucket: | ts | count | | --------------------------- | ----- | | 2021-10-31T00:00:00.000000Z | 1 | | 2021-10-31T01:00:00.000000Z | 1 | | 2021-10-31T02:00:00.000000Z | 1 | | 2021-10-31T03:00:00.000000Z | 1 | | 2021-10-31T04:00:00.000000Z | 1 | For timezones with a non-zero standard offset (e.g., `Europe/Berlin` at UTC+1), bucket boundaries shift by the standard offset but remain uniformly spaced in UTC. During a fall-back transition, data in the repeated local hour will be split across two consecutive UTC buckets rather than merged into one. ### WITH OFFSET Aligning sampling calculation can be provided an arbitrary offset in the format `'+/-HH:mm'`, for example: - `'00:30'` plus thirty minutes - `'+00:30'` plus thirty minutes - `'-00:15'` minus 15 minutes The query uses the default offset '00:00' if the parameter is not set. ```questdb-sql SELECT ts, count() FROM trades SAMPLE BY 1d ALIGN TO CALENDAR WITH OFFSET '02:00'; ``` In this case, the 24 hour samples begin at `2021-05-31T02:00:00.000000Z`: | ts | count | | --------------------------- | ----- | | 2021-05-31T02:00:00.000000Z | 2 | | 2021-06-01T02:00:00.000000Z | 4 | | 2021-06-02T02:00:00.000000Z | 1 | ### TIME ZONE WITH OFFSET The `TIME ZONE` and `WITH OFFSET` options can be combined. ```questdb-sql SELECT ts, count() FROM trades SAMPLE BY 1h ALIGN TO CALENDAR TIME ZONE 'Europe/London' WITH OFFSET '02:00'; ``` The sample then begins from `Europe/London` at `2021-10-31T02:00:00.000000Z`: | ts | count | | --------------------------- | ----- | | 2021-10-31T02:00:00.000000Z | 1 | | 2021-10-31T03:00:00.000000Z | 1 | | 2021-10-31T04:00:00.000000Z | 3 | | 2021-10-31T05:00:00.000000Z | 2 | ## Examples Assume the following table `trades`: | ts | quantity | price | | --------------------------- | -------- | ------ | | 2021-05-31T23:45:10.000000Z | 10 | 100.05 | | 2021-06-01T00:01:33.000000Z | 5 | 100.05 | | 2021-06-01T00:15:14.000000Z | 200 | 100.15 | | 2021-06-01T00:30:40.000000Z | 300 | 100.15 | | 2021-06-01T00:45:20.000000Z | 10 | 100 | | 2021-06-01T01:00:50.000000Z | 50 | 100.15 | This query will return the number of trades per hour: ```questdb-sql title="Hourly interval" SELECT ts, count() FROM trades SAMPLE BY 1h; ``` | ts | count | | --------------------------- | ----- | | 2021-05-31T23:45:10.000000Z | 3 | | 2021-06-01T00:45:10.000000Z | 1 | | 2021-05-31T23:45:10.000000Z | 1 | | 2021-06-01T00:45:10.000000Z | 1 | The following will return the trade volume in 30 minute intervals ```questdb-sql title="30 minute interval" SELECT ts, sum(quantity*price) FROM trades SAMPLE BY 30m; ``` | ts | sum | | --------------------------- | ------ | | 2021-05-31T23:45:10.000000Z | 1000.5 | | 2021-06-01T00:15:10.000000Z | 16024 | | 2021-06-01T00:45:10.000000Z | 8000 | | 2021-06-01T00:15:10.000000Z | 8012 | | 2021-06-01T00:45:10.000000Z | 8000 | The following will return the average trade notional (where notional is = q \* p) by day: ```questdb-sql title="Daily interval" SELECT ts, avg(quantity*price) FROM trades SAMPLE BY 1d; ``` | ts | avg | | --------------------------- | ----------------- | | 2021-05-31T23:45:10.000000Z | 6839.416666666667 | To make this sample align to calendar dates: ```questdb-sql title="Calendar alignment" SELECT ts, avg(quantity*price) FROM trades SAMPLE BY 1d ALIGN TO CALENDAR; ``` | ts | avg | | --------------------------- | ------ | | 2021-05-31T00:00:00.000000Z | 1000.5 | | 2021-06-01T00:00:00.000000Z | 8007.2 | ## Performance optimization For frequently executed `SAMPLE BY` queries, consider using [materialized views](/docs/concepts/materialized-views/) to pre-compute aggregates. This can significantly improve query performance, especially for complex sampling operations on large datasets. ```questdb-sql CREATE MATERIALIZED VIEW hourly_metrics AS SELECT timestamp_floor('h', ts) as hour, symbol, avg(price) as avg_price, sum(volume) as total_volume FROM trades SAMPLE BY 1h; ``` ## See also This section includes links to additional information such as tutorials: - [PIVOT](/docs/query/sql/pivot/) - Transform SAMPLE BY results from rows to columns - [Materialized Views](/docs/concepts/materialized-views/) - Pre-compute SAMPLE BY queries for better performance - [SQL Extensions for Time-Series Data in QuestDB](/blog/2022/11/23/sql-extensions-time-series-data-questdb-part-ii/) - [Three SQL Keywords for Finding Missing Data](/blog/three-sql-keywords-for-finding-missing-data/)