# Posting index and covering index

The posting index is a compact, high-performance index for symbol columns that supports covering queries. Learn how it works, when to use it, and how to optimize queries with INCLUDE columns.

The **posting index** is an advanced index type for
[symbol](/docs/concepts/symbol/) columns that provides better compression,
faster reads, and **covering index** support compared to the default bitmap
index.

A **covering index** stores additional column values alongside the index
entries, so queries that only need those columns can be answered entirely from
the index without reading the main column files.

## When to use the posting index

Use the posting index when:

- You frequently filter on a symbol column (`WHERE symbol = 'X'`)
- Your queries select a small set of columns alongside the symbol filter
- You want to reduce I/O by reading from compact sidecar files instead of
  full column files
- You need efficient `DISTINCT` queries on a symbol column
- You need efficient `LATEST ON` queries partitioned by a symbol column

The posting index is especially effective for high-cardinality symbol columns
(hundreds to thousands of distinct values) and wide tables where reading full
column files is expensive.

## Creating a posting index

### At table creation

Inline syntax (index defined alongside the column):

```questdb-sql
CREATE TABLE trades (
    timestamp TIMESTAMP,
    symbol SYMBOL INDEX TYPE POSTING,
    exchange SYMBOL,
    price DOUBLE,
    quantity DOUBLE
) TIMESTAMP(timestamp) PARTITION BY DAY WAL;
```

Out-of-line syntax (index defined separately):

```questdb-sql
CREATE TABLE trades (
    timestamp TIMESTAMP,
    symbol SYMBOL,
    exchange SYMBOL,
    price DOUBLE,
    quantity DOUBLE
), INDEX(symbol TYPE POSTING)
TIMESTAMP(timestamp) PARTITION BY DAY WAL;
```

### With covering columns (INCLUDE)

```questdb-sql
CREATE TABLE trades (
    timestamp TIMESTAMP,
    symbol SYMBOL INDEX TYPE POSTING INCLUDE (exchange, price),
    exchange SYMBOL,
    price DOUBLE,
    quantity DOUBLE
) TIMESTAMP(timestamp) PARTITION BY DAY WAL;
```

The `INCLUDE` clause specifies which columns are stored in the index sidecar
files. Queries that only read these columns plus the indexed symbol column
can be served entirely from the index.

:::tip

When you supply an `INCLUDE` clause, the designated timestamp is
automatically appended to it if you did not already list it — you do
not need to type it explicitly. `INCLUDE (exchange, price)` renders
back as `INCLUDE (exchange, price, timestamp)` in `SHOW CREATE TABLE`
after creation. Controlled by the
`cairo.posting.index.auto.include.timestamp` server property
(default `true`).

This auto-append only applies when an `INCLUDE` clause is given **and**
the table has a designated timestamp. A bare `INDEX TYPE POSTING`
(no `INCLUDE`) has no covering layer at all — `SELECT timestamp, sym
FROM t WHERE sym = 'X'` reads the timestamp from the column file in
that case. Tables without a designated timestamp (typically `BYPASS WAL`
tables that omit the `TIMESTAMP(...)` clause) still work normally with
posting indexes and `INCLUDE`; the auto-append simply has nothing to
add.

:::

:::note

The `INCLUDE` clause is only supported with inline column syntax in
`CREATE TABLE` and with `ALTER TABLE`. The out-of-line
`INDEX(col TYPE POSTING)` syntax and `ALTER MATERIALIZED VIEW` both reject
`INCLUDE`.

Writing `INDEX INCLUDE (...)` (no explicit `TYPE`) is also accepted and
implicitly creates a posting index — `INCLUDE` is only valid with
`POSTING`, so the parser promotes the type for you.

:::

### On an existing table

```questdb-sql
ALTER TABLE trades
  ALTER COLUMN symbol ADD INDEX TYPE POSTING INCLUDE (exchange, price);
```

### Encoding options

The posting index supports three row ID encoding options with different
compression and query performance characteristics:

| Syntax | Encoding | Notes |
|--------|----------|-------|
| `INDEX TYPE POSTING` | Adaptive (default) | Trials delta + Frame-of-Reference and Elias-Fano per key per stride and keeps the smaller output |
| `INDEX TYPE POSTING EF` | Elias-Fano only | Forces Elias-Fano even when delta + FoR would be smaller — useful for benchmarking |
| `INDEX TYPE POSTING DELTA` | Delta + Frame-of-Reference only | Forces delta + FoR even when Elias-Fano would be smaller — useful for benchmarking |

**Delta + Frame-of-Reference encoding** stores each key's row IDs as
per-key deltas, split into blocks of 64 with per-block Frame-of-Reference
bitpacking. Round-robin or periodic distributions produce constant
deltas (bitwidth 0), so this mode compresses them to near-zero. The
trade-off is a per-key block-header overhead that hurts low-cardinality
keys.

**Elias-Fano (EF) encoding** is a classic monotonic-sequence encoding:
each key's sorted row IDs are split into low and high bit halves, with
the high half stored as a unary-coded bit array and the low half as a
fixed-width packed array. This typically produces denser output for
keys with few values per stride and avoids the block-header overhead.

The **adaptive (default)** encoding trial-encodes each key with both
delta + Frame-of-Reference and Elias-Fano per stride and picks whichever
produces the smaller output. This is the right choice for almost all
workloads — the explicit `DELTA` / `EF` variants exist mainly for
benchmarking.

```questdb-sql
-- Default adaptive encoding (recommended for most workloads)
CREATE TABLE t1 (ts TIMESTAMP, s SYMBOL INDEX TYPE POSTING)
    TIMESTAMP(ts) PARTITION BY DAY WAL;

-- Force Elias-Fano only (benchmarking)
CREATE TABLE t2 (ts TIMESTAMP, s SYMBOL INDEX TYPE POSTING EF)
    TIMESTAMP(ts) PARTITION BY DAY WAL;

-- Force delta + Frame-of-Reference only (benchmarking)
CREATE TABLE t3 (ts TIMESTAMP, s SYMBOL INDEX TYPE POSTING DELTA)
    TIMESTAMP(ts) PARTITION BY DAY WAL;
```

## Covering index

The covering index is the most powerful feature of the posting index. When all
columns in a query's `SELECT` list are either:

- The indexed symbol column itself (from the `WHERE` clause)
- Listed in the `INCLUDE` clause

...the query engine reads data directly from the index sidecar files, bypassing
the main column files entirely. This is significantly faster for selective
queries on wide tables.

:::caution

The async group-by and filter code paths through the covering index are
currently slower than the regular plan in some workloads. A follow-up
release will close this gap, and the optimizer will continue to improve
as more feedback comes in.

If you notice a query slowdown after [`EXPLAIN`](/docs/query/sql/explain/)
shows it has started picking the covering path, opt that query out with
[`/*+ no_covering */` or `/*+ no_index */`](/docs/concepts/deep-dive/sql-optimizer-hints/#index-hints)
while the optimizations land.

:::

### Supported column types in INCLUDE

All column types except the indexed symbol column itself can be included:

| Type | Compression | Notes |
|------|-------------|-------|
| BOOLEAN, BYTE, GEOBYTE, DECIMAL8 | Frame-of-Reference bitpacking | ≤1 byte per value (worst case) |
| SHORT, CHAR, GEOSHORT, DECIMAL16 | Frame-of-Reference bitpacking | ≤2 bytes per value |
| INT, IPv4, GEOINT, DECIMAL32 | Frame-of-Reference bitpacking | ≤4 bytes per value |
| FLOAT | ALP (Adaptive Lossless floating-Point) | Lossless float compression |
| LONG, DATE, GEOLONG, DECIMAL64 | Frame-of-Reference bitpacking | ≤8 bytes per value |
| TIMESTAMP | Linear-prediction + Frame-of-Reference | Designed for monotonic timestamps |
| DOUBLE | ALP (Adaptive Lossless floating-Point) | Lossless float compression |
| SYMBOL | Frame-of-Reference bitpacking | Stored as integer key, resolved at query time |
| UUID, DECIMAL128 | Raw copy | 16 bytes per value |
| LONG256, DECIMAL256 | Raw copy | 32 bytes per value |
| VARCHAR, STRING | FSST compressed (≥4 KB strides) | Typically 2–5× compression on repetitive text |
| BINARY | Length-prefixed raw bytes | Variable-width, no compression |
| Arrays (DOUBLE[], INT[], etc.) | Length-prefixed raw bytes | Variable-width, no compression |

### How to choose INCLUDE columns

Include columns that you frequently select together with the indexed symbol:

```questdb-sql
-- If your typical queries look like this:
SELECT timestamp, price, quantity FROM trades WHERE symbol = 'AAPL';

-- Then include those columns (timestamp is auto-included as designated timestamp):
CREATE TABLE trades (
    timestamp TIMESTAMP,
    symbol SYMBOL INDEX TYPE POSTING INCLUDE (price, quantity),
    exchange SYMBOL,
    price DOUBLE,
    quantity DOUBLE,
    -- other columns not needed in hot queries
    raw_data VARCHAR,
    metadata VARCHAR
) TIMESTAMP(timestamp) PARTITION BY DAY WAL;
```

:::tip

Only include columns that appear in your most frequent queries. Each included
column adds storage overhead and slows down writes slightly. Columns not in
the `INCLUDE` list can still be queried — they just won't benefit from the
covering optimization and will be read from column files.

:::

### Inspecting indexes with SHOW COLUMNS

[`SHOW COLUMNS`](/docs/query/sql/show/#show-columns) reports the index type
and covered column list per column:

- `indexType` is `POSTING`, `POSTING DELTA`, `POSTING EF`, `BITMAP`, or
  empty for non-indexed columns.
- `indexInclude` lists covered column names — including the auto-included
  designated timestamp.

The [`table_columns()`](/docs/query/functions/meta/#table_columns) function
exposes the same fields programmatically.

### Verifying covering index usage

Use `EXPLAIN` to verify that a query uses the covering index:

```questdb-sql
EXPLAIN SELECT timestamp, price FROM trades WHERE symbol = 'AAPL';
```

If the covering index is used, the plan shows `CoveringIndex`:

```
SelectedRecord
    CoveringIndex on: symbol with: timestamp, price
      filter: symbol='AAPL'
```

`IN`-list filters render as `filter: symbol IN ['AAPL','GOOGL','MSFT']`.
`LATEST ON` queries that hit the covering path show an `op: latest`
annotation and have no `SelectedRecord` wrapper:

```
CoveringIndex op: latest on: symbol with: timestamp, price
  filter: symbol='AAPL'
```

`SELECT DISTINCT` does not need to read covered values, so it shows up as
`PostingIndex op: distinct` rather than `CoveringIndex`:

```
PostingIndex op: distinct on: symbol
    Frame forward scan on: trades
```

When you add a filter on a covered column, an `Async Filter` is layered
above the covering index — the predicate values are read from the sidecar,
not the column file:

```
SelectedRecord
    Async Filter workers: N
      filter: 100<price
        CoveringIndex on: symbol with: price
          filter: symbol='AAPL'
```

If you see `DeferredSingleSymbolFilterPageFrame` or `PageFrame` instead, the
query is reading from column files. This happens when the `SELECT` list
includes columns not in the `INCLUDE` list, or when the `WHERE` clause
doesn't filter on the indexed symbol.

## Comparison with bitmap index

For a side-by-side feature comparison and guidance on choosing between the
two index types, see
[Choosing an index type](/docs/concepts/deep-dive/indexes/#choosing-an-index-type)
on the indexes overview page.

In end-to-end benchmarks (geomean across five workloads, sealed indexes), the
posting index is roughly 13× smaller than the bitmap index and 1.3–1.5×
faster on point, range, and full-scan reads. Writes are ~9% slower than the
bitmap index for the index path itself; sidecar writes add overhead
proportional to the number and type of `INCLUDE` columns.

## Query patterns accelerated

### Point queries (WHERE symbol = 'X')

```questdb-sql
-- Reads from sidecar if price is in INCLUDE
SELECT price FROM trades WHERE symbol = 'AAPL';
```

### Point queries with additional filters

If the additional filter columns are also in `INCLUDE`, the covering index
streams matching rows and an `Async Filter` applies the extra predicate on
top — the predicate values are read from the sidecar, not the column file:

```questdb-sql
-- Covering index + filter on covered column
SELECT price FROM trades WHERE symbol = 'AAPL' AND price > 100;
```

### IN-list queries

```questdb-sql
-- Multiple keys, still uses covering index
SELECT price FROM trades WHERE symbol IN ('AAPL', 'GOOGL', 'MSFT');
```

### LATEST ON queries

```questdb-sql
-- Latest row per symbol, reads from sidecar
SELECT timestamp, symbol, price
FROM trades
WHERE symbol = 'AAPL'
LATEST ON timestamp PARTITION BY symbol;
```

### DISTINCT queries

```questdb-sql
-- Enumerates keys from index metadata, O(keys x partitions) instead of full scan
SELECT DISTINCT symbol FROM trades;

-- Also works with timestamp filters
SELECT DISTINCT symbol FROM trades WHERE timestamp > '2024-01-01';
```

### COUNT queries

```questdb-sql
-- Plan: Count over CoveringIndex, no column data read
SELECT COUNT(*) FROM trades WHERE symbol = 'AAPL';
```

### Aggregate queries on covered columns

```questdb-sql
-- Aggregates over a covered column read from the sidecar instead of
-- the column file
SELECT count(*), min(price), max(price)
FROM trades
WHERE symbol = 'AAPL';
```

## SQL optimizer hints

Two hints opt a query out of the covering and/or index paths for
benchmarking or troubleshooting:

- `no_covering` — read from column files instead of the covering sidecar
- `no_index` — disable index usage entirely (implies `no_covering`)

See [Index hints](/docs/concepts/deep-dive/sql-optimizer-hints/#index-hints)
for syntax, semantics, and examples.

## Trade-offs

### Storage

The posting index itself is very compact (~1 byte per indexed value, vs.
~15 bytes per value for the bitmap index). The covering sidecar adds
storage proportional to the included columns:

- **DOUBLE, FLOAT**: ALP (Adaptive Lossless floating-Point), backed by
  Frame-of-Reference bitpacking with an exception list for outliers.
- **TIMESTAMP**: linear-prediction header with Frame-of-Reference residual
  bitpacking — designed for monotonic timestamp data.
- **Other fixed-width integer types** (BOOLEAN, BYTE, SHORT, CHAR, INT,
  LONG, DATE, IPv4, GEO\*, DECIMAL8–DECIMAL64, SYMBOL keys):
  Frame-of-Reference bitpacking sized to the column's natural width, so
  the worst case is the column-file byte size and typical case is much
  smaller.
- **UUID, LONG256, DECIMAL128, DECIMAL256**: stored raw at full width
  with a small count header.
- **VARCHAR, STRING**: FSST-compressed once a stride exceeds 4 KB of raw
  data; typically 2–5× smaller than the column file.
- **BINARY and arrays**: length-prefixed raw bytes (no compression).

### Write performance

Write overhead depends on the number and type of `INCLUDE` columns:

- **Posting index without INCLUDE**: ~9% slower than the bitmap index for
  the index path itself (delta + Frame-of-Reference encoding vs. simple
  append).
- **Posting index with fixed-width INCLUDE**: additional sidecar write cost
  proportional to the number of columns; values are batched and compressed
  at seal time.
- **Posting index with VARCHAR / STRING / BINARY / ARRAY INCLUDE**: pays
  the full variable-width copy cost per row plus an FSST symbol-table
  rebuild per seal for VARCHAR / STRING.

Query performance improvements typically far outweigh the write cost for
read-heavy workloads.

### Memory

The posting index uses native memory for encoding/decoding buffers. Each
FSST-compressed `VARCHAR` or `STRING` column carries a ~2.3 KB symbol
table that is loaded alongside the sidecar at read time and easily fits
in L1 cache; per-reader decompression buffers are also small.

## Architecture

The posting index stores data in three file types per partition:

- **`.pk`** — Key file: double-buffered metadata pages with the per-key
  generation directory; readers see consistent snapshots via a seqlock
  protocol.
- **`.pv`** — Value file: row IDs encoded as either delta +
  Frame-of-Reference bitpacking or Elias-Fano (depending on the index's
  encoding variant), organised into stride-indexed generations.
- **`.pci` + `.pc0`, `.pc1`, …** — Sidecar files: covered column values
  stored alongside the posting list. The single `.pci` header lists the
  covered columns by writer index (`PCI1` magic, plus the `coverCount`
  used by readers to size their sidecar mappings). Each `.pcN` (with
  txn-segment suffix on disk, e.g. `s.pc0.0.0`) holds the encoded data
  for one `INCLUDE` column. The auto-included designated timestamp
  counts as one of the covered columns and gets its own `.pcN` file.

### Generations and sealing

Data is written incrementally as **generations** (one per commit). Each
generation contains a sparse block of key→rowID mappings. Periodically,
generations are **sealed** into a single dense generation with stride-indexed
layout for optimal read performance.

Sealing happens automatically in two cases:

- When a partition is **closed** — retired by the next partition becoming
  active — it is compacted if it carries more than
  `cairo.posting.seal.gen.threshold` (default 16) unsealed generations.
- While a partition is **active**, WAL fast-lag commits append a new
  sparse generation in place rather than re-sealing. The active partition
  can therefore carry many more generations than the threshold; an inline
  seal is forced only when an internal generation cap is reached.

In practice the 16-generation threshold mostly governs partition
retirement.

Sealed data is written stride-by-stride (256 keys per stride). Within the
delta + Frame-of-Reference family, the writer trial-encodes each stride in
two sub-layouts and keeps whichever produces fewer bytes:

- **Delta sub-layout** — per-key delta encoding, then per-block
  Frame-of-Reference bitpacking. Wins when there are roughly ten or more
  values per key, where the delta distribution lets each block use a
  small bitwidth.
- **Flat sub-layout** — stride-wide Frame-of-Reference with a single base
  and bitwidth, plus a prefix-count array for per-key slicing. Wins when
  keys are sparse (roughly eight or fewer values per key) by eliminating
  per-key metadata.

These are internal to delta + Frame-of-Reference and are independent of the
SQL `DELTA` / `EF` encoding variants described above. When the resulting
bitwidth is 8, 16, or 32, decoding uses a native AVX2 fast path; other
bit widths fall back to a Java decoder.

### FSST compression for strings

VARCHAR and STRING columns in the INCLUDE list are compressed using FSST
(Fast Static Symbol Table) compression during sealing once a stride exceeds
4 KB of raw data. FSST replaces frequently occurring 1–8 byte patterns
with single-byte codes, typically achieving 2–5× compression on string data
with repetitive patterns. The 2.3 KB symbol table fits in L1 cache and
gives stateless O(1) per-value decode.

The FSST symbol table is trained per seal and stored inline in the sidecar
file. Decompression is transparent to the query engine.

## Limitations

:::warning

- `INCLUDE` cannot list the indexed symbol column itself.
- `INCLUDE` is not supported with out-of-line `INDEX(col ...)` syntax —
  use inline column syntax or `ALTER TABLE` instead.
- `CAPACITY` is not supported for posting indexes (bitmap only).
- The covering path engages only when the query filters on the indexed
  symbol (single key, `IN`-list, or bind variable). Queries without such
  a filter — including unfiltered `LATEST ON … PARTITION BY sym`,
  unfiltered `SAMPLE BY`, and unfiltered `GROUP BY` — fall back to a
  regular page-frame scan.
- `REINDEX` on WAL tables requires dropping and re-adding the index
  (this applies to all index types, not just posting).
- `ALTER TABLE … ADD INDEX TYPE POSTING INCLUDE (col, …)` can fail on
  very large partitions when at least one `INCLUDE` column is
  variable-width (`STRING`, `VARCHAR`, `BINARY`) and the seal phase
  would exceed `RSS_MEM_LIMIT`. Fixed-width `INCLUDE` columns stream
  through transparently. Remedies: drop the variable-width column
  from the `INCLUDE` list, reduce partition size, or raise
  `RSS_MEM_LIMIT`. See
  [`cairo.posting.index.indexer.spill.bytes.max`](/docs/configuration/cairo-engine/#cairopostingindexindexerspillbytesmax).

:::