Per-Column Storage & Compression

HeliosDB Nano supports per-column compression configuration, allowing you to optimize storage efficiency for different data types.

Overview

Per-column storage enables:

Type-Specific Compression: Apply optimal algorithms per column type
Storage Efficiency: Reduce disk usage by 50-90%
Query Performance: Compressed data requires less I/O
Flexible Configuration: Column-level or table-level settings

Storage Modes

HeliosDB Nano supports four per-column storage modes that can be specified inline at table creation:

Mode	Best For	Benefit
`DEFAULT`	General data	RocksDB LZ4 compression (3-7x)
`DICTIONARY`	Low-cardinality strings	50-95% space reduction
`CONTENT_ADDRESSED`	Large duplicate values	~100% deduplication
`COLUMNAR`	Numeric sequences	20-50% compression + faster aggregates

Inline STORAGE Syntax

Specify storage mode directly in CREATE TABLE:

CREATE TABLE table_name (
    column_name TYPE STORAGE MODE,
    ...
);

Dictionary Encoding Example

Best for columns with limited unique values (status codes, categories, countries):

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    status TEXT STORAGE DICTIONARY,
    shipping_country TEXT STORAGE DICTIONARY,
    payment_method TEXT STORAGE DICTIONARY,
    order_total FLOAT8,
    created_at INT8
);

Content-Addressed Storage Example

Best for large values (>1KB) that repeat across rows (templates, JSON payloads):

CREATE TABLE email_templates (
    template_id INT PRIMARY KEY,
    template_name TEXT,
    html_content TEXT STORAGE CONTENT_ADDRESSED,
    category TEXT STORAGE DICTIONARY
);

Columnar Storage Example

Best for numeric columns used in aggregations (timestamps, metrics, measurements):

CREATE TABLE sensor_readings (
    reading_id INT PRIMARY KEY,
    sensor_id INT,
    sensor_type TEXT STORAGE DICTIONARY,
    location TEXT STORAGE DICTIONARY,
    timestamp INT8 STORAGE COLUMNAR,
    temperature FLOAT8 STORAGE COLUMNAR,
    humidity FLOAT8 STORAGE COLUMNAR,
    pressure FLOAT8 STORAGE COLUMNAR
);

Mixed Modes Example

Combine storage modes for optimal performance:

CREATE TABLE events (
    event_id INT PRIMARY KEY,
    -- Dictionary for low-cardinality strings
    event_type TEXT STORAGE DICTIONARY,
    source_app TEXT STORAGE DICTIONARY,
    user_country TEXT STORAGE DICTIONARY,
    device_type TEXT STORAGE DICTIONARY,
    -- Columnar for numeric aggregations
    timestamp INT8 STORAGE COLUMNAR,
    response_time_ms INT STORAGE COLUMNAR,
    payload_size INT STORAGE COLUMNAR,
    -- Content-addressed for large duplicated values
    error_stack TEXT STORAGE CONTENT_ADDRESSED,
    -- Default for unique/mixed data
    user_id INT,
    session_id TEXT,
    request_path TEXT
);

ALTER TABLE Migration

Change storage mode on existing columns:

-- Migrate to dictionary encoding
ALTER TABLE products ALTER COLUMN category SET STORAGE DICTIONARY;

-- Migrate to columnar storage
ALTER TABLE products ALTER COLUMN price SET STORAGE COLUMNAR;

-- Revert to default
ALTER TABLE products ALTER COLUMN category SET STORAGE DEFAULT;

Compression Codecs

FSST (Fast Static Symbol Table)

Optimized for string/text columns:

Feature	Description
Type	Dictionary-based compression
Best For	VARCHAR, TEXT, CHAR columns
Ratio	2-10x compression
Speed	~1GB/s encode, ~2GB/s decode

ALP (Adaptive Lossless Floating-Point)

Optimized for numeric columns:

Feature	Description
Type	Floating-point compression
Best For	FLOAT4, FLOAT8, NUMERIC
Ratio	2-4x compression
Speed	~800MB/s encode, ~1.5GB/s decode

LZ4 (Default)

General-purpose fast compression:

Feature	Description
Type	Block compression
Best For	Mixed data, general use
Ratio	2-3x compression
Speed	~400MB/s encode, ~2GB/s decode

ZSTD

High-ratio compression:

Feature	Description
Type	Dictionary compression
Best For	Cold data, archives
Ratio	3-5x compression
Speed	~200MB/s encode, ~500MB/s decode

Configuration

Table-Level Compression

-- Set default compression for new tables
SET default_compression = 'fsst';

CREATE TABLE logs (
    id INT,
    message TEXT,
    level VARCHAR(10)
);

Per-Column Configuration

-- Via table options (proposed syntax)
CREATE TABLE sensors (
    id INT,
    timestamp TIMESTAMP,
    temperature FLOAT8,     -- Will use ALP compression
    description TEXT        -- Will use FSST compression
) WITH (
    compression = 'auto'    -- Auto-select per column type
);

Session Settings

-- Configure compression globally
SET compression_level = 'high';      -- low, medium, high
SET fsst_enabled = true;             -- Enable FSST for strings
SET alp_enabled = true;              -- Enable ALP for floats

Codec Selection by Type

Data Type	Recommended Codec	Compression Ratio
TEXT, VARCHAR	FSST	3-10x
FLOAT4, FLOAT8	ALP	2-4x
INT, BIGINT	Delta + LZ4	2-3x
TIMESTAMP	Delta + LZ4	3-5x
BYTEA	LZ4/ZSTD	2-3x
JSONB	FSST	3-8x
VECTOR	Product Quantization	4-32x

Auto-Detection

HeliosDB Nano automatically selects compression based on:

Column data type: Maps type to optimal codec
Data patterns: Analyzes sample data
Configuration: Respects user overrides

-- Enable automatic codec selection
SET compression_auto_detect = true;

Compression Statistics

View Compression Ratios

-- Check compression stats via system views
SELECT
    table_name,
    column_name,
    codec,
    raw_bytes,
    compressed_bytes,
    ROUND(raw_bytes::float / compressed_bytes, 2) as ratio
FROM pg_compression_stats();

Storage Analysis

-- Analyze table storage
SELECT * FROM pg_table_stats() WHERE table_name = 'logs';

Performance Tuning

FSST Tuning

-- Symbol table size (affects compression ratio)
SET fsst_symbol_table_size = 256;   -- Default: 256

-- Training sample size
SET fsst_training_samples = 1000;   -- Default: 1000

ALP Tuning

-- Exception threshold (tradeoff ratio vs speed)
SET alp_exception_threshold = 0.05; -- Default: 5%

-- Adaptive mode
SET alp_adaptive_mode = true;       -- Default: true

Block Configuration

-- Compression applies at block level
SET storage_block_size = 16384;     -- Default: 16KB

-- Larger blocks = better compression, worse random access
SET storage_block_size = 65536;     -- 64KB for analytics

Use Cases

Time-Series Data

CREATE TABLE metrics (
    timestamp TIMESTAMP,    -- Delta compression
    sensor_id INT,          -- Dictionary encoding
    value FLOAT8,           -- ALP compression
    tags TEXT               -- FSST compression
);

-- Achieves 5-10x overall compression

Log Storage

CREATE TABLE application_logs (
    id BIGINT,
    timestamp TIMESTAMP,
    level VARCHAR(10),      -- FSST: high repetition
    message TEXT,           -- FSST: string patterns
    metadata JSONB          -- FSST: JSON strings
);

-- Achieves 4-8x overall compression

Vector Embeddings

CREATE TABLE documents (
    id INT,
    content TEXT,           -- FSST compression
    embedding VECTOR(768)   -- Product Quantization
);

CREATE INDEX ON documents USING hnsw (embedding)
WITH (quantization = 'product');

-- PQ reduces vector storage by 32x

Compression Pipeline

Write Path:
  Raw Data → Type Codec (FSST/ALP) → Block Codec (LZ4) → Storage

Read Path:
  Storage → Block Decode → Type Decode → Raw Data

Benchmarks

FSST Performance (TEXT columns)

Dataset	Raw Size	Compressed	Ratio	Encode Speed
Logs	1 GB	120 MB	8.3x	950 MB/s
URLs	500 MB	80 MB	6.3x	1.1 GB/s
JSON	2 GB	300 MB	6.7x	850 MB/s

ALP Performance (FLOAT columns)

Dataset	Raw Size	Compressed	Ratio	Encode Speed
Sensors	800 MB	200 MB	4.0x	780 MB/s
Financial	1 GB	350 MB	2.9x	820 MB/s
Scientific	2 GB	550 MB	3.6x	750 MB/s

Integration

With Branching

Compression is preserved across branches:

CREATE BRANCH dev FROM main AS OF NOW;
-- Dev branch inherits compression settings

With Time-Travel

Historical queries decompress on-the-fly:

SELECT * FROM logs
AS OF TIMESTAMP '2025-01-01 00:00:00';
-- Transparent decompression

With SMFI

Compression works with storage-level filtering:

-- Zone maps and bloom filters work on compressed blocks
SELECT * FROM logs WHERE timestamp > '2025-01-01';

Per-Column Storage & Compression

Per-Column Storage & Compression

Overview

Storage Modes

Inline STORAGE Syntax

Dictionary Encoding Example

Content-Addressed Storage Example

Columnar Storage Example

Mixed Modes Example

ALTER TABLE Migration

Compression Codecs

FSST (Fast Static Symbol Table)

ALP (Adaptive Lossless Floating-Point)

LZ4 (Default)

ZSTD

Configuration

Table-Level Compression

Per-Column Configuration

Session Settings

Codec Selection by Type

Auto-Detection

Compression Statistics

View Compression Ratios

Storage Analysis

Performance Tuning

FSST Tuning

ALP Tuning

Block Configuration

Use Cases

Time-Series Data

Log Storage

Vector Embeddings

Compression Pipeline

Benchmarks

FSST Performance (TEXT columns)

ALP Performance (FLOAT columns)

Integration

With Branching

With Time-Travel

With SMFI

See Also