Per-Column Storage & Compression
Per-Column Storage & Compression
HeliosDB Nano supports per-column compression configuration, allowing you to optimize storage efficiency for different data types.
Overview
Per-column storage enables:
- Type-Specific Compression: Apply optimal algorithms per column type
- Storage Efficiency: Reduce disk usage by 50-90%
- Query Performance: Compressed data requires less I/O
- Flexible Configuration: Column-level or table-level settings
Storage Modes
HeliosDB Nano supports four per-column storage modes that can be specified inline at table creation:
| Mode | Best For | Benefit |
|---|---|---|
DEFAULT | General data | RocksDB LZ4 compression (3-7x) |
DICTIONARY | Low-cardinality strings | 50-95% space reduction |
CONTENT_ADDRESSED | Large duplicate values | ~100% deduplication |
COLUMNAR | Numeric sequences | 20-50% compression + faster aggregates |
Inline STORAGE Syntax
Specify storage mode directly in CREATE TABLE:
CREATE TABLE table_name ( column_name TYPE STORAGE MODE, ...);Dictionary Encoding Example
Best for columns with limited unique values (status codes, categories, countries):
CREATE TABLE orders ( order_id INT PRIMARY KEY, customer_id INT, status TEXT STORAGE DICTIONARY, shipping_country TEXT STORAGE DICTIONARY, payment_method TEXT STORAGE DICTIONARY, order_total FLOAT8, created_at INT8);Content-Addressed Storage Example
Best for large values (>1KB) that repeat across rows (templates, JSON payloads):
CREATE TABLE email_templates ( template_id INT PRIMARY KEY, template_name TEXT, html_content TEXT STORAGE CONTENT_ADDRESSED, category TEXT STORAGE DICTIONARY);Columnar Storage Example
Best for numeric columns used in aggregations (timestamps, metrics, measurements):
CREATE TABLE sensor_readings ( reading_id INT PRIMARY KEY, sensor_id INT, sensor_type TEXT STORAGE DICTIONARY, location TEXT STORAGE DICTIONARY, timestamp INT8 STORAGE COLUMNAR, temperature FLOAT8 STORAGE COLUMNAR, humidity FLOAT8 STORAGE COLUMNAR, pressure FLOAT8 STORAGE COLUMNAR);Mixed Modes Example
Combine storage modes for optimal performance:
CREATE TABLE events ( event_id INT PRIMARY KEY, -- Dictionary for low-cardinality strings event_type TEXT STORAGE DICTIONARY, source_app TEXT STORAGE DICTIONARY, user_country TEXT STORAGE DICTIONARY, device_type TEXT STORAGE DICTIONARY, -- Columnar for numeric aggregations timestamp INT8 STORAGE COLUMNAR, response_time_ms INT STORAGE COLUMNAR, payload_size INT STORAGE COLUMNAR, -- Content-addressed for large duplicated values error_stack TEXT STORAGE CONTENT_ADDRESSED, -- Default for unique/mixed data user_id INT, session_id TEXT, request_path TEXT);ALTER TABLE Migration
Change storage mode on existing columns:
-- Migrate to dictionary encodingALTER TABLE products ALTER COLUMN category SET STORAGE DICTIONARY;
-- Migrate to columnar storageALTER TABLE products ALTER COLUMN price SET STORAGE COLUMNAR;
-- Revert to defaultALTER TABLE products ALTER COLUMN category SET STORAGE DEFAULT;Compression Codecs
FSST (Fast Static Symbol Table)
Optimized for string/text columns:
| Feature | Description |
|---|---|
| Type | Dictionary-based compression |
| Best For | VARCHAR, TEXT, CHAR columns |
| Ratio | 2-10x compression |
| Speed | ~1GB/s encode, ~2GB/s decode |
ALP (Adaptive Lossless Floating-Point)
Optimized for numeric columns:
| Feature | Description |
|---|---|
| Type | Floating-point compression |
| Best For | FLOAT4, FLOAT8, NUMERIC |
| Ratio | 2-4x compression |
| Speed | ~800MB/s encode, ~1.5GB/s decode |
LZ4 (Default)
General-purpose fast compression:
| Feature | Description |
|---|---|
| Type | Block compression |
| Best For | Mixed data, general use |
| Ratio | 2-3x compression |
| Speed | ~400MB/s encode, ~2GB/s decode |
ZSTD
High-ratio compression:
| Feature | Description |
|---|---|
| Type | Dictionary compression |
| Best For | Cold data, archives |
| Ratio | 3-5x compression |
| Speed | ~200MB/s encode, ~500MB/s decode |
Configuration
Table-Level Compression
-- Set default compression for new tablesSET default_compression = 'fsst';
CREATE TABLE logs ( id INT, message TEXT, level VARCHAR(10));Per-Column Configuration
-- Via table options (proposed syntax)CREATE TABLE sensors ( id INT, timestamp TIMESTAMP, temperature FLOAT8, -- Will use ALP compression description TEXT -- Will use FSST compression) WITH ( compression = 'auto' -- Auto-select per column type);Session Settings
-- Configure compression globallySET compression_level = 'high'; -- low, medium, highSET fsst_enabled = true; -- Enable FSST for stringsSET alp_enabled = true; -- Enable ALP for floatsCodec Selection by Type
| Data Type | Recommended Codec | Compression Ratio |
|---|---|---|
| TEXT, VARCHAR | FSST | 3-10x |
| FLOAT4, FLOAT8 | ALP | 2-4x |
| INT, BIGINT | Delta + LZ4 | 2-3x |
| TIMESTAMP | Delta + LZ4 | 3-5x |
| BYTEA | LZ4/ZSTD | 2-3x |
| JSONB | FSST | 3-8x |
| VECTOR | Product Quantization | 4-32x |
Auto-Detection
HeliosDB Nano automatically selects compression based on:
- Column data type: Maps type to optimal codec
- Data patterns: Analyzes sample data
- Configuration: Respects user overrides
-- Enable automatic codec selectionSET compression_auto_detect = true;Compression Statistics
View Compression Ratios
-- Check compression stats via system viewsSELECT table_name, column_name, codec, raw_bytes, compressed_bytes, ROUND(raw_bytes::float / compressed_bytes, 2) as ratioFROM pg_compression_stats();Storage Analysis
-- Analyze table storageSELECT * FROM pg_table_stats() WHERE table_name = 'logs';Performance Tuning
FSST Tuning
-- Symbol table size (affects compression ratio)SET fsst_symbol_table_size = 256; -- Default: 256
-- Training sample sizeSET fsst_training_samples = 1000; -- Default: 1000ALP Tuning
-- Exception threshold (tradeoff ratio vs speed)SET alp_exception_threshold = 0.05; -- Default: 5%
-- Adaptive modeSET alp_adaptive_mode = true; -- Default: trueBlock Configuration
-- Compression applies at block levelSET storage_block_size = 16384; -- Default: 16KB
-- Larger blocks = better compression, worse random accessSET storage_block_size = 65536; -- 64KB for analyticsUse Cases
Time-Series Data
CREATE TABLE metrics ( timestamp TIMESTAMP, -- Delta compression sensor_id INT, -- Dictionary encoding value FLOAT8, -- ALP compression tags TEXT -- FSST compression);
-- Achieves 5-10x overall compressionLog Storage
CREATE TABLE application_logs ( id BIGINT, timestamp TIMESTAMP, level VARCHAR(10), -- FSST: high repetition message TEXT, -- FSST: string patterns metadata JSONB -- FSST: JSON strings);
-- Achieves 4-8x overall compressionVector Embeddings
CREATE TABLE documents ( id INT, content TEXT, -- FSST compression embedding VECTOR(768) -- Product Quantization);
CREATE INDEX ON documents USING hnsw (embedding)WITH (quantization = 'product');
-- PQ reduces vector storage by 32xCompression Pipeline
Write Path: Raw Data → Type Codec (FSST/ALP) → Block Codec (LZ4) → Storage
Read Path: Storage → Block Decode → Type Decode → Raw DataBenchmarks
FSST Performance (TEXT columns)
| Dataset | Raw Size | Compressed | Ratio | Encode Speed |
|---|---|---|---|---|
| Logs | 1 GB | 120 MB | 8.3x | 950 MB/s |
| URLs | 500 MB | 80 MB | 6.3x | 1.1 GB/s |
| JSON | 2 GB | 300 MB | 6.7x | 850 MB/s |
ALP Performance (FLOAT columns)
| Dataset | Raw Size | Compressed | Ratio | Encode Speed |
|---|---|---|---|---|
| Sensors | 800 MB | 200 MB | 4.0x | 780 MB/s |
| Financial | 1 GB | 350 MB | 2.9x | 820 MB/s |
| Scientific | 2 GB | 550 MB | 3.6x | 750 MB/s |
Integration
With Branching
Compression is preserved across branches:
CREATE BRANCH dev FROM main AS OF NOW;-- Dev branch inherits compression settingsWith Time-Travel
Historical queries decompress on-the-fly:
SELECT * FROM logsAS OF TIMESTAMP '2025-01-01 00:00:00';-- Transparent decompressionWith SMFI
Compression works with storage-level filtering:
-- Zone maps and bloom filters work on compressed blocksSELECT * FROM logs WHERE timestamp > '2025-01-01';See Also
- Storage Modes Tutorial - Hands-on examples
- Storage Filtering (SMFI)
- Configuration
- Vector Search