Architecture Overview

HeliosDB Nano is a high-performance embedded database with PostgreSQL compatibility. It is written in Rust for memory safety and uses RocksDB as its storage engine.

System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                          Client Layer                           │
├─────────────────────────────────────────────────────────────────┤
│  PostgreSQL Wire   │   REST API    │   Embedded Rust API        │
│  Protocol          │   (HTTP)      │   (Direct Linking)         │
└────────┬───────────┴───────┬───────┴──────────┬─────────────────┘
         │                   │                  │
┌────────▼───────────────────▼──────────────────▼─────────────────┐
│                           Query Layer                           │
├─────────────────────────────────────────────────────────────────┤
│   SQL Parser  →   Planner    →   Optimizer   →   Executor       │
│        │             │               │               │          │
│        ▼             ▼               ▼               ▼          │
│   Parse Tree   Logical Plan    Physical Plan      Results       │
└────────────────────────────────────────────────────────────────┬┘
┌────────────────────────────────────────────────────────────────▼┐
│                          Storage Layer                          │
├─────────────────────────────────────────────────────────────────┤
│    ┌─────────────┐      ┌─────────────┐      ┌─────────────┐    │
│    │  Branching  │      │ Time-Travel │      │    MVCC     │    │
│    └─────────────┘      └─────────────┘      └─────────────┘    │
│    ┌─────────────┐      ┌─────────────┐      ┌─────────────┐    │
│    │   Catalog   │      │     WAL     │      │ Compression │    │
│    └─────────────┘      └─────────────┘      └─────────────┘    │
└────────────────────────────────────────────────────────────────┬┘
┌────────────────────────────────────────────────────────────────▼┐
│                         RocksDB Engine                          │
├─────────────────────────────────────────────────────────────────┤
│ LSM-Tree Storage │  SST Files  │  Block Cache  │  Bloom Filter  │
└─────────────────────────────────────────────────────────────────┘

Core Components

Query Engine

| Component | Location | Responsibility |
|-----------|----------|----------------|
| Parser | src/sql/parser.rs | SQL parsing to AST |
| Planner | src/sql/planner.rs | Logical plan generation |
| Optimizer | src/sql/optimizer/ | Cost-based query optimization |
| Executor | src/sql/executor/ | Physical plan execution |

Storage Engine

| Component | Location | Responsibility |
|-----------|----------|----------------|
| Engine | src/storage/engine.rs | RocksDB interface |
| Catalog | src/storage/catalog.rs | Schema management |
| MVCC | src/storage/mvcc.rs | Multi-version concurrency |
| WAL | src/storage/wal.rs | Write-ahead logging |
| Branching | src/storage/branch.rs | Database branching |
| Time-Travel | src/storage/time_travel.rs | Historical queries |

Vector Engine

| Component | Location | Responsibility |
|-----------|----------|----------------|
| Index | src/vector/index.rs | HNSW/IVF-PQ indexing |
| Search | src/vector/search.rs | Similarity search |
| Embeddings | src/vector/embeddings.rs | Embedding generation |

Data Flow

Query Execution Flow

1. Client sends SQL query
2. Parser tokenizes and builds AST
3. Planner creates logical plan
4. Optimizer applies transformations:
   - Predicate pushdown
   - Join reordering
   - Index selection
5. Executor runs physical operations:
   - Table scans (with SMFI, storage-level metadata filtering)
   - Index lookups
   - Joins (nested loop, hash, merge)
   - Aggregations
6. Results returned to client
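
As an illustration of step 4, here is a minimal sketch of predicate pushdown on a toy logical plan. The `Plan` enum and `push_down` function are hypothetical names made up for this example and are not HeliosDB Nano's actual planner types. The sketch only handles a filter that sits directly on top of a scan.

```rust
/// Toy logical plan. Node names are illustrative, not HeliosDB Nano's types.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Scan a table, optionally evaluating a predicate at the scan itself.
    Scan { table: String, predicate: Option<String> },
    /// Keep only rows matching `predicate`.
    Filter { predicate: String, input: Box<Plan> },
    /// Keep only the listed columns.
    Project { columns: Vec<String>, input: Box<Plan> },
}

/// Predicate pushdown: fold a Filter that sits directly on an unfiltered
/// Scan into the scan, so rows are rejected as early as possible.
fn push_down(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => match push_down(*input) {
            Plan::Scan { table, predicate: None } => Plan::Scan {
                table,
                predicate: Some(predicate),
            },
            // Anything else keeps the filter where it was.
            other => Plan::Filter { predicate, input: Box::new(other) },
        },
        Plan::Project { columns, input } => Plan::Project {
            columns,
            input: Box::new(push_down(*input)),
        },
        scan => scan,
    }
}
```

For a plan shaped like Project(Filter(Scan users)), the result is Project(Scan users with the predicate attached), which is the form an executor can combine with index selection or block skipping.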

Transaction Flow

1. BEGIN TRANSACTION
2. Acquire snapshot (MVCC)
3. Execute operations:
   - Read from snapshot
   - Write to transaction buffer
4. COMMIT:
   - Write to WAL
   - Apply to storage
   - Update catalog
5. Release resources
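
The steps above can be sketched with an in-memory stand-in for the storage layer. Everything here (`Store`, `Txn`, the string-based WAL records) is a hypothetical simplification for illustration. It omits conflict detection, durability, and the catalog update, and it is not the engine's real API.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the storage layer.
#[derive(Default)]
struct Store {
    /// Every committed version of each key, in ascending commit order.
    versions: BTreeMap<String, Vec<(u64, String)>>,
    /// Append-only log of committed writes (stand-in for the WAL).
    wal: Vec<String>,
    /// Version number of the latest commit.
    committed: u64,
}

/// A transaction: a pinned snapshot version plus privately buffered writes.
struct Txn {
    snapshot: u64,
    buffer: BTreeMap<String, String>,
}

impl Store {
    /// Steps 1-2: begin and pin the current committed version as the snapshot.
    fn begin(&self) -> Txn {
        Txn { snapshot: self.committed, buffer: BTreeMap::new() }
    }

    /// Newest committed value of `key` whose version is <= `snapshot`.
    fn read_at(&self, key: &str, snapshot: u64) -> Option<&str> {
        self.versions
            .get(key)?
            .iter()
            .rev()
            .find(|(version, _)| *version <= snapshot)
            .map(|(_, value)| value.as_str())
    }

    /// Step 4: write to the log first, then apply the buffer to storage.
    fn commit(&mut self, txn: Txn) -> u64 {
        let version = self.committed + 1;
        for (k, v) in &txn.buffer {
            self.wal.push(format!("v{version} SET {k}={v}"));
        }
        for (k, v) in txn.buffer {
            self.versions.entry(k).or_default().push((version, v));
        }
        self.committed = version;
        version
    }
}

impl Txn {
    /// Step 3 (read): own buffered writes first, then the snapshot.
    fn get<'a>(&'a self, store: &'a Store, key: &str) -> Option<&'a str> {
        match self.buffer.get(key) {
            Some(v) => Some(v.as_str()),
            None => store.read_at(key, self.snapshot),
        }
    }

    /// Step 3 (write): changes stay in the buffer until commit.
    fn put(&mut self, key: &str, value: &str) {
        self.buffer.insert(key.to_string(), value.to_string());
    }
}
```

A transaction that began before a commit keeps reading its original snapshot, while a transaction started afterwards sees the new version.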

Key Design Decisions

Why RocksDB?

  • LSM-tree architecture: Optimized for write-heavy workloads
  • Compression: Native support for multiple codecs
  • Column families: Efficient separation of data types
  • Proven reliability: Used in production at scale

Why PostgreSQL Wire Protocol?

  • Ecosystem compatibility: Works with existing tools (psql, pgAdmin)
  • Driver support: Use existing PostgreSQL drivers
  • No migration cost: Drop-in replacement for simple use cases
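
To make the compatibility point concrete: existing drivers work because they only need the server to understand standard protocol messages. The sketch below frames a simple-query message as defined by PostgreSQL protocol 3.0 (a `Q` type byte, a big-endian 32-bit length that counts itself and the payload, then the NUL-terminated SQL text). The function name `encode_simple_query` is made up for this example and says nothing about how HeliosDB Nano's server module is structured.

```rust
/// Encode a simple-query ('Q') frontend message for PostgreSQL protocol 3.0.
/// Hypothetical helper for illustration only.
fn encode_simple_query(sql: &str) -> Vec<u8> {
    // The length field covers its own 4 bytes, the SQL text, and the
    // trailing NUL, but not the leading message-type byte.
    let len = (4 + sql.len() + 1) as i32;
    let mut msg = Vec::with_capacity(1 + len as usize);
    msg.push(b'Q');
    msg.extend_from_slice(&len.to_be_bytes());
    msg.extend_from_slice(sql.as_bytes());
    msg.push(0);
    msg
}
```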

Branching Implementation

Branches are implemented using RocksDB column families with copy-on-write semantics:

main branch: [base data]
dev branch: [base data] + [delta: new/modified rows]
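
The base-plus-delta layout can be sketched with plain in-memory maps. This is a simplified model under stated assumptions: the real implementation uses RocksDB column families, and the `Branch` type, its methods, and the string keys below are hypothetical names for illustration.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// A branch holds only its own changes and points at its parent for the rest.
struct Branch {
    parent: Option<Rc<Branch>>,
    /// New or modified rows written on this branch.
    delta: HashMap<String, String>,
}

impl Branch {
    /// A root branch with no parent (for example, main).
    fn root() -> Branch {
        Branch { parent: None, delta: HashMap::new() }
    }

    /// Creating a branch copies no rows: it only takes a reference to the
    /// parent, so the cost does not depend on the amount of data.
    fn fork(parent: &Rc<Branch>) -> Branch {
        Branch { parent: Some(Rc::clone(parent)), delta: HashMap::new() }
    }

    /// Writes land in this branch's delta and never touch the parent.
    fn put(&mut self, key: &str, value: &str) {
        self.delta.insert(key.to_string(), value.to_string());
    }

    /// Reads check the delta first and fall back to the parent chain.
    fn get(&self, key: &str) -> Option<&str> {
        match self.delta.get(key) {
            Some(v) => Some(v.as_str()),
            None => self.parent.as_ref().and_then(|p| p.get(key)),
        }
    }
}
```

In this model a row modified on dev is visible only through dev, while main continues to return the base value.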

SMFI (Storage-Level Metadata Filtering)

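SMFI applies Parquet-style min/max metadata filtering at the storage level. Each block records the minimum and maximum value of a column, and a scan skips any block whose range cannot satisfy the predicate: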

Query: WHERE timestamp > '2024-01-01'
Check block metadata:
- Block A: min=2023-01-01, max=2023-12-31 → SKIP
- Block B: min=2024-01-01, max=2024-06-30 → SCAN
- Block C: min=2024-07-01, max=2024-12-31 → SCAN
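
The skip/scan decision above reduces to comparing the predicate's threshold with each block's min or max. A minimal sketch, assuming a hypothetical `BlockMeta` layout and ISO-8601 date strings (which order correctly under plain string comparison):

```rust
/// Min/max statistics kept for one storage block (hypothetical layout).
struct BlockMeta {
    name: &'static str,
    min: &'static str,
    max: &'static str,
}

impl BlockMeta {
    /// Can any row in this block satisfy `column > threshold`?
    fn may_match_gt(&self, threshold: &str) -> bool {
        self.max > threshold
    }

    /// Can any row in this block satisfy `column < threshold`?
    fn may_match_lt(&self, threshold: &str) -> bool {
        self.min < threshold
    }
}

/// The three blocks from the example above.
fn example_blocks() -> Vec<BlockMeta> {
    vec![
        BlockMeta { name: "A", min: "2023-01-01", max: "2023-12-31" },
        BlockMeta { name: "B", min: "2024-01-01", max: "2024-06-30" },
        BlockMeta { name: "C", min: "2024-07-01", max: "2024-12-31" },
    ]
}

/// Names of the blocks a `column > threshold` scan still has to read.
fn blocks_to_scan(blocks: &[BlockMeta], threshold: &str) -> Vec<&'static str> {
    blocks
        .iter()
        .filter(|b| b.may_match_gt(threshold))
        .map(|b| b.name)
        .collect()
}
```

Calling `blocks_to_scan(&example_blocks(), "2024-01-01")` keeps B and C and drops A. The filter is conservative: a kept block may still contain non-matching rows, so the executor re-checks the predicate per row.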

Module Dependencies

heliosdb_nano (lib.rs)
├── sql/
│   ├── parser (sqlparser)
│   ├── planner
│   ├── optimizer
│   └── executor
│       └── storage/
│           ├── engine (rocksdb)
│           ├── catalog
│           ├── mvcc
│           └── compression/
│               ├── fsst
│               └── alp
├── vector/
│   ├── index (hnsw, ivf)
│   └── embeddings
├── server/
│   ├── postgres (wire protocol)
│   └── http (REST API)
└── repl/ (CLI interface)

Performance Characteristics

| Operation | Complexity | Notes |
|-----------|------------|-------|
| Point lookup | O(log n) | B-tree index lookup |
| Range scan | O(log n + k) | k = result size |
| Full scan | O(n) | With SMFI optimization |
| Vector search | O(log n) | HNSW approximate |
| Branch creation | O(1) | Copy-on-write |
| Time-travel query | O(log n) | MVCC snapshot |
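
One way to see why a time-travel read can be O(log n): if versions are stored under an ordered composite key of (row key, commit timestamp), the value as of a given time is the last entry in a bounded range, which an ordered map finds with a single seek. The sketch below uses a `BTreeMap` as a stand-in for the ordered store. `VersionedStore` and its methods are hypothetical names, and the actual key encoding in HeliosDB Nano may differ.

```rust
use std::collections::BTreeMap;

/// Versioned rows keyed by (row key, commit timestamp). Hypothetical model.
#[derive(Default)]
struct VersionedStore {
    rows: BTreeMap<(String, u64), String>,
}

impl VersionedStore {
    /// Record a new version of `key` committed at `commit_ts`.
    fn put(&mut self, key: &str, commit_ts: u64, value: &str) {
        self.rows.insert((key.to_string(), commit_ts), value.to_string());
    }

    /// Value of `key` as of `ts`: the newest version with commit_ts <= ts.
    /// The range lookup is a logarithmic seek in the ordered map.
    fn get_as_of(&self, key: &str, ts: u64) -> Option<&str> {
        let lo = (key.to_string(), 0);
        let hi = (key.to_string(), ts);
        self.rows
            .range(lo..=hi)
            .next_back()
            .map(|(_, value)| value.as_str())
    }
}
```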

Configuration

Key configuration parameters affecting architecture:

| Parameter | Default | Impact |
|-----------|---------|--------|
| storage.block_size | 4KB | I/O granularity |
| storage.cache_size | 256MB | Memory usage |
| storage.compression | lz4 | CPU vs space |
| mvcc.snapshot_retention | 1h | Time-travel range |
| vector.index_type | hnsw | Search performance |

See Configuration Reference for complete options.
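
For readers embedding the database, the defaults listed above could be mirrored in a typed struct along these lines. This is a sketch only: the `Config` struct, its field names, and the `can_time_travel` helper are hypothetical and not taken from the crate's API. It also assumes KB and MB mean 1024-based units and that `mvcc.snapshot_retention` bounds how far back a time-travel query can reach.

```rust
use std::time::Duration;

/// Hypothetical typed view of the parameters in the table above.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    /// storage.block_size, in bytes (assumes 1 KB = 1024 bytes).
    block_size_bytes: usize,
    /// storage.cache_size, in bytes (assumes 1 MB = 1024 * 1024 bytes).
    cache_size_bytes: usize,
    /// storage.compression codec name.
    compression: String,
    /// mvcc.snapshot_retention.
    snapshot_retention: Duration,
    /// vector.index_type.
    vector_index_type: String,
}

impl Default for Config {
    fn default() -> Self {
        Config {
            block_size_bytes: 4 * 1024,
            cache_size_bytes: 256 * 1024 * 1024,
            compression: "lz4".to_string(),
            snapshot_retention: Duration::from_secs(60 * 60),
            vector_index_type: "hnsw".to_string(),
        }
    }
}

impl Config {
    /// Assumed semantics: a historical read is answerable only while the
    /// requested point in time is still inside the retention window.
    fn can_time_travel(&self, age: Duration) -> bool {
        age <= self.snapshot_retention
    }
}
```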