Architecture Overview
HeliosDB Nano is a high-performance embedded database with PostgreSQL compatibility. It is written in Rust for memory safety and performance, and stores its data in the RocksDB engine (a C++ library) underneath.
System Architecture
Requests flow down through four layers:
- Client Layer: PostgreSQL wire protocol, REST API (HTTP), and embedded Rust API (direct linking)
- Query Layer: SQL Parser → Planner → Optimizer → Executor, producing in turn a parse tree, a logical plan, a physical plan, and the results
- Storage Layer: Branching, Time-Travel, MVCC, Catalog, WAL, and Compression
- RocksDB Engine: LSM-tree storage, SST files, block cache, and Bloom filters
Core Components
Query Engine
| Component | Location | Responsibility |
|---|---|---|
| Parser | src/sql/parser.rs | SQL parsing to AST |
| Planner | src/sql/planner.rs | Logical plan generation |
| Optimizer | src/sql/optimizer/ | Cost-based query optimization |
| Executor | src/sql/executor/ | Physical plan execution |
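During table scans the executor applies SMFI (storage-level metadata filtering, covered in its own section below): each storage block carries the minimum and maximum values of a column, and a block whose range cannot satisfy the predicate is never read. The Rust sketch below shows that check for column > bound and column < bound predicates. BlockMeta, may_match_gt, may_match_lt, and blocks_to_scan are hypothetical names chosen for illustration, not HeliosDB Nano's actual API.

```rust
/// Hypothetical per-block metadata: the smallest and largest value of one
/// column within a storage block.
#[derive(Debug, Clone, PartialEq)]
pub struct BlockMeta<T> {
    pub id: char,
    pub min: T,
    pub max: T,
}

/// `column > bound` can match only if the block's maximum exceeds the bound.
pub fn may_match_gt<T: PartialOrd>(block: &BlockMeta<T>, bound: &T) -> bool {
    block.max > *bound
}

/// `column < bound` can match only if the block's minimum is below the bound.
pub fn may_match_lt<T: PartialOrd>(block: &BlockMeta<T>, bound: &T) -> bool {
    block.min < *bound
}

/// Ids of the blocks a scan for `column > bound` still has to read;
/// every other block is skipped without touching its rows.
pub fn blocks_to_scan<T: PartialOrd>(blocks: &[BlockMeta<T>], bound: &T) -> Vec<char> {
    blocks
        .iter()
        .filter(|b| may_match_gt(b, bound))
        .map(|b| b.id)
        .collect()
}
```

Fed the three blocks from the SMFI example (ISO-8601 date strings compare correctly as plain strings), blocks_to_scan with the bound "2024-01-01" keeps only B and C.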
Storage Engine
| Component | Location | Responsibility |
|---|---|---|
| Engine | src/storage/engine.rs | RocksDB interface |
| Catalog | src/storage/catalog.rs | Schema management |
| MVCC | src/storage/mvcc.rs | Multi-version concurrency |
| WAL | src/storage/wal.rs | Write-ahead logging |
| Branching | src/storage/branch.rs | Database branching |
| Time-Travel | src/storage/time_travel.rs | Historical queries |
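The MVCC, WAL, and Time-Travel components cooperate in the transaction flow described under Data Flow. The following is a toy, single-threaded model under stated assumptions: commit timestamps come from a counter, the WAL is an in-memory Vec standing in for a durable log, and write-write conflict detection is left out. All type and method names are hypothetical. A transaction reads from the snapshot captured at begin and sees its own buffered writes; a time-travel read is the same version lookup at an older timestamp.

```rust
use std::collections::HashMap;

/// One committed version of a key; `value: None` is a delete (tombstone).
#[derive(Debug, Clone)]
struct Version {
    commit_ts: u64,
    value: Option<String>,
}

/// Toy MVCC store (hypothetical; not HeliosDB Nano's types).
#[derive(Default)]
pub struct MvccStore {
    versions: HashMap<String, Vec<Version>>, // per key, in commit order
    wal: Vec<String>,                        // in-memory stand-in for a durable WAL
    clock: u64,                              // last assigned commit timestamp
}

/// A transaction: a fixed snapshot plus a private write buffer.
pub struct Txn {
    snapshot_ts: u64,
    writes: HashMap<String, Option<String>>,
}

impl Txn {
    pub fn put(&mut self, key: &str, value: &str) {
        self.writes.insert(key.to_string(), Some(value.to_string()));
    }
    pub fn delete(&mut self, key: &str) {
        self.writes.insert(key.to_string(), None);
    }
}

impl MvccStore {
    /// BEGIN: the snapshot is the latest commit timestamp at this moment.
    pub fn begin(&self) -> Txn {
        Txn { snapshot_ts: self.clock, writes: HashMap::new() }
    }

    /// Newest version of `key` committed at or before `ts`.
    /// A time-travel query is this same lookup with an older `ts`.
    pub fn read_at(&self, key: &str, ts: u64) -> Option<String> {
        self.versions
            .get(key)?
            .iter()
            .rev()
            .find(|v| v.commit_ts <= ts)
            .and_then(|v| v.value.clone())
    }

    /// Reads inside a transaction see its own buffered writes first.
    pub fn get(&self, txn: &Txn, key: &str) -> Option<String> {
        match txn.writes.get(key) {
            Some(buffered) => buffered.clone(),
            None => self.read_at(key, txn.snapshot_ts),
        }
    }

    /// COMMIT: append to the WAL first, then apply new versions to storage.
    /// Write-write conflict detection is deliberately omitted.
    pub fn commit(&mut self, txn: Txn) -> u64 {
        self.clock += 1;
        let ts = self.clock;
        for (key, value) in &txn.writes {
            self.wal.push(format!("{ts} {key} {value:?}"));
        }
        for (key, value) in txn.writes {
            self.versions.entry(key).or_default().push(Version { commit_ts: ts, value });
        }
        ts
    }

    pub fn wal(&self) -> &[String] {
        &self.wal
    }
}
```

A reader that began before a later commit keeps seeing the old value, which is the snapshot isolation the MVCC component provides.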
Vector Search
| Component | Location | Responsibility |
|---|---|---|
| Index | src/vector/index.rs | HNSW/IVF-PQ indexing |
| Search | src/vector/search.rs | Similarity search |
| Embeddings | src/vector/embeddings.rs | Embedding generation |
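HNSW and IVF-PQ are approximate indexes: they trade a little recall for speed compared with scoring every stored vector. The exact baseline they approximate is a brute-force top-k scan, sketched below. It assumes cosine similarity as the metric (the document does not say which metrics are supported), and cosine and top_k are hypothetical helper names, not the src/vector/search.rs API.

```rust
/// Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Exact top-k search: score every stored vector and keep the k best,
/// returned best-first as (id, similarity) pairs. Cost is O(n * d).
pub fn top_k(query: &[f32], vectors: &[(u32, Vec<f32>)], k: usize) -> Vec<(u32, f32)> {
    let mut scored: Vec<(u32, f32)> = vectors
        .iter()
        .map(|(id, v)| (*id, cosine(query, v)))
        .collect();
    // Descending by score; total_cmp gives a total order even for NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

An HNSW index answers the same question without scoring every vector, which is why the performance table lists vector search as O(log n) rather than the O(n) of this scan.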
Data Flow
Query Execution Flow
1. Client sends SQL query
2. Parser tokenizes and builds AST
3. Planner creates logical plan
4. Optimizer applies transformations:
   - Predicate pushdown
   - Join reordering
   - Index selection
5. Executor runs physical operations:
   - Table scans (with SMFI)
   - Index lookups
   - Joins (nested loop, hash, merge)
   - Aggregations
6. Results returned to client
Transaction Flow
1. BEGIN TRANSACTION
2. Acquire snapshot (MVCC)
3. Execute operations:
   - Read from snapshot
   - Write to transaction buffer
4. COMMIT:
   - Write to WAL
   - Apply to storage
   - Update catalog
5. Release resources
Key Design Decisions
Why RocksDB?
- LSM-tree architecture: Optimized for write-heavy workloads
- Compression: Native support for multiple codecs
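- Column families: Separate logical keyspaces inside one database, each with its own options (also the basis for branching)
- Proven reliability: Originally developed at Facebook and widely deployed in production systems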
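The LSM-tree point above is the main reason writes are cheap: a write only touches an in-memory table, and sorted files are produced later by sequential flushes. The toy model below (a hypothetical ToyLsm type using only the standard library) shows the write and read paths. It omits RocksDB's WAL, compaction, Bloom filters, and block cache, and it is not how RocksDB is implemented internally.

```rust
use std::collections::BTreeMap;

/// Toy LSM tree: writes go to a sorted in-memory memtable; when it fills,
/// it is frozen into an immutable sorted run (a stand-in for an SST file).
pub struct ToyLsm {
    memtable: BTreeMap<String, Option<String>>, // None = tombstone
    runs: Vec<BTreeMap<String, Option<String>>>, // oldest first
    memtable_limit: usize,
}

impl ToyLsm {
    pub fn new(memtable_limit: usize) -> Self {
        Self { memtable: BTreeMap::new(), runs: Vec::new(), memtable_limit }
    }

    /// Writes never rewrite existing runs; `None` records a delete.
    pub fn put(&mut self, key: &str, value: Option<&str>) {
        self.memtable.insert(key.to_string(), value.map(str::to_string));
        if self.memtable.len() >= self.memtable_limit {
            // "Flush": freeze the full memtable as the newest run.
            let full = std::mem::take(&mut self.memtable);
            self.runs.push(full);
        }
    }

    /// Reads check the memtable, then runs from newest to oldest;
    /// the first hit wins, so a tombstone hides older values.
    pub fn get(&self, key: &str) -> Option<String> {
        if let Some(v) = self.memtable.get(key) {
            return v.clone();
        }
        for run in self.runs.iter().rev() {
            if let Some(v) = run.get(key) {
                return v.clone();
            }
        }
        None
    }

    pub fn run_count(&self) -> usize {
        self.runs.len()
    }
}
```

Reads may have to consult several runs, which is the cost RocksDB offsets with compaction, Bloom filters, and the block cache.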
Why PostgreSQL Wire Protocol?
- Ecosystem compatibility: Works with existing tools (psql, pgAdmin)
- Driver support: Use existing PostgreSQL drivers
- Low migration cost: Can act as a drop-in replacement for simple use cases
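Compatibility at the wire level means producing the exact framing PostgreSQL clients and drivers expect. For example, a simple-query message in the PostgreSQL frontend/backend protocol is the byte 'Q', a big-endian Int32 length that counts itself but not the type byte, and the query text terminated by a NUL byte. The encoder and decoder below are an illustrative sketch of that one message type, not HeliosDB Nano's server code; a real server also handles startup, authentication, and the extended query protocol.

```rust
/// Encode a simple-query ('Q') message. The SQL text must not contain NUL
/// bytes, because NUL terminates the string on the wire.
pub fn encode_query(sql: &str) -> Vec<u8> {
    let body_len = 4 + sql.len() + 1; // length field + text + NUL
    let mut msg = Vec::with_capacity(1 + body_len);
    msg.push(b'Q');
    msg.extend_from_slice(&(body_len as i32).to_be_bytes());
    msg.extend_from_slice(sql.as_bytes());
    msg.push(0);
    msg
}

/// Decode a simple-query message back into its text, validating the framing.
pub fn decode_query(msg: &[u8]) -> Option<&str> {
    let (&tag, rest) = msg.split_first()?;
    if tag != b'Q' || rest.len() < 5 {
        return None;
    }
    let len = i32::from_be_bytes([rest[0], rest[1], rest[2], rest[3]]);
    // The length must cover everything after the type byte, ending in NUL.
    if len as usize != rest.len() || *rest.last()? != 0 {
        return None;
    }
    std::str::from_utf8(&rest[4..rest.len() - 1]).ok()
}
```

For "SELECT 1" the length field is 13 (4 for itself, 8 for the text, 1 for the NUL), so the whole message is 14 bytes.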
Branching Implementation
Branches are implemented using RocksDB column families with copy-on-write semantics:
- main branch: [base data]
- dev branch (created from main): [base data] + [delta: new/modified rows]
SMFI (Storage-Level Metadata Filtering)
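Parquet-style metadata filtering at the storage level: each block records the minimum and maximum values it contains, and a scan skips any block whose range cannot satisfy the predicate: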
For the query WHERE timestamp > '2024-01-01', the scan checks each block's metadata:
- Block A: min=2023-01-01, max=2023-12-31 → SKIP (its maximum is before the bound, so no row can match)
- Block B: min=2024-01-01, max=2024-06-30 → SCAN
- Block C: min=2024-07-01, max=2024-12-31 → SCAN
Module Dependencies
heliosdb_nano (lib.rs)
├── sql/
│   ├── parser (sqlparser)
│   ├── planner
│   ├── optimizer
│   └── executor
├── storage/
│   ├── engine (rocksdb)
│   ├── catalog
│   ├── mvcc
│   └── compression/
│       ├── fsst
│       └── alp
├── vector/
│   ├── index (hnsw, ivf)
│   └── embeddings
├── server/
│   ├── postgres (wire protocol)
│   └── http (REST API)
└── repl/ (CLI interface)
Performance Characteristics
| Operation | Complexity | Notes |
|---|---|---|
| Point lookup | O(log n) | B-tree index lookup |
| Range scan | O(log n + k) | k = number of rows returned |
| Full scan | O(n) | SMFI can skip blocks that cannot match the predicate |
| Vector search | O(log n) | HNSW approximate |
| Branch creation | O(1) | Copy-on-write |
| Time-travel query | O(log n) | MVCC snapshot |
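The O(1) branch-creation entry follows from copy-on-write: a new branch shares its parent's base data and only records its own changes, matching the [base data] + [delta] picture under Branching Implementation. This in-memory sketch is an assumption-laden illustration: it uses an Arc-shared HashMap for the base and a per-branch delta map in which None marks a deleted row, whereas the document says real branches are built on RocksDB column families. Branch, root, fork, upsert, and delta_len are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Arc;

type Rows = HashMap<u64, String>;

/// A branch: shared, read-only base rows plus this branch's own delta.
/// In the delta, Some(row) is a new/modified row and None is a delete.
pub struct Branch {
    base: Arc<Rows>,
    delta: HashMap<u64, Option<String>>,
}

impl Branch {
    /// The initial branch owns the base data and has no changes yet.
    pub fn root(rows: Rows) -> Self {
        Branch { base: Arc::new(rows), delta: HashMap::new() }
    }

    /// Branch creation copies only the Arc pointer and the parent's delta,
    /// so its cost does not depend on the size of the base data.
    pub fn fork(&self) -> Self {
        Branch { base: Arc::clone(&self.base), delta: self.delta.clone() }
    }

    /// Reads consult the delta first and fall back to the shared base.
    pub fn get(&self, id: u64) -> Option<&str> {
        match self.delta.get(&id) {
            Some(change) => change.as_deref(),
            None => self.base.get(&id).map(String::as_str),
        }
    }

    /// Writes land in this branch's delta; the shared base is never modified.
    pub fn upsert(&mut self, id: u64, row: &str) {
        self.delta.insert(id, Some(row.to_string()));
    }

    pub fn delete(&mut self, id: u64) {
        self.delta.insert(id, None);
    }

    pub fn delta_len(&self) -> usize {
        self.delta.len()
    }
}
```

Forking a branch that has no changes of its own, such as a fresh main, copies nothing but a pointer; each branch's later writes stay invisible to the other.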
Configuration
Key configuration parameters affecting architecture:
| Parameter | Default | Impact |
|---|---|---|
| storage.block_size | 4KB | I/O granularity |
| storage.cache_size | 256MB | Memory usage |
| storage.compression | lz4 | CPU vs space |
| mvcc.snapshot_retention | 1h | Time-travel range |
| vector.index_type | hnsw | Search performance |
See Configuration Reference for complete options.
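To show how the parameters in the table might surface in an embedding application, here is a hypothetical typed configuration carrying the documented defaults. The struct and field names, the reading of 4KB and 256MB as binary units (4096 bytes and 256 MiB), and the can_time_travel helper are assumptions for illustration only; the Configuration Reference remains the authority on real keys and types.

```rust
use std::time::Duration;

/// Vector index families named in the Vector Search table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VectorIndexType {
    Hnsw,
    IvfPq,
}

/// Hypothetical typed configuration mirroring the dotted keys above.
#[derive(Debug, Clone)]
pub struct NanoConfig {
    pub storage_block_size: usize,          // storage.block_size, in bytes
    pub storage_cache_size: usize,          // storage.cache_size, in bytes
    pub storage_compression: String,        // storage.compression
    pub mvcc_snapshot_retention: Duration,  // mvcc.snapshot_retention
    pub vector_index_type: VectorIndexType, // vector.index_type
}

impl Default for NanoConfig {
    fn default() -> Self {
        NanoConfig {
            storage_block_size: 4 * 1024,          // "4KB", assumed to mean 4 KiB
            storage_cache_size: 256 * 1024 * 1024, // "256MB", assumed to mean 256 MiB
            storage_compression: "lz4".to_string(),
            mvcc_snapshot_retention: Duration::from_secs(60 * 60), // 1h
            vector_index_type: VectorIndexType::Hnsw,
        }
    }
}

impl NanoConfig {
    /// Time-travel can only reach data whose MVCC snapshots are still retained.
    pub fn can_time_travel(&self, age: Duration) -> bool {
        age <= self.mvcc_snapshot_retention
    }
}
```

With the default 1h retention, a query 30 minutes into the past is within range, while one 2 hours back is not.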