
High Availability (HA)

Overview

Most embedded databases force you to choose: simple-but-single-node, or pay the operational cost of a full distributed cluster. HeliosDB Nano ships three composable HA tiers as Cargo features, all in the same binary. Tier 1 (warm standby) is on by default — async WAL replication with automatic failover, no extra flag needed. Tier 2 adds branch-based active-active with vector-clock conflict resolution. Tier 3 adds consistent-hash sharding. Combine them or stay on Tier 1; for connection routing and failover at the wire layer, point your clients at the standalone HeliosProxy binary. Pay only for the HA you use.


Tier Matrix

| Tier | Cargo feature | Default? | Architecture | Conflict resolution | Best for |
|---|---|---|---|---|---|
| Tier 1 | `ha-tier1` | yes | Warm standby (async/sync WAL) | N/A (single primary) | DR, read replicas, blue-green |
| Tier 2 | `ha-tier2` | no | Branch-based active-active | Vector clock + field-level merge | Multi-region writes, edge sync |
| Tier 3 | `ha-tier3` | no | Consistent hash ring sharding | Per-shard | Horizontal scaling beyond one host |

Optional add-ons

| Feature | Adds | Implies |
|---|---|---|
| `ha-dedup` | Content-addressed deduplication across nodes | |
| `ha-ab-testing` | Branch-based experiment routing | |
| `ha-branch-replication` | Selective branch sync to remote servers | `ha-tier2` |
| `ha-full` | Bundle of all of the above | `ha-tier1` + `ha-tier2` + `ha-tier3` + add-ons |

Build recipes

```sh
# Default — Tier 1 only (warm standby + DR)
cargo build --release

# Add Tier 2 (multi-primary)
cargo build --release --features ha-tier2

# Full HA stack
cargo build --release --features ha-full
```

Or enable features from a dependent crate in `Cargo.toml`:

```toml
[dependencies]
heliosdb-nano = { version = "3.19", features = ["ha-tier2"] }
```

Tier 1: Warm Standby

Active-passive replication with automatic failover.

Architecture

```
┌─────────────┐    WAL Stream    ┌─────────────┐
│   Primary   │ ───────────────→ │   Standby   │
│  (Active)   │                  │  (Passive)  │
└─────────────┘                  └─────────────┘
      ↓                                ↓
  Read/Write                       Read-Only
```

Components

| Component | Description |
|---|---|
| `WalReplicator` | Streams WAL from primary |
| `WalApplicator` | Applies WAL on standby |
| `FailoverWatcher` | Monitors primary health |
| `LsnManager` | Tracks replication position |
| `SplitBrainProtector` | Prevents dual-primary scenarios |

Configuration

```rust
use heliosdb_nano::replication::{ReplicationConfig, SyncMode};

let config = ReplicationConfig::builder()
    .primary_endpoint("primary.example.com:5432")
    .sync_mode(SyncMode::Synchronous) // or SyncMode::Asynchronous
    .build();
```

Sync Modes

| Mode | Description | Durability | Latency |
|---|---|---|---|
| Synchronous | Wait for standby ACK | Strong | Higher |
| Asynchronous | Fire-and-forget | Eventual | Lower |
| Quorum | Wait for N/2+1 ACKs | Configurable | Medium |
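The quorum rule above is just majority arithmetic. A minimal sketch (our own helper names, not the HeliosDB API) of the N/2+1 commit decision:

```rust
// Majority quorum: floor(N/2) + 1 acknowledgements are required.
fn quorum_size(total_nodes: usize) -> usize {
    total_nodes / 2 + 1
}

// A write commits once enough ACKs (including the primary's own) arrive.
fn commit_allowed(acks_received: usize, total_nodes: usize) -> bool {
    acks_received >= quorum_size(total_nodes)
}

fn main() {
    // With 5 nodes the quorum is 3: 3 ACKs commit, 2 do not.
    assert_eq!(quorum_size(5), 3);
    assert!(commit_allowed(3, 5));
    assert!(!commit_allowed(2, 5));
}
```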

Failover

```rust
use heliosdb_nano::replication::FailoverWatcher;

let watcher = FailoverWatcher::new(config);
watcher.on_failover(|event| {
    println!("Failover triggered: {:?}", event);
    // Promote standby to primary
});
```

Split-Brain Protection

```rust
use heliosdb_nano::replication::{ObserverConfig, SplitBrainProtector};

let protector = SplitBrainProtector::new(ObserverConfig {
    observers: vec!["observer1.example.com", "observer2.example.com"],
    quorum_size: 2,
});
protector.start();
```
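The observer-quorum idea can be sketched without the library (hypothetical helper, not the `SplitBrainProtector` internals): a node may act as primary only while it can reach at least `quorum_size` observers, so two partitioned nodes can never both hold the role.

```rust
// A node keeps (or claims) the primary role only with observer quorum.
fn may_act_as_primary(reachable_observers: usize, quorum_size: usize) -> bool {
    reachable_observers >= quorum_size
}

fn main() {
    let quorum_size = 2; // matches the two-observer config above

    // Healthy node sees both observers: it may serve as primary.
    assert!(may_act_as_primary(2, quorum_size));

    // A partitioned node sees only one observer: it must step down
    // (fence itself) rather than risk a dual-primary scenario.
    assert!(!may_act_as_primary(1, quorum_size));
}
```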

Tier 2: Multi-Primary

Active-active replication with conflict resolution.

Architecture

```
┌─────────────┐   Branch Sync   ┌─────────────┐
│  Region A   │ ←─────────────→ │  Region B   │
│  (Primary)  │                 │  (Primary)  │
└─────────────┘                 └─────────────┘
  ↓       ↓                       ↓       ↓
Writes  Reads                   Writes  Reads
```

Components

| Component | Description |
|---|---|
| `MultiPrimarySyncManager` | Coordinates multi-region sync |
| `ConflictMergeEngine` | Resolves write conflicts |
| `RegionCoordinator` | Manages region topology |

Conflict Resolution Strategies

| Strategy | Description | Use Case |
|---|---|---|
| Last-Write-Wins | Timestamp-based | Simple, no conflicts visible |
| Branch-Wins | Prefer local changes | Low-latency local writes |
| Merge | Combine changes | Collaborative editing |
| Custom | User-defined logic | Complex business rules |
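To make the first and third strategies concrete, here is a self-contained sketch (types and names are ours, not the `ConflictMergeEngine` API): Last-Write-Wins picks the newer whole value, while a field-level merge keeps the newer value per column and preserves columns only one side touched.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct Versioned {
    value: String,
    ts: u64, // wall-clock or hybrid logical timestamp
}

// Last-Write-Wins: the newer timestamp wins outright.
fn last_write_wins(a: &Versioned, b: &Versioned) -> Versioned {
    if a.ts >= b.ts { a.clone() } else { b.clone() }
}

// Field-level merge: resolve per column instead of per row.
fn field_merge(
    a: &HashMap<String, Versioned>,
    b: &HashMap<String, Versioned>,
) -> HashMap<String, Versioned> {
    let mut out = a.clone();
    for (k, vb) in b {
        out.entry(k.clone())
            .and_modify(|va| {
                if vb.ts > va.ts {
                    *va = vb.clone();
                }
            })
            .or_insert_with(|| vb.clone());
    }
    out
}

fn main() {
    let a = Versioned { value: "us-east".into(), ts: 10 };
    let b = Versioned { value: "eu-west".into(), ts: 12 };
    assert_eq!(last_write_wins(&a, &b).value, "eu-west");

    let mut row_a = HashMap::new();
    row_a.insert("email".to_string(), Versioned { value: "a@x".into(), ts: 5 });
    row_a.insert("name".to_string(), Versioned { value: "Alice".into(), ts: 9 });
    let mut row_b = HashMap::new();
    row_b.insert("email".to_string(), Versioned { value: "b@x".into(), ts: 7 });

    let merged = field_merge(&row_a, &row_b);
    // Newer email wins; the untouched name column survives.
    assert_eq!(merged["email"].value, "b@x");
    assert_eq!(merged["name"].value, "Alice");
}
```

Field-level merge loses less information than row-level LWW when two regions update different columns of the same row concurrently.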

Configuration

```rust
use heliosdb_nano::replication::{ConflictResolution, MultiPrimarySyncManager};

let sync = MultiPrimarySyncManager::new()
    .add_region("us-east", "us-east.example.com:5432")
    .add_region("eu-west", "eu-west.example.com:5432")
    .conflict_resolution(ConflictResolution::LastWriteWins)
    .build();
```

Branch-Based Replication

Multi-primary uses HeliosDB Nano’s branching for conflict-free merges:

```sql
-- Each region maintains its own branch; sync merges branches across regions.

-- Region A writes
INSERT INTO orders (id, total) VALUES (1, 100);

-- Region B writes (concurrent)
INSERT INTO orders (id, total) VALUES (2, 200);

-- After sync: both rows present in all regions
```

Tier 3: Sharding

Horizontal scaling with consistent hashing.

Architecture

```
              ┌─────────────┐
              │   Router    │
              └──────┬──────┘
       ┌─────────────┼─────────────┐
       ↓             ↓             ↓
 ┌──────────┐  ┌──────────┐  ┌──────────┐
 │ Shard 1  │  │ Shard 2  │  │ Shard 3  │
 │ (0-33%)  │  │ (34-66%) │  │ (67-100%)│
 └──────────┘  └──────────┘  └──────────┘
```

Components

| Component | Description |
|---|---|
| `HashRing` | Consistent hashing for key distribution |
| `ShardRouter` | Routes queries to correct shard |
| `ReshardManager` | Online resharding with minimal downtime |
| `VectorPartitioner` | Special partitioning for vector data |

Sharding Strategies

| Strategy | Description | Best For |
|---|---|---|
| Hash | Consistent hash of shard key | Even distribution |
| Range | Key ranges per shard | Time-series data |
| Geographic | Location-based routing | Multi-region |
| Vector | Centroid-based partitioning | Vector search |
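The Hash strategy can be illustrated with a toy consistent-hash ring (our own implementation, not HeliosDB's `HashRing`): each node is placed on the ring at many virtual-node positions, and a key is routed to the first node at or after its hash, wrapping around.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashSet};
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(t: &T) -> u64 {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    h.finish()
}

struct Ring {
    // position on the ring -> node name
    ring: BTreeMap<u64, String>,
}

impl Ring {
    fn new(nodes: &[&str], vnodes: u32) -> Self {
        let mut ring = BTreeMap::new();
        for n in nodes {
            // Virtual nodes smooth out the key distribution.
            for v in 0..vnodes {
                ring.insert(hash_of(&format!("{n}#{v}")), n.to_string());
            }
        }
        Ring { ring }
    }

    fn route(&self, key: &str) -> &str {
        let h = hash_of(&key);
        // First node clockwise from the key's hash; wrap to the start if none.
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, n)| n.as_str())
            .unwrap()
    }
}

fn main() {
    let ring = Ring::new(&["shard1", "shard2", "shard3"], 100);

    // Routing is deterministic: the same tenant always lands on the same shard.
    assert_eq!(ring.route("tenant-42"), ring.route("tenant-42"));

    // Every shard receives some share of a large keyspace.
    let mut seen = HashSet::new();
    for i in 0..1000 {
        seen.insert(ring.route(&format!("tenant-{i}")).to_string());
    }
    assert_eq!(seen.len(), 3);
}
```

The key property: adding or removing a node only remaps the keys adjacent to its ring positions, which is what makes online resharding cheap compared with modulo hashing.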

Configuration

```rust
use heliosdb_nano::replication::{HashRing, ShardRouter};

let ring = HashRing::new()
    .add_node("shard1.example.com:5432", 100) // weight: 100
    .add_node("shard2.example.com:5432", 100)
    .add_node("shard3.example.com:5432", 100)
    .build();

let router = ShardRouter::new(ring)
    .shard_key("tenant_id") // Shard by tenant
    .build();
```

Vector Partitioning

Special support for vector workloads:

```rust
use heliosdb_nano::replication::{CentroidManager, VectorPartitioner};

let partitioner = VectorPartitioner::new()
    .dimensions(768)
    .num_centroids(16) // 16 partitions based on vector similarity
    .build();

// Vectors are routed to the shard containing the nearest centroid.
```
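The routing rule itself (assumed semantics, not the `VectorPartitioner` internals) is nearest-centroid assignment, sketched here in two dimensions:

```rust
// Return the index of the centroid nearest to `v` by squared
// Euclidean distance (the square root is monotone, so we skip it).
fn nearest_centroid(v: &[f32], centroids: &[Vec<f32>]) -> usize {
    fn dist2(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
    }
    centroids
        .iter()
        .enumerate()
        .min_by(|&(_, a), &(_, b)| dist2(v, a).partial_cmp(&dist2(v, b)).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    // Two toy 2-D centroids stand in for the 16 high-dimensional ones above.
    let centroids = vec![vec![0.0_f32, 0.0], vec![10.0, 10.0]];
    assert_eq!(nearest_centroid(&[1.0, 1.0], &centroids), 0);
    assert_eq!(nearest_centroid(&[9.0, 8.0], &centroids), 1);
}
```

Because similar vectors land on the same shard, a nearest-neighbor query usually needs to probe only one partition (or a few adjacent ones) instead of fanning out to every shard.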

Resharding

Online resharding without downtime:

```rust
use heliosdb_nano::replication::ReshardManager;

let reshard = ReshardManager::new(ring)
    .target_shards(6) // Scale from 3 to 6 shards
    .parallel_streams(4)
    .build();

reshard.execute().await?; // Non-blocking migration
```

Logical Replication

For selective table replication:

```rust
use heliosdb_nano::replication::{ColumnMapping, LogicalReplicationPipeline, TableFilter};

let pipeline = LogicalReplicationPipeline::new()
    .source("source.example.com:5432")
    .destination("dest.example.com:5432")
    .table_filter(TableFilter::include(&["users", "orders"]))
    .column_mapping(
        ColumnMapping::new()
            .rename("old_name", "new_name")
            .exclude("sensitive_column"),
    )
    .build();

pipeline.start().await?;
```

CLI Options

Start HeliosDB Nano in HA mode:

```sh
# Primary mode
heliosdb-nano server --ha-mode primary --ha-bind 0.0.0.0:5433

# Standby mode
heliosdb-nano server --ha-mode standby --ha-primary primary.example.com:5433

# Multi-primary mode
heliosdb-nano server --ha-mode multi-primary \
  --ha-region us-east \
  --ha-peers eu-west.example.com:5433
```

Docker Support

Docker Compose for HA cluster:

```yaml
version: '3.8'
services:
  primary:
    image: heliosdb/heliosdb-nano:latest
    command: server --ha-mode primary
    ports:
      - "5432:5432"
      - "5433:5433"
    environment:
      - HA_SYNC_MODE=synchronous
  standby:
    image: heliosdb/heliosdb-nano:latest
    command: server --ha-mode standby --ha-primary primary:5433
    depends_on:
      - primary
```

Transparent Write Routing (TWR)

HeliosDB Nano implements Transparent Write Routing (TWR): applications can connect to any node, primary or standby, and writes are automatically forwarded to the primary while reads execute locally.

How It Works

```
Application → Standby → (DML/DDL forwarded) → Primary
                (SELECT executed locally)
```

Behavior by Sync Mode

| Sync Mode | DQL (SELECT) | DML (INSERT/UPDATE/DELETE) |
|---|---|---|
| sync | Execute locally on standby | Forward to primary, return result |
| semi-sync | Execute locally on standby | Forward to primary, return result |
| async | Execute locally on standby | Reject (traditional read-only) |

Operations Subject to Routing

When connected to a standby in sync/semi-sync mode:

| Operation | Behavior |
|---|---|
| SELECT | Execute locally (DQL) |
| INSERT | Forward to primary (DML) |
| UPDATE | Forward to primary (DML) |
| DELETE | Forward to primary (DML) |
| CREATE | Forward to primary (DDL) |
| DROP | Forward to primary (DDL) |
| ALTER | Forward to primary (DDL) |
| TRUNCATE | Forward to primary (DDL) |
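The decision table above boils down to a statement classification followed by a per-sync-mode policy. A hypothetical sketch (our names, not the TWR implementation):

```rust
#[derive(Debug, PartialEq)]
enum Route {
    Local,            // DQL: run on the connected standby
    ForwardToPrimary, // DML/DDL under sync or semi-sync
    Reject,           // DML/DDL under async (traditional read-only standby)
}

// Classify by the leading SQL verb, then apply the sync-mode policy.
fn route(sql: &str, sync_mode: &str) -> Route {
    let verb = sql
        .trim_start()
        .split_whitespace()
        .next()
        .unwrap_or("")
        .to_ascii_uppercase();
    match verb.as_str() {
        "SELECT" => Route::Local,
        _ => match sync_mode {
            "sync" | "semi-sync" => Route::ForwardToPrimary,
            _ => Route::Reject,
        },
    }
}

fn main() {
    assert_eq!(route("SELECT * FROM users", "async"), Route::Local);
    assert_eq!(
        route("INSERT INTO users VALUES (3, 'Charlie')", "sync"),
        Route::ForwardToPrimary
    );
    assert_eq!(
        route("UPDATE users SET name = 'C' WHERE id = 3", "async"),
        Route::Reject
    );
}
```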

Example: Transparent Routing

```sql
-- Connect to STANDBY and execute INSERT (forwarded to primary)
INSERT INTO users VALUES (3, 'Charlie');
-- Result: INSERT 0 1 (success - executed on primary)

-- SELECT always executes locally on the connected standby
SELECT * FROM users;
```

Benefits

  1. Load Distribution: Applications can connect to any node; reads distributed, writes auto-routed
  2. Simplified Application Logic: No need for separate read/write connection strings
  3. High Availability: Application continues working if it connects to standby
  4. Transparent Failover: Combined with connection pooling, provides seamless failover

Monitoring

HA System Views

HeliosDB Nano provides SQL system views for monitoring HA configuration and replication metrics.

pg_replication_status

View node configuration and role:

```sql
SELECT * FROM pg_replication_status;
```

| Column | Description |
|---|---|
| node_id | Unique identifier for this node |
| role | primary, standby, observer, or standalone |
| sync_mode | async, semi-sync, or sync |
| listen_address | Host and port |
| replication_port | WAL streaming port |
| current_lsn | Current log sequence number |
| is_read_only | true/false |
| standby_count | Number of connected standbys (primary only) |
| uptime_seconds | Time since node started |

pg_replication_standbys (Primary Only)

View connected standbys:

```sql
SELECT * FROM pg_replication_standbys;
```

| Column | Description |
|---|---|
| node_id | Standby’s unique identifier |
| address | Standby’s connection address |
| sync_mode | Replication mode for this standby |
| state | connecting, streaming, catching_up, synced, disconnected |
| current_lsn | Standby’s current LSN position |
| flush_lsn | Flushed LSN |
| apply_lsn | Applied LSN |
| lag_bytes | Replication lag in bytes |
| lag_ms | Replication lag in milliseconds |
| connected_at | Connection timestamp |
| last_heartbeat | Last heartbeat received |

pg_replication_primary (Standby Only)

View primary connection status:

```sql
SELECT * FROM pg_replication_primary;
```

| Column | Description |
|---|---|
| node_id | Primary’s unique identifier |
| address | Primary’s address |
| state | disconnected, connecting, connected, streaming, error |
| primary_lsn | Primary’s current LSN |
| local_lsn | Local LSN position |
| lag_bytes | Replication lag in bytes |
| lag_ms | Replication lag in milliseconds |
| fencing_token | Split-brain protection token |
| connected_at | Connection timestamp |
| last_heartbeat | Last heartbeat received |

pg_replication_metrics

View performance metrics:

```sql
SELECT * FROM pg_replication_metrics;
```

| Column | Description |
|---|---|
| wal_writes | Total WAL write operations |
| wal_bytes_written | Total WAL bytes written |
| records_replicated | Records sent to standbys |
| bytes_replicated | Bytes sent to standbys |
| heartbeats_sent | Health checks sent |
| heartbeats_received | Health checks received |
| reconnect_count | Number of reconnections |
| last_wal_write | Timestamp of last WAL write |
| last_replication | Timestamp of last replication |

Monitoring Examples

```sql
-- Check if standbys are in sync
SELECT
  node_id,
  CASE
    WHEN lag_ms < 1000 THEN 'IN_SYNC'
    WHEN lag_ms < 60000 THEN 'CATCHING_UP'
    ELSE 'LAGGING'
  END AS status,
  lag_ms
FROM pg_replication_standbys;

-- View all nodes in cluster
SELECT node_id, role, current_lsn
FROM pg_replication_status;
```

Best Practices

  1. Network: Use dedicated replication network
  2. Monitoring: Alert on replication lag > threshold
  3. Testing: Regularly test failover procedures
  4. Backups: Continue point-in-time backups even with HA
  5. Quorum: Use odd number of nodes for consensus

HeliosProxy — Wire-Level Routing & Failover

For PostgreSQL-wire connection routing, read/write splitting, and transparent failover between nodes, deploy the standalone HeliosProxy binary in front of your cluster:

  • Repo: github.com/dimensigon/heliosdb-proxy
  • Topology: sits between clients and the Nano fleet; speaks the PostgreSQL wire protocol to clients, routes writes to the current primary and reads to standbys.
  • Failover: detects primary loss, promotes a standby, retargets active sessions.
  • Compatible with: every Tier (Tier 1 standby promotion, Tier 2 region pinning, Tier 3 shard fan-out).

Inside the database, Transparent Write Routing (TWR) covers the same need at the protocol layer when you’re connecting directly without a proxy. Use TWR for simple deployments, HeliosProxy for production fleets.


See Also