Multi-User Transactions Architecture

Version: 3.4.0 Status: Production-Ready Last Updated: 2026-01-04

System Architecture Overview
Session Lifecycle
Transaction Flow with Locking
Lock Manager Deadlock Detection
Isolation Level Implementation
Dump/Restore Data Flow
Memory Layout and Data Structures
Performance Optimization Strategies

System Architecture Overview

HeliosDB Nano v3.1.0 implements a layered architecture for multi-user ACID transactions:

┌────────────────────────────────────────────────────────────────┐
│                      Client Applications                        │
│   (psql, JDBC, Python drivers, HTTP API, CLI tools)           │
└─────────────────────────────┬──────────────────────────────────┘
                              │
                              │ PostgreSQL Wire Protocol / HTTP
                              │
┌─────────────────────────────▼──────────────────────────────────┐
│                    Protocol Layer (PgServer)                    │
│  ┌───────────────┐  ┌───────────────┐  ┌──────────────────┐  │
│  │  Connection   │  │   Session     │  │  Authentication  │  │
│  │   Handler     │  │   Manager     │  │  (Trust/Pass/JWT)│  │
│  └───────────────┘  └───────────────┘  └──────────────────┘  │
└─────────────────────────────┬──────────────────────────────────┘
                              │
                              │ Session API
                              │
┌─────────────────────────────▼──────────────────────────────────┐
│                   Session Management Layer                      │
│  ┌────────────────────┐  ┌──────────────┐  ┌────────────────┐ │
│  │  SessionManager    │  │ UserManager  │  │ ResourceQuota  │ │
│  │  - create()        │  │ - auth()     │  │   Manager      │ │
│  │  - begin_txn()     │  │ - authorize()│  │ - check()      │ │
│  │  - commit()        │  │ - audit()    │  │ - enforce()    │ │
│  └────────────────────┘  └──────────────┘  └────────────────┘ │
└─────────────────────────────┬──────────────────────────────────┘
                              │
                              │ Transaction API
                              │
┌─────────────────────────────▼──────────────────────────────────┐
│                 Transaction Management Layer                    │
│  ┌────────────────────┐  ┌────────────────┐  ┌──────────────┐ │
│  │   Transaction      │  │  LockManager   │  │MVCC Snapshot │ │
│  │   - get()          │  │  - acquire()   │  │   Manager    │ │
│  │   - put()          │  │  - release()   │  │ - create()   │ │
│  │   - commit()       │  │  - detect_dl() │  │ - read_at()  │ │
│  └────────────────────┘  └────────────────┘  └──────────────┘ │
└─────────────────────────────┬──────────────────────────────────┘
                              │
                              │ Storage API
                              │
┌─────────────────────────────▼──────────────────────────────────┐
│                      Storage Engine Layer                       │
│  ┌────────────────────┐  ┌────────────────┐  ┌──────────────┐ │
│  │  StorageEngine     │  │  DumpManager   │  │ DirtyTracker │ │
│  │  - insert()        │  │  - dump()      │  │ - is_dirty() │ │
│  │  - scan()          │  │  - restore()   │  │ - mark()     │ │
│  │  - update()        │  │  - history()   │  │ - clear()    │ │
│  └────────────────────┘  └────────────────┘  └──────────────┘ │
│                             │                                   │
│                             ▼                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │        RocksDB (In-Memory Backend)                       │  │
│  │  - MemTable (hash-based, lock-free)                      │  │
│  │  - No SST files (pure RAM)                               │  │
│  │  - Optional persistence to dump files                    │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Component Responsibilities

Component	Responsibility	Thread-Safety
PgServer	Handle PostgreSQL wire protocol, parse queries	Per-connection thread
SessionManager	Create/destroy sessions, transaction lifecycle	`Arc<DashMap>`
LockManager	Acquire/release locks, deadlock detection	`Arc<DashMap>`
Transaction	Execute operations, maintain MVCC snapshot	Isolated per session
StorageEngine	Read/write data, manage persistence	Thread-safe (RocksDB)
DumpManager	Export/import database state	Read-write locks

Session Lifecycle

Session State Machine

┌─────────────┐
│ Disconnected│
└──────┬──────┘
       │ connect()
       ▼
┌─────────────┐     authenticate()      ┌──────────────┐
│ Connecting  │────────────────────────>│ Authenticated│
└─────────────┘                         └──────┬───────┘
       │                                       │
       │ auth_failed()                         │ create_session()
       ▼                                       ▼
┌─────────────┐                         ┌──────────────┐
│   Closed    │<────────────────────────│   Active     │
└─────────────┘     timeout/close       └──────┬───────┘
                                               │
                                               │ begin_transaction()
                                               ▼
                                        ┌──────────────┐
                                        │ In Transaction│
                                        └──────┬───────┘
                                               │
                                               │ commit/rollback
                                               ▼
                                        ┌──────────────┐
                                        │   Active     │
                                        └──────────────┘

Session Creation Flow

Client                 SessionManager           UserManager         ResourceQuota
  │                          │                        │                    │
  │──create_session()────────>│                       │                    │
  │                          │                        │                    │
  │                          │──authenticate()────────>│                   │
  │                          │                        │                    │
  │                          │<──auth_result──────────│                   │
  │                          │                        │                    │
  │                          │──check_quota()────────────────────────────>│
  │                          │                        │                    │
  │                          │<──quota_ok()───────────────────────────────│
  │                          │                        │                    │
  │                          │──allocate_session()────│                    │
  │                          │                        │                    │
  │<──SessionId──────────────│                        │                    │
  │                          │                        │                    │

Session Data Structure

pub struct Session {
    /// Unique session identifier
    pub id: SessionId,

    /// User who owns this session
    pub user_id: UserId,

    /// Transaction isolation level
    pub isolation_level: IsolationLevel,

    /// Currently active transaction (if any)
    pub active_txn: Option<TransactionId>,

    /// Session creation timestamp
    pub created_at: u64,

    /// Last activity timestamp (for timeout detection)
    pub last_activity: u64,

    /// Session statistics
    pub stats: SessionStats,
}

pub struct SessionStats {
    /// Number of transactions committed
    pub transactions_committed: u64,

    /// Number of transactions rolled back
    pub transactions_rolled_back: u64,

    /// Total queries executed
    pub queries_executed: u64,

    /// Total rows read
    pub rows_read: u64,

    /// Total rows written
    pub rows_written: u64,

    /// Memory allocated by this session (bytes)
    pub memory_bytes: u64,
}

Transaction Flow with Locking

Transaction Lifecycle

┌─────────────────────────────────────────────────────────────────┐
│ 1. BEGIN TRANSACTION                                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  SessionManager::begin_transaction(session_id)                  │
│    │                                                            │
│    ├─> Validate session exists and is authenticated            │
│    ├─> Check no active transaction                             │
│    ├─> Create MVCC snapshot based on isolation level:          │
│    │     - READ COMMITTED: latest committed snapshot           │
│    │     - REPEATABLE READ: consistent point-in-time snapshot  │
│    │     - SERIALIZABLE: serializable snapshot with tracking   │
│    ├─> Allocate Transaction context                            │
│    └─> Return TransactionId                                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ 2. EXECUTE QUERIES (SELECT, INSERT, UPDATE, DELETE)            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  For SELECT (read operation):                                   │
│    │                                                            │
│    ├─> LockManager::acquire_lock(txn_id, key, SHARED)          │
│    │     │                                                      │
│    │     ├─> Check lock compatibility with existing locks      │
│    │     ├─> If compatible: grant immediately                  │
│    │     ├─> If not compatible: wait or timeout                │
│    │     └─> Detect deadlock (DFS on wait-for graph)           │
│    │                                                            │
│    ├─> Transaction::get(key)                                   │
│    │     │                                                      │
│    │     ├─> Read from MVCC snapshot                           │
│    │     ├─> Apply visibility rules based on isolation level   │
│    │     └─> Return value (if visible)                         │
│    │                                                            │
│    └─> Return result set to client                             │
│                                                                 │
│  For INSERT/UPDATE/DELETE (write operation):                   │
│    │                                                            │
│    ├─> LockManager::acquire_lock(txn_id, key, EXCLUSIVE)       │
│    │     │                                                      │
│    │     ├─> Check lock compatibility (X locks are exclusive)  │
│    │     ├─> Wait for conflicting locks to release             │
│    │     └─> Grant lock or timeout/deadlock                    │
│    │                                                            │
│    ├─> Transaction::put(key, value) / delete(key)              │
│    │     │                                                      │
│    │     ├─> Buffer write in transaction-local write set       │
│    │     ├─> Do NOT write to storage yet (2PC)                 │
│    │     └─> Mark key as modified                              │
│    │                                                            │
│    └─> Return affected rows count                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ 3. COMMIT TRANSACTION                                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  SessionManager::commit_transaction(session_id)                 │
│    │                                                            │
│    ├─> Get active transaction                                  │
│    ├─> If SERIALIZABLE: validate_serializability()             │
│    │     │                                                      │
│    │     ├─> Check for read-write conflicts                    │
│    │     ├─> Check for write-write conflicts                   │
│    │     └─> Abort if conflict detected                        │
│    │                                                            │
│    ├─> Acquire commit locks (brief exclusive lock)             │
│    ├─> Apply buffered writes to storage atomically             │
│    ├─> Update MVCC version numbers                             │
│    ├─> Release all locks held by transaction                   │
│    ├─> Mark transaction as committed                           │
│    ├─> Update dirty state tracker                              │
│    └─> Free transaction resources                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ 4. ROLLBACK TRANSACTION (on error or explicit ROLLBACK)        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  SessionManager::rollback_transaction(session_id)               │
│    │                                                            │
│    ├─> Get active transaction                                  │
│    ├─> Discard buffered writes (no storage changes)            │
│    ├─> Release all locks held by transaction                   │
│    ├─> Mark transaction as rolled back                         │
│    └─> Free transaction resources                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Lock Acquisition Algorithm

fn acquire_lock(
    txn_id: TransactionId,
    key: Key,
    mode: LockMode
) -> Result<(), Error> {
    let start_time = now();

    loop {
        // 1. Try to acquire lock
        match try_acquire_lock(txn_id, key.clone(), mode) {
            Ok(true) => return Ok(()),  // Lock acquired
            Ok(false) => {
                // Lock not available, check for issues
            }
            Err(e) => return Err(e),
        }

        // 2. Detect deadlock using wait-for graph
        if detect_deadlock(txn_id)? {
            return Err(Error::Deadlock(
                format!("Deadlock detected for txn {}", txn_id)
            ));
        }

        // 3. Check timeout
        if now() - start_time > LOCK_TIMEOUT_MS {
            return Err(Error::LockTimeout);
        }

        // 4. Wait briefly and retry
        sleep_ms(10);
    }
}

fn try_acquire_lock(
    txn_id: TransactionId,
    key: Key,
    mode: LockMode
) -> Result<bool, Error> {
    let mut lock_table = LOCK_TABLE.entry(key).or_default();

    // Check compatibility with all existing locks
    for existing_lock in lock_table.iter() {
        if existing_lock.txn_id == txn_id {
            // Same transaction already holds lock
            if existing_lock.mode.is_compatible(mode) {
                return Ok(true);  // Already have compatible lock
            } else {
                // Need to upgrade lock
                continue;
            }
        }

        if !mode.is_compatible(existing_lock.mode) {
            // Conflict with other transaction
            WAIT_FOR_GRAPH
                .entry(txn_id)
                .or_default()
                .insert(existing_lock.txn_id);
            return Ok(false);  // Cannot acquire yet
        }
    }

    // No conflicts - acquire lock
    lock_table.push(LockEntry {
        txn_id,
        mode,
        acquired_at: now(),
    });

    Ok(true)
}

Lock Manager Deadlock Detection

Wait-For Graph

The lock manager maintains a directed graph where:

Nodes: Active transactions
Edges: Transaction A → Transaction B means A is waiting for B to release a lock

Deadlock: Cycle in the wait-for graph

Example: Three transactions waiting on each other

Txn 101 holds lock on Row A, waits for Row B (held by Txn 102)
Txn 102 holds lock on Row B, waits for Row C (held by Txn 103)
Txn 103 holds lock on Row C, waits for Row A (held by Txn 101)

Wait-For Graph:
    101 ──> 102
     ▲       │
     │       ▼
    103 <──

Cycle detected: 101 -> 102 -> 103 -> 101 (DEADLOCK!)

Deadlock Detection Algorithm (DFS)

fn detect_deadlock(txn_id: TransactionId) -> Result<bool, Error> {
    let mut visited = HashSet::new();
    let mut stack = vec![txn_id];
    let mut rec_stack = HashSet::new();  // Recursion stack

    while let Some(current) = stack.pop() {
        if rec_stack.contains(&current) {
            // Cycle detected - current transaction in recursion stack
            return Ok(true);
        }

        if visited.contains(&current) {
            continue;
        }

        visited.insert(current);
        rec_stack.insert(current);

        // Explore all transactions that current is waiting for
        if let Some(waiting_on) = WAIT_FOR_GRAPH.get(&current) {
            for &next_txn in waiting_on.iter() {
                stack.push(next_txn);
            }
        }
    }

    Ok(false)  // No cycle
}

Deadlock Resolution

When a deadlock is detected:

Choose Victim: Select transaction to abort (youngest transaction)
Abort Transaction: Release all locks, rollback changes
Notify Client: Return error Error::Deadlock
Client Retries: Application retries transaction

Txn 101 ──X──> Txn 102
 ▲              │
 │              ▼
Txn 103 <──────

Deadlock detected!
Abort Txn 101 (youngest):
  - Release lock on Row A
  - Rollback changes
  - Return Error::Deadlock to client

Now Txn 103 can acquire lock on Row A and proceed.

Isolation Level Implementation

READ COMMITTED

Guarantees:

No dirty reads (uncommitted data invisible)
Fresh snapshot per SQL statement
May see non-repeatable reads and phantom rows

Implementation:

impl Transaction {
    pub fn execute_statement(&mut self, stmt: &Statement) -> Result<ResultSet, Error> {
        // Create fresh snapshot for each statement
        self.snapshot = self.snapshot_manager.latest_snapshot();

        // Execute statement using fresh snapshot
        self.execute_with_snapshot(stmt, &self.snapshot)
    }
}

Visibility Rules:

fn is_visible(tuple_version: &TupleVersion, snapshot: &Snapshot) -> bool {
    // Tuple created by committed transaction before snapshot?
    if tuple_version.created_by_txn < snapshot.xmin {
        // Check if not deleted, or deleted after snapshot
        return tuple_version.deleted_by_txn == 0 ||
               tuple_version.deleted_by_txn > snapshot.xmax;
    }

    // Tuple created by this transaction?
    if tuple_version.created_by_txn == snapshot.txn_id {
        return tuple_version.deleted_by_txn == 0 ||
               tuple_version.deleted_by_txn != snapshot.txn_id;
    }

    false  // Not visible
}

REPEATABLE READ

Guarantees:

Consistent snapshot throughout transaction
No dirty reads, no non-repeatable reads
May still see phantom rows

Implementation:

impl Transaction {
    pub fn begin(&mut self, isolation: IsolationLevel) -> Result<(), Error> {
        if isolation == IsolationLevel::RepeatableRead {
            // Create snapshot once at transaction start
            self.snapshot = self.snapshot_manager.create_snapshot();
            self.snapshot.freeze();  // Immutable for entire transaction
        }
        Ok(())
    }
}

SERIALIZABLE

Guarantees:

Full serializability
Transactions appear to execute sequentially
No anomalies (dirty read, non-repeatable read, phantom, serialization)

Implementation:

struct SerializableContext {
    /// Read set: keys read during transaction
    read_set: HashSet<Key>,

    /// Write set: keys written during transaction
    write_set: HashSet<Key>,

    /// Snapshot at transaction start
    snapshot: Snapshot,
}

impl Transaction {
    pub fn commit_serializable(&self) -> Result<(), Error> {
        // Check for read-write conflicts (other transactions wrote to our read set)
        for key in &self.read_set {
            if let Some(latest_version) = self.storage.get_latest_version(key) {
                if latest_version.created_by_txn > self.snapshot.xmax {
                    // Someone wrote to a key we read - conflict!
                    return Err(Error::SerializationFailure(
                        format!("Read-write conflict on key {:?}", key)
                    ));
                }
            }
        }

        // Check for write-write conflicts (handled by exclusive locks)
        // Locks ensure only one transaction can write to a key at a time

        // Commit atomically
        self.apply_writes()?;
        Ok(())
    }
}

Dump/Restore Data Flow

Dump Operation Flow

┌──────────────────────────────────────────────────────────────┐
│ Dump Process (Non-Blocking)                                  │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  DumpManager::dump(options)                                  │
│    │                                                         │
│    ├─> Create dump metadata                                 │
│    │     - dump_id (UUID)                                   │
│    │     - version (3.1.0)                                  │
│    │     - mode (Full/Incremental)                          │
│    │     - current LSN                                      │
│    │                                                         │
│    ├─> Open dump file writer                                │
│    │     - Write magic header "HELIODMP"                    │
│    │     - Write metadata                                   │
│    │                                                         │
│    ├─> For each table:                                      │
│    │   │                                                    │
│    │   ├─> Acquire read-only snapshot (no locks)           │
│    │   ├─> Write table schema                              │
│    │   │     - Table name                                  │
│    │   │     - Column definitions                          │
│    │   │     - Indexes                                     │
│    │   │     - Constraints                                 │
│    │   │                                                    │
│    │   ├─> Scan table rows in batches (10,000 rows)        │
│    │   │   │                                                │
│    │   │   ├─> Read batch from storage                     │
│    │   │   ├─> Serialize to binary format                  │
│    │   │   ├─> Compress if enabled (zstd/lz4)              │
│    │   │   ├─> Write compressed batch to file              │
│    │   │   └─> Update statistics (rows, bytes)             │
│    │   │                                                    │
│    │   └─> Release snapshot                                │
│    │                                                         │
│    ├─> Write footer                                         │
│    │     - CRC32 checksum                                  │
│    │     - Statistics (tables, rows, bytes)                │
│    │                                                         │
│    ├─> Flush and close file                                │
│    ├─> Record dump in history                              │
│    ├─> Clear dirty state (mark LSN as persisted)           │
│    └─> Return DumpReport                                   │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Restore Operation Flow

┌──────────────────────────────────────────────────────────────┐
│ Restore Process (Transactional)                              │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  DumpManager::restore(options)                               │
│    │                                                         │
│    ├─> Open dump file reader                                │
│    ├─> Read and validate metadata                           │
│    │     - Check version compatibility                      │
│    │     - Verify checksum                                  │
│    │                                                         │
│    ├─> Begin global transaction                             │
│    │                                                         │
│    ├─> For each table in dump:                              │
│    │   │                                                    │
│    │   ├─> Read table schema                               │
│    │   ├─> If mode == Clean: DROP TABLE IF EXISTS          │
│    │   ├─> CREATE TABLE with schema                        │
│    │   │                                                    │
│    │   ├─> Read row batches:                               │
│    │   │   │                                                │
│    │   │   ├─> Read compressed batch                       │
│    │   │   ├─> Decompress if needed                        │
│    │   │   ├─> Deserialize rows                            │
│    │   │   ├─> INSERT rows into table                      │
│    │   │   │     - Handle conflicts per options:           │
│    │   │   │       * Error: abort on duplicate             │
│    │   │   │       * Skip: ignore duplicates               │
│    │   │   │       * Update: upsert (ON CONFLICT UPDATE)   │
│    │   │   │                                                │
│    │   │   └─> Update statistics                           │
│    │   │                                                    │
│    │   ├─> Recreate indexes                                │
│    │   └─> Recreate constraints                            │
│    │                                                         │
│    ├─> COMMIT transaction (all-or-nothing)                  │
│    ├─> Update current LSN to dump LSN                       │
│    └─> Return RestoreReport                                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Memory Layout and Data Structures

SessionManager Data Structure

pub struct SessionManager {
    /// Map of session_id -> Session
    /// DashMap for lock-free concurrent access
    sessions: Arc<DashMap<SessionId, Arc<RwLock<Session>>>>,

    /// User manager for authentication
    user_manager: Arc<UserManager>,

    /// Lock manager (shared across all sessions)
    lock_manager: Arc<LockManager>,

    /// Resource quota enforcement
    resource_quotas: Arc<ResourceQuotaManager>,

    /// Session timeout in seconds
    session_timeout_secs: u64,

    /// Next session ID (atomic counter)
    next_session_id: Arc<AtomicU64>,
}

// Estimated memory per session: ~2-4 KB
// For 10,000 sessions: ~20-40 MB

LockManager Data Structure

pub struct LockManager {
    /// Lock table: Key -> Vec<LockEntry>
    /// DashMap for lock-free concurrent access
    locks: Arc<DashMap<Key, Vec<LockEntry>>>,

    /// Wait-for graph: TransactionId -> Set<TransactionId>
    wait_for: Arc<DashMap<TransactionId, HashSet<TransactionId>>>,

    /// Lock timeout configuration (milliseconds)
    lock_timeout_ms: u64,
}

pub struct LockEntry {
    pub txn_id: TransactionId,
    pub mode: LockMode,
    pub acquired_at: u64,
}

// Memory per lock entry: ~32 bytes
// For 100,000 active locks: ~3.2 MB

Transaction Data Structure

pub struct Transaction {
    /// Transaction ID
    id: TransactionId,

    /// MVCC snapshot (point-in-time view of database)
    snapshot: Snapshot,

    /// Storage engine reference
    storage: Arc<StorageEngine>,

    /// Lock manager reference
    lock_manager: Arc<LockManager>,

    /// Write buffer (changes not yet committed)
    write_buffer: HashMap<Vec<u8>, Option<Vec<u8>>>,  // None = delete

    /// Read set (for SERIALIZABLE isolation)
    read_set: HashSet<Key>,

    /// Write set (for SERIALIZABLE isolation)
    write_set: HashSet<Key>,

    /// Isolation level
    isolation_level: IsolationLevel,

    /// Transaction start time
    start_time: u64,
}

// Memory per transaction: ~500 KB - 2 MB (depends on write buffer size)

MVCC Snapshot Structure

pub struct Snapshot {
    /// Snapshot transaction ID
    pub txn_id: TransactionId,

    /// Minimum active transaction (xmin)
    pub xmin: TransactionId,

    /// Maximum transaction + 1 (xmax)
    pub xmax: TransactionId,

    /// Set of active transactions at snapshot creation
    pub active_txns: HashSet<TransactionId>,

    /// Frozen flag (immutable for REPEATABLE READ)
    pub frozen: bool,
}

// Memory per snapshot: ~1-2 KB

Performance Optimization Strategies

1. Lock-Free Data Structures

DashMap instead of Mutex<HashMap>:

// ❌ Slow: Global lock contention
let sessions = Arc::new(Mutex::new(HashMap::new()));

// ✅ Fast: Lock-free concurrent hash map
let sessions = Arc::new(DashMap::new());

Benefits:

No global lock contention
Read operations don’t block each other
Write operations lock only specific shards
10-100x faster for highly concurrent workloads

2. Adaptive Lock Acquisition

fn acquire_lock_adaptive(
    txn_id: TransactionId,
    key: Key,
    mode: LockMode
) -> Result<(), Error> {
    let mut backoff = 10;  // Start with 10ms

    for attempt in 0..MAX_RETRIES {
        match try_acquire_lock(txn_id, key.clone(), mode) {
            Ok(true) => return Ok(()),
            Ok(false) => {
                // Exponential backoff: 10ms, 20ms, 40ms, 80ms, ...
                sleep_ms(backoff);
                backoff = std::cmp::min(backoff * 2, 1000);  // Cap at 1 second
            }
            Err(e) => return Err(e),
        }
    }

    Err(Error::LockTimeout)
}

3. Batch Operations

// ❌ Slow: Individual inserts
for row in rows {
    db.execute(&format!("INSERT INTO table VALUES ({}, {})", row.id, row.value))?;
}

// ✅ Fast: Batch insert
db.execute_batch("INSERT INTO table VALUES ($1, $2)", &rows)?;

Benefits:

Amortize lock acquisition overhead
Reduce network roundtrips
Better CPU cache utilization
10-50x faster for bulk operations

4. Early Lock Release

impl Transaction {
    pub fn commit(&mut self) -> Result<(), Error> {
        // Apply writes
        self.apply_writes()?;

        // Release locks ASAP (before cleanup)
        self.lock_manager.release_all_locks(self.id)?;

        // Do expensive cleanup after releasing locks
        self.cleanup()?;

        Ok(())
    }
}

5. Read-Heavy Optimization

For 80%+ read workloads:

// Use MVCC snapshot reads (no locks needed)
let snapshot = snapshot_manager.create_snapshot();
let value = storage.get_at_snapshot(&key, &snapshot)?;

Benefits:

Readers never block writers
Writers never block readers
No lock contention for read-only queries
Scales linearly with cores

6. Memory Pool Allocation

pub struct TransactionPool {
    pool: Vec<Transaction>,
    allocated: Arc<AtomicUsize>,
}

impl TransactionPool {
    pub fn acquire(&self) -> Transaction {
        // Reuse pre-allocated transaction
        self.pool.pop().unwrap_or_else(|| Transaction::new())
    }

    pub fn release(&self, txn: Transaction) {
        // Return to pool for reuse
        if self.pool.len() < POOL_SIZE {
            self.pool.push(txn);
        }
    }
}

Benefits:

Reduce allocation overhead
Better memory locality
Predictable memory usage
2-5x faster transaction creation

Performance Benchmarks

Operation	Throughput	Latency (p99)	Notes
Read-only query	50,000 QPS	0.5ms	No lock contention
Single row update	20,000 TPS	1.2ms	Exclusive lock
Multi-row update	8,000 TPS	5.0ms	Multiple locks
Serializable TXN	3,000 TPS	15ms	Conflict detection
Session create	10,000/sec	0.1ms	DashMap insert
Lock acquire (no conflict)	100,000/sec	0.05ms	Lock-free path
Deadlock detection	50,000/sec	0.2ms	DFS on wait-for graph

Multi-User Transactions Architecture

Multi-User Transactions Architecture

Table of Contents

System Architecture Overview

Component Responsibilities

Session Lifecycle

Session State Machine

Session Creation Flow

Session Data Structure

Transaction Flow with Locking

Transaction Lifecycle

Lock Acquisition Algorithm

Lock Manager Deadlock Detection

Wait-For Graph

Deadlock Detection Algorithm (DFS)

Deadlock Resolution

Isolation Level Implementation

READ COMMITTED

REPEATABLE READ

SERIALIZABLE

Dump/Restore Data Flow

Dump Operation Flow

Restore Operation Flow

Memory Layout and Data Structures

SessionManager Data Structure

LockManager Data Structure

Transaction Data Structure

MVCC Snapshot Structure

Performance Optimization Strategies

1. Lock-Free Data Structures

2. Adaptive Lock Acquisition

3. Batch Operations

4. Early Lock Release

5. Read-Heavy Optimization

6. Memory Pool Allocation

Performance Benchmarks

See Also