tensor_store Benchmarks

The tensor store uses DashMap (sharded concurrent HashMap) for thread-safe key-value storage.

Core Operations

| Operation | 100 items | 1,000 items | 10,000 items |
|---|---|---|---|
| put | 40 us (2.5M/s) | 447 us (2.2M/s) | 7 ms (1.4M/s) |
| get | 33 us (3.0M/s) | 320 us (3.1M/s) | 3 ms (3.3M/s) |

Scan Operations (10k total items, parallel)

| Operation | Time |
|---|---|
| scan 1k keys | 191 us |
| scan_count 1k keys | 41 us |

Concurrent Write Performance

| Threads | Disjoint Keys | High Contention (100 keys) |
|---|---|---|
| 2 | 795 us | 974 us |
| 4 | 1.59 ms | 1.48 ms |
| 8 | 4.6 ms | 2.33 ms |

Mixed Workload

| Configuration | Time |
|---|---|
| 4 readers + 2 writers | 579 us |

Analysis

  • Read vs Write: Reads are ~20% faster than writes at small sizes (3.0M/s vs 2.5M/s at 100 items), widening to ~2.4x at 10k items, reflecting DashMap’s read-optimized design
  • Scaling: Near-linear scaling up to 10k items; slight degradation at scale due to hash table growth
  • Concurrency: DashMap’s 16-shard design provides excellent concurrent performance
  • Contention: Under high contention, performance actually improves at 8 threads vs 4 (lock sharding distributes load)
  • Parallel scans: Uses rayon for >1000 keys (25-53% faster)
  • scan_count vs scan: Count-only is ~5x faster (avoids string cloning)

Bloom Filter (optional)

| Operation | Time |
|---|---|
| add | 68 ns |
| might_contain (hit) | 46 ns |
| might_contain (miss) | 63 ns |

Sparse Lookups (1K keys in store)

| Query Type | Without Bloom | With Bloom |
|---|---|---|
| Negative lookup | 52 ns | 68 ns |
| Positive lookup | 45 ns | 60 ns |
| Sparse workload (90% miss) | 52 ns | 67 ns |

Note: Bloom filter adds ~15ns overhead for in-memory DashMap stores. It’s designed for scenarios where the backing store is slower (disk, network, remote database), where the early rejection of non-existent keys avoids expensive I/O.

Snapshot Persistence (bincode)

| Operation | 100 items | 1,000 items | 10,000 items |
|---|---|---|---|
| save | 100 us (1.0M/s) | 927 us (1.08M/s) | 12.6 ms (791K/s) |
| load | 74 us (1.35M/s) | 826 us (1.21M/s) | 10.7 ms (936K/s) |
| load_with_bloom | 81 us (1.23M/s) | 840 us (1.19M/s) | 11.0 ms (908K/s) |

Each item is a TensorData with 3 fields: id (i64), name (String), embedding (128-dim Vec<f32>).

Snapshot File Sizes

| Items | File Size | Per Item |
|---|---|---|
| 100 | ~60 KB | ~600 bytes |
| 1,000 | ~600 KB | ~600 bytes |
| 10,000 | ~6 MB | ~600 bytes |

Snapshot Analysis

  • Throughput: ~1M items/second for both save and load
  • Atomicity: Uses temp file + rename for crash-safe writes
  • Bloom filter overhead: ~3-5% slower to rebuild filter during load
  • Scaling: Near-linear with dataset size
  • File size: ~600 bytes per item with 128-dim embeddings (dominated by vector data)

Write-Ahead Log (WAL)

WAL provides crash-consistent durability with minimal performance overhead. Benchmarks use the same payload as the in-memory tests (128-dim embeddings).

WAL Writes

| Records | Time | Throughput |
|---|---|---|
| 100 | 152 us | 657K ops/s |
| 1,000 | 753 us | 1.33M ops/s |
| 10,000 | 6.95 ms | 1.44M ops/s |

WAL Recovery

| Records | Time | Throughput |
|---|---|---|
| 100 | 382 us | 261K elem/s |
| 1,000 | 394 us | 2.5M elem/s |
| 10,000 | 391 us | 25.6M elem/s |

WAL Analysis

  • Near constant recovery time: Recovery is dominated by file open overhead (~400us), not record count
  • Sequential I/O: WAL replay reads sequentially, hitting 25M records/sec
  • Durable vs in-memory: WAL writes at 1.4M ops/sec vs 2.0M ops/sec in-memory (72% of in-memory speed)
  • Use case: Production deployments requiring crash consistency

All engines support WAL via open_durable():

```rust
// Durable graph engine
let engine = GraphEngine::open_durable("data/graph.wal", WalConfig::default())?;

// Recovery after crash
let engine = GraphEngine::recover("data/graph.wal", &WalConfig::default(), None)?;
```

Sparse Vectors

SparseVector provides memory-efficient storage for high-sparsity embeddings by storing only non-zero values.

Construction (768d)

| Sparsity | Time | Throughput |
|---|---|---|
| 50% | 1.2 us | 640K/s |
| 90% | 890 ns | 870K/s |
| 99% | 650 ns | 1.18M/s |

Dot Product (768d)

| Sparsity | Sparse-Sparse | Sparse-Dense | Dense-Dense | Sparse Speedup |
|---|---|---|---|---|
| 50% | 2.1 us | 1.8 us | 580 ns | 0.3x (slower) |
| 90% | 380 ns | 290 ns | 580 ns | 1.5-2x |
| 99% | 38 ns | 26 ns | 580 ns | 15-22x |

Memory Compression

| Dimension | Sparsity | Dense Size | Sparse Size | Ratio |
|---|---|---|---|---|
| 768 | 90% | 3,072 B | 1,024 B | 3x |
| 768 | 99% | 3,072 B | 96 B | 32x |
| 1536 | 99% | 6,144 B | 184 B | 33x |

Sparse Vector Analysis

  • High sparsity sweet spot: At 99% sparsity, dot products are 15-22x faster than dense
  • Memory scaling: The element-count ratio is 1 / (1 - sparsity), but each stored non-zero also carries an index, so the realized ratio is lower (32x at 99% sparsity in the table above)
  • Construction overhead: Negligible (~1us per vector)
  • Use case: Embeddings from sparse models, one-hot encodings, pruned representations

Delta Vectors

DeltaVector stores embeddings as differences from reference “archetype” vectors, ideal for clustered embeddings.

Construction (768d, 5% delta)

| Dimension | Time | Throughput |
|---|---|---|
| 128 | 1.9 us | 526K/s |
| 768 | 12.3 us | 81K/s |
| 1536 | 25.1 us | 40K/s |

Dot Product (768d, precomputed archetype dot)

| Method | Time | vs Dense |
|---|---|---|
| Delta precomputed | 89 ns | 6.5x faster |
| Delta full | 620 ns | ~same |
| Dense baseline | 580 ns | 1x |

Same-Archetype Dot Product (768d)

| Method | Time | Speedup |
|---|---|---|
| Delta-delta | 145 ns | 4x |
| Dense baseline | 580 ns | 1x |

Delta Memory (768d)

| Delta Fraction | Dense Size | Delta Size | Ratio |
|---|---|---|---|
| 1% diff | 3,072 B | 120 B | 25x |
| 5% diff | 3,072 B | 360 B | 8.5x |
| 10% diff | 3,072 B | 680 B | 4.5x |

Archetype Registry (8 archetypes, 768d)

| Operation | Time |
|---|---|
| find_best_archetype | 4.2 us |
| encode | 14 us |
| decode | 1.1 us |

Delta Vector Analysis

  • Precomputed speedup: With archetype dot products cached, 6.5x faster than dense
  • Cluster-friendly: Similar vectors share archetypes, deltas are sparse
  • Use case: Semantic embeddings that cluster (documents, user profiles, products)

K-means Clustering

K-means discovers archetype vectors automatically from embedding collections.

K-means fit (128d, k=5)

| Vectors | Time | Throughput |
|---|---|---|
| 100 | 50 us | 2.0M elem/s |
| 500 | 241 us | 2.1M elem/s |
| 1000 | 482 us | 2.1M elem/s |

Varying k (1000 vectors, 128d)

| k | Time | Throughput |
|---|---|---|
| 2 | 183 us | 5.5M elem/s |
| 5 | 482 us | 2.1M elem/s |
| 10 | 984 us | 1.0M elem/s |
| 20 | 14.5 ms | 69K elem/s |

K-means Analysis

  • K-means++ is faster: Better initial centroids mean fewer iterations to converge
  • Linear with n: Doubling vectors roughly doubles time
  • Quadratic with k at high k: Each iteration is O(n*k), and more clusters need more iterations
  • Use case: Auto-discover archetypes for delta encoding, cluster analysis, centroid-based search