
vector_engine Benchmarks

The vector engine stores embeddings and performs k-nearest neighbor search using cosine similarity.
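As a point of reference for the numbers below, a minimal scalar cosine similarity might look like the following sketch. The engine's actual implementation is SIMD-accelerated via the `wide` crate; this scalar version computes the same value.

```rust
// Scalar cosine similarity: dot(a, b) / (|a| * |b|).
// O(d) in the vector dimension, matching the complexity table below.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (&x, &y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

fn main() {
    let a = [1.0, 0.0, 0.0];
    let b = [0.0, 1.0, 0.0];
    println!("{:.1}", cosine_similarity(&a, &a)); // identical vectors -> 1.0
    println!("{:.1}", cosine_similarity(&a, &b)); // orthogonal vectors -> 0.0
}
```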

Store Embedding

| Dimension | Time | Throughput |
|-----------|--------|------------|
| 128 | 366 ns | 2.7M/s |
| 768 | 892 ns | 1.1M/s |
| 1536 | 969 ns | 1.0M/s |

Get Embedding

| Dimension | Time |
|-----------|--------|
| 768 | 287 ns |

Delete Embedding

| Operation | Time |
|-----------|--------|
| delete | 806 ns |

Similarity Search (top 10, SIMD + adaptive parallel)

| Dataset | Time | Per Vector | Mode |
|---------|--------|------------|------|
| 1,000 x 128d | 242 µs | 242 ns | Sequential |
| 1,000 x 768d | 367 µs | 367 ns | Sequential |
| 10,000 x 128d | 1.93 ms | 193 ns | Parallel |

Cosine Similarity Computation (SIMD-accelerated)

| Dimension | Time |
|-----------|--------|
| 128 | 26 ns |
| 768 | 165 ns |
| 1536 | 369 ns |

Analysis

  • SIMD acceleration: 8-wide f32 SIMD (via the wide crate) provides a 3-9x speedup for cosine similarity
  • Adaptive parallelism: uses rayon for parallel search when the corpus exceeds 5,000 vectors (1.6x speedup at 10K)
  • Linear scaling with dimension: cosine similarity is O(d), where d is the vector dimension
  • Linear scaling with dataset size: brute-force search is O(n*d) for n vectors
  • Memory bound: a 768d vector occupies ~3 KB (768 * 4 bytes per f32)
  • Search throughput: ~4M vector comparisons/second at 128d (with SIMD)
  • Store/Get performance: sub-microsecond for typical embedding sizes
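The O(n*d) brute-force scan above can be illustrated with a minimal sequential top-k search. This is a scalar sketch only; the engine's SIMD inner loop and rayon parallel path (engaged above ~5,000 vectors) are elided.

```rust
// Cosine similarity, scalar form: O(d) per pair.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (&x, &y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

// Brute-force top-k: score all n vectors (O(n*d)), then keep the k best.
fn search_top_k(query: &[f32], corpus: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    let corpus = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.7, 0.7]];
    let hits = search_top_k(&[1.0, 0.0], &corpus, 2);
    // Exact match first, then the 45-degree vector.
    println!("{:?}", hits.iter().map(|(i, _)| *i).collect::<Vec<_>>()); // [0, 2]
}
```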

Complexity

| Operation | Time Complexity | Notes |
|-----------|-----------------|-------|
| store_embedding | O(d) | Vector copy + hash insert |
| get_embedding | O(d) | Hash lookup + vector clone |
| delete_embedding | O(1) | Hash removal |
| search_similar | O(n*d) | Brute-force scan |
| compute_similarity | O(d) | Dot product + 2 magnitude calculations |

HNSW Index (Approximate Nearest Neighbor)

HNSW provides O(log n) search complexity instead of O(n) brute force.

| Configuration | Search Time (5K, 128d) |
|---------------|------------------------|
| high_speed | ~50 µs |
| default | ~100 µs |
| high_recall | ~200 µs |

HNSW vs Brute Force (10K vectors, 128d)

| Method | Search Time | Speedup |
|--------|-------------|---------|
| Brute force | ~2 ms | 1x |
| HNSW default | ~150 µs | ~13x |

Recommended approach by corpus size:

| Corpus Size | Approach | Rationale |
|-------------|----------|-----------|
| < 10K | Brute force | Fast enough, pure tensor |
| 10K - 100K | HNSW | Pragmatic, 5-13x faster |
| > 100K | HNSW | Necessary for latency |

Scaling Projections (HNSW for >10K vectors)

| Vectors | Dimension | Search Time (est.) |
|---------|-----------|--------------------|
| 10K | 768 | ~200 µs |
| 100K | 768 | ~500 µs |
| 1M | 768 | ~1 ms |

For production workloads at extreme scale (>1M vectors), consider:

  • Sharded HNSW across multiple nodes
  • Dimensionality reduction (PCA)
  • Quantization (int8, binary)
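To make the quantization option concrete, here is a sketch of symmetric int8 quantization: each f32 component is mapped to an i8 plus one per-vector scale, shrinking a 768d embedding from ~3 KB to under 1 KB. This is illustrative only and not part of the vector_engine API.

```rust
// Symmetric int8 quantization: q = round(x / scale), scale = max|x| / 127.
// Storage drops from 4 bytes to 1 byte per component (plus one f32 scale).
fn quantize_i8(v: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = v.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = v.iter().map(|&x| (x / scale).round() as i8).collect();
    (q, scale)
}

// Reconstruction: x ~= q * scale, with per-component error at most scale / 2.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * scale).collect()
}

fn main() {
    let v = vec![0.5, -1.0, 0.25];
    let (q, scale) = quantize_i8(&v);
    let back = dequantize(&q, scale);
    println!("{:?}", back);
}
```

Binary quantization pushes this further (1 bit per component) at a larger recall cost, which is why it is usually paired with a rescoring pass over full-precision vectors.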

Storage Model

vector_engine stores each embedding as a tensor:

```
emb:{key} -> TensorData { vector: [...] }
```
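A minimal sketch of this key-value model, assuming a plain `HashMap` keyed by the `emb:{key}` string (the engine's actual `TensorData` type is richer than a bare `Vec<f32>`):

```rust
use std::collections::HashMap;

// Hypothetical store illustrating the emb:{key} -> vector mapping and the
// complexity table above: O(d) store/get, O(1) amortized delete.
struct VectorStore {
    map: HashMap<String, Vec<f32>>,
}

impl VectorStore {
    fn new() -> Self {
        Self { map: HashMap::new() }
    }

    // O(d): copies the vector into the map under the emb: prefix.
    fn store_embedding(&mut self, key: &str, vector: Vec<f32>) {
        self.map.insert(format!("emb:{key}"), vector);
    }

    // O(d): hash lookup plus a clone of the vector.
    fn get_embedding(&self, key: &str) -> Option<Vec<f32>> {
        self.map.get(&format!("emb:{key}")).cloned()
    }

    // O(1) amortized: hash removal, no vector copy.
    fn delete_embedding(&mut self, key: &str) -> bool {
        self.map.remove(&format!("emb:{key}")).is_some()
    }
}

fn main() {
    let mut store = VectorStore::new();
    store.store_embedding("doc1", vec![0.1, 0.2, 0.3]);
    assert_eq!(store.get_embedding("doc1"), Some(vec![0.1, 0.2, 0.3]));
    assert!(store.delete_embedding("doc1"));
    assert!(store.get_embedding("doc1").is_none());
    println!("ok");
}
```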

Trade-offs

  • Pro: Simple storage model, consistent with tensor abstraction
  • Pro: Sub-microsecond store/get operations
  • Pro: HNSW index for O(log n) approximate nearest neighbor search
  • Con: Brute-force O(n*d) for exact search (use HNSW for approximate)