Vector Engine
Module 4 of Neumann. Provides embedding storage and similarity search with SIMD-accelerated distance computations.
The Vector Engine builds on tensor_store to provide k-NN search capabilities.
It supports both brute-force O(n) search and HNSW O(log n) approximate search,
with automatic sparse vector optimization for memory efficiency.
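A minimal quick-start sketch using the APIs documented below (it assumes a caller that returns a `Result`, so the `?` operator works):

```rust
use vector_engine::VectorEngine;

let engine = VectorEngine::new();

// Store two embeddings, then run a top-k cosine search
engine.store_embedding("doc1", vec![0.1, 0.2, 0.3])?;
engine.store_embedding("doc2", vec![0.3, 0.2, 0.1])?;

let results = engine.search_similar(&[0.1, 0.2, 0.3], 2)?;
for result in results {
    println!("{} -> {:.3}", result.key, result.score);
}
```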
Design Principles
| Principle | Description |
|---|---|
| Layered Architecture | Depends only on Tensor Store for persistence |
| Multiple Distance Metrics | Cosine, Euclidean, and Dot Product similarity |
| SIMD Acceleration | 8-wide SIMD for dot products and magnitudes |
| Dual Search Modes | Brute-force O(n) or HNSW O(log n) |
| Unified Entities | Embeddings can be attached to shared entities |
| Thread Safety | Inherits from Tensor Store |
| Serializable Types | All types implement serde::Serialize/Deserialize |
| Automatic Sparsity Detection | Vectors with >50% zeros stored efficiently |
Architecture
```mermaid
graph TB
    subgraph VectorEngine
        VE[VectorEngine]
        SR[SearchResult]
        DM[DistanceMetric]
        VE --> |uses| SR
        VE --> |uses| DM
    end
    subgraph TensorStore
        TS[TensorStore]
        HNSW[HNSWIndex]
        SV[SparseVector]
        SIMD[SIMD Functions]
        ES[EmbeddingStorage]
    end
    VE --> |stores to| TS
    VE --> |builds| HNSW
    VE --> |uses| SV
    VE --> |uses| SIMD
    subgraph Storage
        EMB["emb:{key}"]
        ENT["entity:{key}._embedding"]
    end
    TS --> EMB
    TS --> ENT
```
Key Types
| Type | Description |
|---|---|
VectorEngine | Main engine for storing and searching embeddings |
VectorEngineConfig | Configuration for engine behavior and memory bounds |
SearchResult | Result with key and similarity score |
DistanceMetric | Enum: Cosine, Euclidean, DotProduct |
ExtendedDistanceMetric | Extended metrics for HNSW (9+ variants) |
VectorError | Error types for vector operations |
EmbeddingInput | Input for batch store operations |
BatchResult | Result of batch operations |
Pagination | Parameters for paginated queries |
PagedResult<T> | Paginated query result |
HNSWIndex | Hierarchical navigable small world graph (re-exported from tensor_store) |
HNSWConfig | HNSW index configuration (re-exported from tensor_store) |
SparseVector | Memory-efficient sparse embedding storage |
FilterCondition | Filter for metadata-based search (Eq, Ne, Lt, Gt, And, Or, In, etc.) |
FilterValue | Value type for filters (Int, Float, String, Bool, Null) |
FilterStrategy | Strategy selection (Auto, PreFilter, PostFilter) |
FilteredSearchConfig | Configuration for filtered search behavior |
VectorCollectionConfig | Configuration for vector collections |
MetadataValue | Simplified value type for embedding metadata |
PersistentVectorIndex | Serializable index for disk persistence |
VectorError Variants
| Variant | Description | When Triggered |
|---|---|---|
NotFound | Embedding key doesn’t exist | get_embedding, delete_embedding |
DimensionMismatch | Vectors have different dimensions, or a dimension exceeds max_dimension | compute_similarity, store_embedding with oversized vectors |
EmptyVector | Empty vector provided | Any operation with vec![] |
InvalidTopK | top_k is 0 | search_similar, search_with_hnsw |
StorageError | Underlying Tensor Store error | Storage failures |
BatchValidationError | Invalid input in batch | batch_store_embeddings validation |
BatchOperationError | Operation failed in batch | batch_store_embeddings execution |
ConfigurationError | Invalid configuration | VectorEngineConfig::validate() |
CollectionExists | Collection already exists | create_collection with existing name |
CollectionNotFound | Collection not found | Collection operations on missing collection |
IoError | IO error during persistence | save_to_file, load_from_file |
SerializationError | Serialization error | Index persistence operations |
SearchTimeout | Search operation timed out | Search operations exceeding configured timeout |
Configuration
VectorEngineConfig
Configuration for the Vector Engine with memory bounds and performance tuning.
| Field | Type | Default | Description |
|---|---|---|---|
default_dimension | Option<usize> | None | Expected embedding dimension |
sparse_threshold | f32 | 0.5 | Sparsity threshold (0.0-1.0) |
parallel_threshold | usize | 5000 | Dataset size for parallel search |
default_metric | DistanceMetric | Cosine | Default distance metric |
max_dimension | Option<usize> | None | Maximum allowed dimension |
max_keys_per_scan | Option<usize> | None | Limit for unbounded scans |
batch_parallel_threshold | usize | 100 | Batch size for parallel processing |
search_timeout | Option<Duration> | None | Search operation timeout |
Configuration Presets
| Preset | Description | Key Settings |
|---|---|---|
default() | Balanced for most workloads | All defaults |
high_throughput() | Optimized for write-heavy loads | parallel_threshold: 1000 |
low_memory() | Memory-constrained environments | max_dimension: 4096, max_keys_per_scan: 10000, search_timeout: 30s |
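The presets compose with the builder methods described below (a brief sketch, assuming the same chaining shown for default()):

```rust
// Start from a preset, then override individual fields
let config = VectorEngineConfig::low_memory()
    .with_default_dimension(768);

let engine = VectorEngine::with_config(config)?;
```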
Builder Methods
All builder methods are const fn for compile-time configuration:
```rust
use std::time::Duration;

let config = VectorEngineConfig::default()
    .with_default_dimension(768)
    .with_sparse_threshold(0.7)
    .with_parallel_threshold(1000)
    .with_default_metric(DistanceMetric::Cosine)
    .with_max_dimension(4096)
    .with_max_keys_per_scan(50_000)
    .with_batch_parallel_threshold(200)
    .with_search_timeout(Duration::from_secs(5));

let engine = VectorEngine::with_config(config)?;
```
Memory Bounds
For production deployments, configure memory bounds to prevent resource exhaustion:
```rust
// Reject embeddings larger than 4096 dimensions
let config = VectorEngineConfig::default()
    .with_max_dimension(4096)
    .with_max_keys_per_scan(10_000);

let engine = VectorEngine::with_config(config)?;

// This will fail with DimensionMismatch
engine.store_embedding("too_big", vec![0.0; 5000])?; // Error!
```
Search Timeout
Configure a timeout for search operations to prevent runaway queries:
```rust
use std::time::Duration;
use vector_engine::{VectorEngine, VectorEngineConfig, VectorError};

let config = VectorEngineConfig::default()
    .with_search_timeout(Duration::from_secs(5));
let engine = VectorEngine::with_config(config)?;

match engine.search_similar(&query, 10) {
    Ok(results) => { /* process results */ },
    Err(VectorError::SearchTimeout { operation, timeout_ms }) => {
        eprintln!("Search '{}' timed out after {}ms", operation, timeout_ms);
    },
    Err(e) => { /* handle other errors */ },
}
```
The timeout applies to all search methods. When a timeout occurs, no partial results are returned, since a truncated result set could silently omit better matches.
Distance Metrics
| Metric | Formula | Score Range | Use Case | HNSW Support |
|---|---|---|---|---|
| Cosine | a.b / (‖a‖ * ‖b‖) | -1.0 to 1.0 | Semantic similarity | Yes |
| Euclidean | 1 / (1 + sqrt(sum((a-b)^2))) | 0.0 to 1.0 | Spatial distance | No (brute-force) |
| DotProduct | sum(a * b) | unbounded | Magnitude-aware | No (brute-force) |
All metrics return higher scores for better matches. Euclidean distance is transformed to similarity score.
Extended Distance Metrics (HNSW)
The ExtendedDistanceMetric enum provides additional metrics for HNSW-based
search via search_with_hnsw_and_metric():
| Metric | Description | Best For |
|---|---|---|
Cosine | Angle-based similarity | Text embeddings, normalized vectors |
Euclidean | L2 distance | Spatial data, absolute distances |
Angular | Cosine converted to angular | When angle interpretation needed |
Manhattan | L1 norm | Robust to outliers |
Chebyshev | L-infinity (max diff) | When max deviation matters |
Jaccard | Set similarity | Binary/sparse vectors, TF-IDF |
Overlap | Minimum overlap coefficient | Partial matches |
Geodesic | Spherical distance | Geographic coordinates |
Composite | Weighted combination | Custom similarity functions |
```rust
use vector_engine::ExtendedDistanceMetric;

let (index, keys) = engine.build_hnsw_index_default()?;

// Search with Jaccard similarity for sparse vectors
let results = engine.search_with_hnsw_and_metric(
    &index,
    &keys,
    &query,
    10,
    ExtendedDistanceMetric::Jaccard,
)?;
```
Distance Metric Implementation Details
```mermaid
flowchart TD
    Query[Query Vector] --> MetricCheck{Which Metric?}
    MetricCheck -->|Cosine| CosMag[Pre-compute query magnitude]
    CosMag --> CosDot[SIMD dot product]
    CosDot --> CosDiv[Divide by magnitudes]
    CosDiv --> CosScore["Score: dot / (mag_a * mag_b)"]
    MetricCheck -->|Euclidean| EucDiff[Compute differences]
    EucDiff --> EucSum[Sum of squares]
    EucSum --> EucSqrt[Square root]
    EucSqrt --> EucScore["Score: 1 / (1 + distance)"]
    MetricCheck -->|DotProduct| DotSIMD[SIMD dot product]
    DotSIMD --> DotScore[Score: raw dot product]
```
Cosine Similarity Edge Cases
```rust
// Zero-magnitude vectors return 0.0 similarity
let zero = vec![0.0, 0.0, 0.0];
let normal = vec![1.0, 2.0, 3.0];
VectorEngine::compute_similarity(&zero, &normal)?; // Returns 0.0

// Identical vectors return 1.0
VectorEngine::compute_similarity(&normal, &normal)?; // Returns 1.0

// Opposite vectors return -1.0
let opposite = vec![-1.0, -2.0, -3.0];
VectorEngine::compute_similarity(&normal, &opposite)?; // Returns -1.0

// Orthogonal vectors return 0.0
let a = vec![1.0, 0.0];
let b = vec![0.0, 1.0];
VectorEngine::compute_similarity(&a, &b)?; // Returns 0.0
```
Euclidean Distance Transformation
The engine transforms Euclidean distance to similarity score using 1 / (1 + distance):
| Distance | Similarity Score |
|---|---|
| 0.0 | 1.0 (identical) |
| 1.0 | 0.5 |
| 2.0 | 0.333 |
| 9.0 | 0.1 |
| Infinity | 0.0 |
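The transform is plain arithmetic; a standalone illustration of the mapping (not the engine's internal code):

```rust
// Similarity score from a Euclidean distance: 1 / (1 + d)
fn euclidean_to_similarity(distance: f32) -> f32 {
    1.0 / (1.0 + distance)
}

assert!((euclidean_to_similarity(0.0) - 1.0).abs() < 1e-6); // identical vectors
assert!((euclidean_to_similarity(1.0) - 0.5).abs() < 1e-6);
assert!((euclidean_to_similarity(9.0) - 0.1).abs() < 1e-6);
```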
SIMD Implementation
The Vector Engine uses 8-wide SIMD operations via the wide crate for
accelerated distance computations.
SIMD Dot Product Algorithm
```rust
// Simplified view of the SIMD implementation
pub fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    let chunks = a.len() / 8; // Process 8 floats at a time
    let remainder = a.len() % 8;
    let mut sum = f32x8::ZERO;

    // Process 8 elements at a time with SIMD
    for i in 0..chunks {
        let offset = i * 8;
        let va = f32x8::from(&a[offset..offset + 8]);
        let vb = f32x8::from(&b[offset..offset + 8]);
        sum += va * vb; // Parallel multiply-add
    }

    // Sum SIMD lanes, then handle the remainder with scalar operations
    let arr: [f32; 8] = sum.into();
    let mut result: f32 = arr.iter().sum();

    let start = chunks * 8;
    for i in 0..remainder {
        result += a[start + i] * b[start + i];
    }
    result
}
```
SIMD Performance Characteristics
| Dimension | SIMD Speedup | Notes |
|---|---|---|
| 8 | 1x | Baseline (single SIMD operation) |
| 64 | 4-6x | Full pipeline utilization |
| 384 | 6-8x | Sentence Transformers size |
| 768 | 6-8x | BERT embedding size |
| 1536 | 6-8x | OpenAI ada-002 size |
| 3072 | 6-8x | OpenAI text-embedding-3-large |
SIMD operations are cache-friendly due to sequential memory access patterns.
API Reference
Basic Operations
```rust
let engine = VectorEngine::new();

// Store an embedding
engine.store_embedding("doc1", vec![0.1, 0.2, 0.3])?;

// Get an embedding
let vector = engine.get_embedding("doc1")?;

// Delete an embedding
engine.delete_embedding("doc1")?;

// Check existence
engine.exists("doc1"); // -> bool

// Count embeddings
engine.count(); // -> usize

// List all keys
let keys = engine.list_keys();

// Clear all embeddings
engine.clear()?;

// Get dimension (from first embedding)
engine.dimension(); // -> Option<usize>
```
Similarity Search
```rust
// Find top-k most similar (cosine by default)
let query = vec![0.1, 0.2, 0.3];
let results = engine.search_similar(&query, 5)?;

for result in results {
    println!("Key: {}, Score: {}", result.key, result.score);
}

// Search with specific metric
let results = engine.search_similar_with_metric(
    &query,
    5,
    DistanceMetric::Euclidean,
)?;

// Direct similarity computation
let similarity = VectorEngine::compute_similarity(&vec_a, &vec_b)?;
```
Filtered Search
Search with metadata filters to narrow results without post-processing:
```rust
use vector_engine::{FilterCondition, FilterValue, FilteredSearchConfig, FilterStrategy};

// Build a filter condition
let filter = FilterCondition::Eq("category".to_string(), FilterValue::String("science".to_string()))
    .and(FilterCondition::Gt("year".to_string(), FilterValue::Int(2020)));

// Search with filter (auto strategy)
let results = engine.search_similar_filtered(&query, 10, &filter, None)?;

// Search with explicit pre-filter strategy (best for selective filters)
let config = FilteredSearchConfig::pre_filter();
let results = engine.search_similar_filtered(&query, 10, &filter, Some(config))?;

// Search with post-filter and custom oversample
let config = FilteredSearchConfig::post_filter().with_oversample(5);
let results = engine.search_similar_filtered(&query, 10, &filter, Some(config))?;
```
Filter Conditions
| Condition | Description | Example |
|---|---|---|
Eq(field, value) | Equality | category = "science" |
Ne(field, value) | Not equal | status != "deleted" |
Lt(field, value) | Less than | price < 100 |
Le(field, value) | Less than or equal | price <= 100 |
Gt(field, value) | Greater than | year > 2020 |
Ge(field, value) | Greater than or equal | year >= 2020 |
And(a, b) | Logical AND | Combined conditions |
Or(a, b) | Logical OR | Alternative conditions |
In(field, values) | Value in list | status IN ["active", "pending"] |
Contains(field, substr) | String contains | title CONTAINS "rust" |
StartsWith(field, prefix) | String prefix | name STARTS WITH "doc:" |
Exists(field) | Field exists | HAS embedding |
True | Always matches | No filter |
Filter Strategies
| Strategy | When to Use | Behavior |
|---|---|---|
Auto | Default | Estimates selectivity and chooses |
PreFilter | < 10% matches | Filters first, then searches subset |
PostFilter | > 10% matches | Searches with oversample, then filters |
```mermaid
flowchart TD
    Query[Query + Filter] --> Strategy{Which Strategy?}
    Strategy -->|Auto| Estimate[Estimate Selectivity]
    Estimate -->|< 10%| Pre[Pre-Filter]
    Estimate -->|>= 10%| Post[Post-Filter]
    Strategy -->|PreFilter| Pre
    Strategy -->|PostFilter| Post
    Pre --> Filter1[Filter all keys]
    Filter1 --> Search1[Search filtered subset]
    Search1 --> Result[Top-K Results]
    Post --> Search2[Search with oversample]
    Search2 --> Filter2[Filter candidates]
    Filter2 --> Result
```
Filter Helper Methods
Utilities for working with filters:
```rust
// Estimate how selective a filter is (0.0 = matches nothing, 1.0 = matches all)
let selectivity = engine.estimate_filter_selectivity(&filter);

// Count how many embeddings match a filter
let matching = engine.count_matching(&filter);

// Get keys of all matching embeddings
let keys = engine.list_keys_matching(&filter);
```
Metadata Storage
Store and retrieve metadata alongside embeddings:
```rust
use std::collections::HashMap;
use tensor_store::TensorValue;

// Store embedding with metadata
let mut metadata = HashMap::new();
metadata.insert("category".to_string(), TensorValue::from("science"));
metadata.insert("year".to_string(), TensorValue::from(2024i64));
metadata.insert("score".to_string(), TensorValue::from(0.95f64));
engine.store_embedding_with_metadata("doc1", vec![0.1, 0.2, 0.3], metadata)?;

// Get all metadata
let meta = engine.get_metadata("doc1")?;

// Get specific field
let category = engine.get_metadata_field("doc1", "category")?;

// Update metadata (merges with existing)
let mut updates = HashMap::new();
updates.insert("score".to_string(), TensorValue::from(0.98f64));
engine.update_metadata("doc1", updates)?;

// Check if metadata field exists
if engine.has_metadata_field("doc1", "category") {
    // Remove specific metadata field
    engine.remove_metadata_field("doc1", "category")?;
}
```
Batch Operations
For bulk insert and delete operations with parallel processing:
```rust
use vector_engine::EmbeddingInput;

// Batch store - validates all inputs first, then stores in parallel
let inputs = vec![
    EmbeddingInput::new("doc1", vec![0.1, 0.2, 0.3]),
    EmbeddingInput::new("doc2", vec![0.2, 0.3, 0.4]),
    EmbeddingInput::new("doc3", vec![0.3, 0.4, 0.5]),
];
let result = engine.batch_store_embeddings(inputs)?;
println!("Stored {} embeddings", result.count); // -> 3

// Batch delete - returns count of successfully deleted
let keys = vec!["doc1".to_string(), "doc2".to_string()];
let deleted = engine.batch_delete_embeddings(keys)?;
println!("Deleted {} embeddings", deleted); // -> 2
```
Batches larger than batch_parallel_threshold (default: 100) use parallel
processing via rayon.
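The cutover point can be tuned through the config builder shown earlier; a sketch with illustrative values:

```rust
// Lower the threshold so batches of 50 or more take the parallel path
let config = VectorEngineConfig::default().with_batch_parallel_threshold(50);
let engine = VectorEngine::with_config(config)?;

let inputs: Vec<EmbeddingInput> = (0..200)
    .map(|i| EmbeddingInput::new(format!("doc{i}").as_str(), vec![i as f32, 1.0, 0.0]))
    .collect();
let result = engine.batch_store_embeddings(inputs)?;
println!("Stored {} embeddings in one batch", result.count);
```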
Pagination
For memory-efficient iteration over large datasets:
```rust
use vector_engine::Pagination;

// List keys with pagination
let page = Pagination::new(0, 100); // skip=0, limit=100
let result = engine.list_keys_paginated(page);
println!("Items: {}, Has more: {}", result.items.len(), result.has_more);

// Get total count with pagination
let page = Pagination::new(0, 100).with_total();
let result = engine.list_keys_paginated(page);
println!("Total: {:?}", result.total_count); // Some(total)

// Paginated similarity search
let page = Pagination::new(10, 5); // skip first 10, return 5
let results = engine.search_similar_paginated(&query, 100, page)?;

// Paginated entity search
let results = engine.search_entities_paginated(&query, 100, page)?;
```
Use list_keys_bounded() for production to enforce max_keys_per_scan limits.
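A sketch of walking the whole keyspace page by page, using only the pagination fields shown above:

```rust
use vector_engine::Pagination;

// Iterate over all keys in fixed-size pages
let page_size = 1000;
let mut skip = 0;
loop {
    let page = engine.list_keys_paginated(Pagination::new(skip, page_size));
    for key in &page.items {
        // ... process each key ...
        let _ = key;
    }
    if !page.has_more {
        break;
    }
    skip += page_size;
}
```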
Search Flow Diagram
```mermaid
sequenceDiagram
    participant Client
    participant VE as VectorEngine
    participant TS as TensorStore
    participant SIMD
    Client->>VE: search_similar(query, k)
    VE->>VE: Validate query (non-empty, k > 0)
    VE->>SIMD: Pre-compute query magnitude
    VE->>TS: scan("emb:")
    TS-->>VE: List of embedding keys
    alt Dataset < 5000 vectors
        VE->>VE: Sequential search
    else Dataset >= 5000 vectors
        VE->>VE: Parallel search (rayon)
    end
    loop For each embedding
        VE->>TS: get(key)
        TS-->>VE: TensorData
        VE->>VE: Extract vector (dense or sparse)
        VE->>SIMD: cosine_similarity(query, stored)
        VE->>VE: Collect SearchResult
    end
    VE->>VE: Sort by score descending
    VE->>VE: Truncate to top k
    VE-->>Client: Vec<SearchResult>
```
HNSW Index
For large datasets, build an HNSW index for O(log n) search:
```rust
// Build index with default config
let (index, key_mapping) = engine.build_hnsw_index_default()?;

// Search using the index
let results = engine.search_with_hnsw(&index, &key_mapping, &query, 10)?;

// Build with custom config
let config = HNSWConfig::high_recall();
let (index, key_mapping) = engine.build_hnsw_index(config)?;

// Direct HNSW operations
let index = HNSWIndex::new();
index.insert(vec![1.0, 2.0, 3.0]);
let results = index.search(&query, 10);

// Search with custom ef (recall/speed tradeoff)
let results = index.search_with_ef(&query, 10, 200);
```
HNSW Search Flow
```mermaid
flowchart TD
    Query[Query Vector] --> Entry[Entry Point at Max Layer]
    Entry --> Greedy1[Greedy Search Layer L]
    Greedy1 --> |Find closest| Greedy2[Greedy Search Layer L-1]
    Greedy2 --> |...| GreedyN[Greedy Search until Layer 1]
    GreedyN --> Layer0[Full ef-Search at Layer 0]
    Layer0 --> Candidates[Candidate Pool]
    Candidates --> |BinaryHeap min-heap| Visit[Visit Neighbors]
    Visit --> Distance[Compute Distances]
    Distance --> |Update| Results[Result Pool]
    Results --> |BinaryHeap max-heap| Prune[Keep top ef]
    Prune --> |More candidates?| Visit
    Prune --> |Done| TopK[Return Top K]
```
Unified Entity Mode
Attach embeddings directly to entities for cross-engine queries:
```rust
let store = TensorStore::new();
let engine = VectorEngine::with_store(store.clone());

// Set embedding on an entity
engine.set_entity_embedding("user:1", vec![0.1, 0.2, 0.3])?;

// Get embedding from an entity
let embedding = engine.get_entity_embedding("user:1")?;

// Check if entity has embedding
engine.entity_has_embedding("user:1"); // -> bool

// Remove embedding (preserves other entity data)
engine.remove_entity_embedding("user:1")?;

// Search entities with embeddings
let results = engine.search_entities(&query, 5)?;

// Scan all entities with embeddings
let entity_keys = engine.scan_entities_with_embeddings();

// Count entities with embeddings
let count = engine.count_entities_with_embeddings();
```
Unified entity embeddings are stored in the _embedding field of the entity’s
TensorData.
Collections
Collections provide isolated namespaces for organizing embeddings by type or purpose. Each collection can have its own dimension constraints and distance metric configuration.
Creating and Managing Collections
```rust
use vector_engine::{VectorEngine, VectorCollectionConfig, DistanceMetric};

let engine = VectorEngine::new();

// Create collection with custom config
let config = VectorCollectionConfig::default()
    .with_dimension(768)
    .with_metric(DistanceMetric::Cosine)
    .with_auto_index(5000); // Auto-build HNSW at 5000 vectors

engine.create_collection("documents", config)?;

// List collections
let collections = engine.list_collections();

// Check if collection exists
engine.collection_exists("documents"); // -> true

// Get collection config
let config = engine.get_collection_config("documents");

// Delete collection (removes all vectors in it)
engine.delete_collection("documents")?;
```
Storing in Collections
```rust
use std::collections::HashMap;
use tensor_store::TensorValue;

// Store vector in collection
engine.store_in_collection("documents", "doc1", vec![0.1, 0.2, 0.3])?;

// Store with metadata
let mut metadata = HashMap::new();
metadata.insert("title".to_string(), TensorValue::from("Introduction to Rust"));
metadata.insert("author".to_string(), TensorValue::from("Alice"));
engine.store_in_collection_with_metadata(
    "documents",
    "doc1",
    vec![0.1, 0.2, 0.3],
    metadata,
)?;
```
Searching in Collections
```rust
use vector_engine::{FilterCondition, FilterValue};

// Basic search in collection
let results = engine.search_in_collection("documents", &query, 10)?;

// Filtered search in collection
let filter = FilterCondition::Eq("author".to_string(), FilterValue::String("Alice".to_string()));
let results = engine.search_filtered_in_collection(
    "documents",
    &query,
    10,
    &filter,
    None,
)?;
```
Collection Key Isolation
Collections use prefixed storage keys to ensure isolation:
| Operation | Storage Key Pattern |
|---|---|
| Default embeddings | emb:{key} |
| Collection embeddings | coll:{collection}:emb:{key} |
| Entity embeddings | {entity_key}._embedding |
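Because the storage keys differ, the same logical key can live independently in the default namespace and in a collection; a sketch using the APIs above (assumes the documents collection was created as shown earlier):

```rust
// Same logical key, two separate storage keys:
//   emb:doc1                  (default namespace)
//   coll:documents:emb:doc1   (collection namespace)
engine.store_embedding("doc1", vec![0.1, 0.2, 0.3])?;
engine.store_in_collection("documents", "doc1", vec![0.9, 0.8, 0.7])?;

// A search scoped to the collection only sees the collection's vectors
let results = engine.search_in_collection("documents", &query, 10)?;
```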
VectorCollectionConfig
| Field | Type | Default | Description |
|---|---|---|---|
dimension | Option<usize> | None | Enforced dimension (rejects mismatches) |
distance_metric | DistanceMetric | Cosine | Default metric for this collection |
auto_index | bool | false | Auto-build HNSW on threshold |
auto_index_threshold | usize | 1000 | Vector count to trigger auto-index |
Index Persistence
Save and restore vector indices for fast startup:
```rust
use std::path::Path;

// Save all collections to directory (one JSON file per collection)
let saved = engine.save_all_indices(Path::new("./vector_index"))?;

// Load all indices from directory
let loaded = engine.load_all_indices(Path::new("./vector_index"))?;

// Save single collection to JSON
engine.save_index("documents", Path::new("./documents.json"))?;

// Save single collection to compact binary format
engine.save_index_binary("documents", Path::new("./documents.bin"))?;

// Load single collection from JSON (returns collection name)
let collection = engine.load_index(Path::new("./documents.json"))?;

// Load single collection from binary
let collection = engine.load_index_binary(Path::new("./documents.bin"))?;

// Get a snapshot for manual serialization
let index: PersistentVectorIndex = engine.snapshot_collection("documents");
```
PersistentVectorIndex Format
| Field | Type | Description |
|---|---|---|
collection | String | Collection name |
config | VectorCollectionConfig | Collection configuration |
vectors | Vec<VectorEntry> | All vectors with metadata |
created_at | u64 | Unix timestamp |
version | u32 | Format version (currently 1) |
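Since all public types implement the serde traits (see Design Principles), a snapshot can also be serialized manually. A sketch using serde_json (an assumed extra dependency), inside a function whose error type absorbs both serialization and I/O errors:

```rust
// Grab a snapshot of a collection and serialize it yourself
let snapshot: PersistentVectorIndex = engine.snapshot_collection("documents");
let json = serde_json::to_string_pretty(&snapshot)?;
std::fs::write("documents_snapshot.json", json)?;
```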
Storage Model
| Key Pattern | Content | Use Case |
|---|---|---|
emb:{key} | TensorData with “vector” field | Default collection embeddings |
coll:{collection}:emb:{key} | TensorData with “vector” field | Named collection embeddings |
{entity_key} | TensorData with “_embedding” field | Unified entities |
Automatic Sparse Storage
Vectors with >50% zeros are automatically stored as sparse vectors:
```rust
// Detection threshold: nnz * 2 <= len (i.e., sparsity >= 50%)
fn should_use_sparse(vector: &[f32]) -> bool {
    let nnz = vector.iter().filter(|&&v| v.abs() > 1e-6).count();
    nnz * 2 <= vector.len()
}

// 97% sparse vector (3 non-zeros in 100 elements)
let mut sparse = vec![0.0f32; 100];
sparse[0] = 1.0;
sparse[50] = 2.0;
sparse[99] = 3.0;

// Stored efficiently as SparseVector
engine.store_embedding("sparse_doc", sparse)?;

// Retrieved as dense for computation
let dense = engine.get_embedding("sparse_doc")?;
```
Storage Format Comparison
| Format | Memory per Element | Best For |
|---|---|---|
| Dense | 4 bytes | Sparsity < 50% |
| Sparse | 8 bytes per non-zero (4 pos + 4 val) | Sparsity > 50% |
Example: 1000-dim vector with 100 non-zeros:
- Dense: 4000 bytes
- Sparse: 800 bytes (5x compression)
Sparse Vector Operations
Memory Layout
```rust
SparseVector {
    dimension: usize,    // Total vector dimension
    positions: Vec<u32>, // Sorted indices of non-zeros
    values: Vec<f32>,    // Corresponding values
}
```
Sparse Dot Product Algorithm
```rust
use std::cmp::Ordering::{Equal, Greater, Less};

// O(min(nnz_a, nnz_b)) - only overlapping positions contribute
pub fn dot(&self, other: &SparseVector) -> f32 {
    let mut result = 0.0;
    let mut i = 0;
    let mut j = 0;

    // Merge-sort style traversal over the sorted position lists
    while i < self.positions.len() && j < other.positions.len() {
        match self.positions[i].cmp(&other.positions[j]) {
            Equal => {
                result += self.values[i] * other.values[j];
                i += 1;
                j += 1;
            },
            Less => i += 1,
            Greater => j += 1,
        }
    }
    result
}
```
Sparse-Dense Dot Product
```rust
// O(nnz) - only iterate over sparse non-zeros
pub fn dot_dense(&self, dense: &[f32]) -> f32 {
    self.positions
        .iter()
        .zip(&self.values)
        .map(|(&pos, &val)| val * dense[pos as usize])
        .sum()
}
```
Sparse Distance Metrics
| Metric | Complexity | Description |
|---|---|---|
dot | O(min(nnz_a, nnz_b)) | Sparse-sparse dot product |
dot_dense | O(nnz) | Sparse-dense dot product |
cosine_similarity | O(min(nnz_a, nnz_b)) | Angle-based similarity |
euclidean_distance | O(nnz_a + nnz_b) | L2 distance |
manhattan_distance | O(nnz_a + nnz_b) | L1 distance |
jaccard_index | O(min(nnz_a, nnz_b)) | Position overlap |
angular_distance | O(min(nnz_a, nnz_b)) | Arc-cosine |
HNSW Configuration
Configuration Parameters
| Parameter | Default | Description |
|---|---|---|
m | 16 | Max connections per node per layer |
m0 | 32 | Max connections at layer 0 (2*m) |
ef_construction | 200 | Candidates during index building |
ef_search | 50 | Candidates during search |
ml | 1/ln(m) | Level multiplier for layer selection |
sparsity_threshold | 0.5 | Auto-sparse threshold |
max_nodes | 10,000,000 | Capacity limit |
Presets
| Preset | m | m0 | ef_construction | ef_search | Use Case |
|---|---|---|---|---|---|
default() | 16 | 32 | 200 | 50 | Balanced |
high_recall() | 32 | 64 | 400 | 200 | Accuracy over speed |
high_speed() | 8 | 16 | 100 | 20 | Speed over accuracy |
Tuning Guidelines
```mermaid
graph TD
    subgraph "Higher m / ef"
        A[More connections per node]
        B[Better recall]
        C[More memory]
        D[Slower insert]
    end
    subgraph "Lower m / ef"
        E[Fewer connections]
        F[Lower recall]
        G[Less memory]
        H[Faster insert]
    end
    A --> B
    A --> C
    A --> D
    E --> F
    E --> G
    E --> H
```
Workload-Specific Tuning
| Workload | Recommended Config | Rationale |
|---|---|---|
| RAG/Semantic Search | high_recall() | Accuracy critical |
| Real-time recommendations | high_speed() | Latency critical |
| Batch processing | default() | Balanced |
| Small dataset (<10K) | Brute-force | HNSW overhead not worth it |
| Large dataset (>100K) | default() with higher ef_search | Scale benefits |
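One way to apply these guidelines is a small helper that picks a preset from the dataset size; purely illustrative (the thresholds and the prefer_recall flag are assumptions, not part of the API):

```rust
use vector_engine::HNSWConfig; // re-exported from tensor_store

// Illustrative heuristic based on the table above
fn pick_hnsw_config(num_vectors: usize, prefer_recall: bool) -> Option<HNSWConfig> {
    if num_vectors < 10_000 {
        return None; // brute-force search is usually fine at this scale
    }
    if prefer_recall {
        Some(HNSWConfig::high_recall())
    } else {
        Some(HNSWConfig::default())
    }
}
```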
Memory vs Recall Tradeoff
| Config | Memory/Node | Recall@10 | Search Time |
|---|---|---|---|
| high_speed | ~128 bytes | ~85% | 0.1ms |
| default | ~256 bytes | ~95% | 0.3ms |
| high_recall | ~512 bytes | ~99% | 1.0ms |
Performance Characteristics
| Operation | Complexity | Notes |
|---|---|---|
store_embedding | O(1) | Single store put |
get_embedding | O(1) | Single store get |
delete_embedding | O(1) | Single store delete |
search_similar | O(n*d) | Brute-force, n=count, d=dimension |
search_with_hnsw | O(log n * ef * m) | Approximate nearest neighbor |
build_hnsw_index | O(n * log n * ef_construction * m) | Index construction |
count | O(n) | Scans all embeddings |
list_keys | O(n) | Scans all embeddings |
Parallel Search Threshold
Automatic parallel iteration for datasets >5000 vectors:
```rust
const PARALLEL_THRESHOLD: usize = 5000;

if keys.len() >= PARALLEL_THRESHOLD {
    // Use rayon parallel iterator
    keys.par_iter().filter_map(...)
} else {
    // Use sequential iterator
    keys.iter().filter_map(...)
}
```
Benchmark Results
| Dataset Size | Brute-Force | With HNSW | Speedup |
|---|---|---|---|
| 200 vectors | 4.17s | 9.3us | 448,000x |
| 1,000 vectors | ~5ms | ~20us | 250x |
| 10,000 vectors | ~50ms | ~50us | 1000x |
| 100,000 vectors | ~500ms | ~100us | 5000x |
Supported Embedding Dimensions
| Model | Dimensions | Recommended Config |
|---|---|---|
| OpenAI text-embedding-ada-002 | 1536 | default |
| OpenAI text-embedding-3-small | 1536 | default |
| OpenAI text-embedding-3-large | 3072 | high_recall |
| BERT base | 768 | default |
| Sentence Transformers | 384-768 | default |
| Cohere embed-v3 | 1024 | default |
| Custom/small | <256 | high_speed |
Edge Cases and Gotchas
Zero-Magnitude Vectors
| Metric | Behavior | Rationale |
|---|---|---|
| Cosine | Returns empty results | Division by zero undefined |
| DotProduct | Returns empty results | Undefined direction |
| Euclidean | Works correctly | Finds vectors closest to origin |
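For example, a zero-magnitude query under the default cosine metric yields an empty result set rather than an error (a sketch based on the behavior described in the table):

```rust
engine.store_embedding("doc1", vec![1.0, 2.0, 3.0])?;

// Cosine search with a zero-magnitude query returns no results
let results = engine.search_similar(&[0.0, 0.0, 0.0], 10)?;
assert!(results.is_empty());
```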
Dimension Mismatch Handling
```rust
// Mismatched dimensions are silently skipped during search
engine.store_embedding("2d", vec![1.0, 0.0])?;
engine.store_embedding("3d", vec![1.0, 0.0, 0.0])?;

// Search with 2D query only matches 2D vectors
let results = engine.search_similar(&[1.0, 0.0], 10)?;
assert_eq!(results.len(), 1); // Only "2d" matched
```
HNSW Limitations
| Limitation | Details | Workaround |
|---|---|---|
| Cosine-only by default | search_with_hnsw uses cosine distance internally | Use search_with_hnsw_and_metric or brute-force for other metrics |
| No deletion | Cannot remove vectors | Rebuild index |
| Static after build | Index doesn’t update with new vectors | Rebuild periodically |
| Memory overhead | Graph structure adds ~2-4x | Use for large datasets only |
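A common pattern given these limitations is to rebuild the index periodically after a batch of new inserts; a minimal sketch using the methods shown above (the rebuild trigger is an illustrative choice):

```rust
// Build once and remember how many vectors the index covers
let (mut index, mut key_mapping) = engine.build_hnsw_index_default()?;
let mut indexed_count = engine.count();

// ... later, after more store_embedding calls ...
if engine.count() > indexed_count + 10_000 {
    // Rebuild from scratch: HNSW has no deletes and is static after build
    let (new_index, new_keys) = engine.build_hnsw_index_default()?;
    index = new_index;
    key_mapping = new_keys;
    indexed_count = engine.count();
}

let results = engine.search_with_hnsw(&index, &key_mapping, &query, 10)?;
```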
NaN/Infinity Handling
Sparse vector operations sanitize NaN/Inf results:
```rust
// cosine_similarity returns 0.0 for NaN/Inf
if result.is_nan() || result.is_infinite() {
    0.0
} else {
    result.clamp(-1.0, 1.0)
}

// cosine_distance_dense returns 1.0 (max distance) for NaN/Inf
if similarity.is_nan() || similarity.is_infinite() {
    1.0 // Maximum distance
} else {
    1.0 - similarity.clamp(-1.0, 1.0)
}
```
Best Practices
Memory Optimization
- Use sparse vectors for high-sparsity data: Automatic at >50% zeros
- Batch insert for HNSW: Build index once after all data loaded
- Choose appropriate HNSW config: Don’t over-provision m/ef
- Monitor memory with HNSWMemoryStats: Track dense vs sparse counts
```rust
let stats = index.memory_stats();
println!(
    "Dense: {}, Sparse: {}, Total bytes: {}",
    stats.dense_count, stats.sparse_count, stats.embedding_bytes
);
```
Search Performance
- Pre-compute query magnitude: Done automatically in search
- Use HNSW for >10K vectors: Brute-force for smaller sets
- Tune ef_search: Higher for recall, lower for speed
- Parallel threshold: Automatic at 5000 vectors
Unified Entity Best Practices
- Use for cross-engine queries: When embeddings relate to graph/relational data
- Entity key conventions: Use prefixes like user:, doc:, item:
- Separate embedding namespace: Use store_embedding for isolated vectors
Dependencies
| Crate | Purpose |
|---|---|
tensor_store | Persistence, SparseVector, HNSWIndex, SIMD |
rayon | Parallel iteration for large datasets |
serde | Serialization of types |
tracing | Instrumentation and observability |
Note: wide (SIMD f32x8 operations) is a transitive dependency via tensor_store.
Related Modules
- Tensor Store - Underlying storage and HNSW implementation
- Query Router - Executes SIMILAR queries using VectorEngine
- Tensor Cache - Uses vector similarity for semantic caching