Vector Engine

Module 4 of Neumann. Provides embeddings storage and similarity search with SIMD-accelerated distance computations.

The Vector Engine builds on tensor_store to provide k-NN search capabilities. It supports both brute-force O(n) search and HNSW O(log n) approximate search, with automatic sparse vector optimization for memory efficiency.

Design Principles

| Principle | Description |
|---|---|
| Layered Architecture | Depends only on Tensor Store for persistence |
| Multiple Distance Metrics | Cosine, Euclidean, and Dot Product similarity |
| SIMD Acceleration | 8-wide SIMD for dot products and magnitudes |
| Dual Search Modes | Brute-force O(n) or HNSW O(log n) |
| Unified Entities | Embeddings can be attached to shared entities |
| Thread Safety | Inherits from Tensor Store |
| Serializable Types | All types implement serde::Serialize/Deserialize |
| Automatic Sparsity Detection | Vectors with ≥50% zeros stored efficiently |

Architecture

graph TB
    subgraph VectorEngine
        VE[VectorEngine]
        SR[SearchResult]
        DM[DistanceMetric]
        VE --> |uses| SR
        VE --> |uses| DM
    end

    subgraph TensorStore
        TS[TensorStore]
        HNSW[HNSWIndex]
        SV[SparseVector]
        SIMD[SIMD Functions]
        ES[EmbeddingStorage]
    end

    VE --> |stores to| TS
    VE --> |builds| HNSW
    VE --> |uses| SV
    VE --> |uses| SIMD

    subgraph Storage
        EMB["emb:{key}"]
        ENT["entity:{key}._embedding"]
    end

    TS --> EMB
    TS --> ENT

Key Types

| Type | Description |
|---|---|
| VectorEngine | Main engine for storing and searching embeddings |
| VectorEngineConfig | Configuration for engine behavior and memory bounds |
| SearchResult | Result with key and similarity score |
| DistanceMetric | Enum: Cosine, Euclidean, DotProduct |
| ExtendedDistanceMetric | Extended metrics for HNSW (9+ variants) |
| VectorError | Error types for vector operations |
| EmbeddingInput | Input for batch store operations |
| BatchResult | Result of batch operations |
| Pagination | Parameters for paginated queries |
| PagedResult\<T\> | Paginated query result |
| HNSWIndex | Hierarchical navigable small world graph (re-exported from tensor_store) |
| HNSWConfig | HNSW index configuration (re-exported from tensor_store) |
| SparseVector | Memory-efficient sparse embedding storage |
| FilterCondition | Filter for metadata-based search (Eq, Ne, Lt, Gt, And, Or, In, etc.) |
| FilterValue | Value type for filters (Int, Float, String, Bool, Null) |
| FilterStrategy | Strategy selection (Auto, PreFilter, PostFilter) |
| FilteredSearchConfig | Configuration for filtered search behavior |
| VectorCollectionConfig | Configuration for vector collections |
| MetadataValue | Simplified value type for embedding metadata |
| PersistentVectorIndex | Serializable index for disk persistence |

VectorError Variants

| Variant | Description | When Triggered |
|---|---|---|
| NotFound | Embedding key doesn't exist | get_embedding, delete_embedding |
| DimensionMismatch | Vectors have different dimensions | compute_similarity, exceeds max_dimension |
| EmptyVector | Empty vector provided | Any operation with vec![] |
| InvalidTopK | top_k is 0 | search_similar, search_with_hnsw |
| StorageError | Underlying Tensor Store error | Storage failures |
| BatchValidationError | Invalid input in batch | batch_store_embeddings validation |
| BatchOperationError | Operation failed in batch | batch_store_embeddings execution |
| ConfigurationError | Invalid configuration | VectorEngineConfig::validate() |
| CollectionExists | Collection already exists | create_collection with existing name |
| CollectionNotFound | Collection not found | Collection operations on missing collection |
| IoError | IO error during persistence | save_to_file, load_from_file |
| SerializationError | Serialization error | Index persistence operations |
| SearchTimeout | Search operation timed out | Search operations exceeding configured timeout |

Configuration

VectorEngineConfig

Configuration for the Vector Engine with memory bounds and performance tuning.

| Field | Type | Default | Description |
|---|---|---|---|
| default_dimension | Option\<usize\> | None | Expected embedding dimension |
| sparse_threshold | f32 | 0.5 | Sparsity threshold (0.0-1.0) |
| parallel_threshold | usize | 5000 | Dataset size for parallel search |
| default_metric | DistanceMetric | Cosine | Default distance metric |
| max_dimension | Option\<usize\> | None | Maximum allowed dimension |
| max_keys_per_scan | Option\<usize\> | None | Limit for unbounded scans |
| batch_parallel_threshold | usize | 100 | Batch size for parallel processing |
| search_timeout | Option\<Duration\> | None | Search operation timeout |

Configuration Presets

| Preset | Description | Key Settings |
|---|---|---|
| default() | Balanced for most workloads | All defaults |
| high_throughput() | Optimized for write-heavy loads | parallel_threshold: 1000 |
| low_memory() | Memory-constrained environments | max_dimension: 4096, max_keys_per_scan: 10000, search_timeout: 30s |

Builder Methods

All builder methods are const fn for compile-time configuration:

#![allow(unused)]
fn main() {
use std::time::Duration;
use vector_engine::{VectorEngine, VectorEngineConfig, DistanceMetric};

let config = VectorEngineConfig::default()
    .with_default_dimension(768)
    .with_sparse_threshold(0.7)
    .with_parallel_threshold(1000)
    .with_default_metric(DistanceMetric::Cosine)
    .with_max_dimension(4096)
    .with_max_keys_per_scan(50_000)
    .with_batch_parallel_threshold(200)
    .with_search_timeout(Duration::from_secs(5));

let engine = VectorEngine::with_config(config)?;
}

Memory Bounds

For production deployments, configure memory bounds to prevent resource exhaustion:

#![allow(unused)]
fn main() {
use vector_engine::{VectorEngine, VectorEngineConfig};

// Reject embeddings larger than 4096 dimensions
let config = VectorEngineConfig::default()
    .with_max_dimension(4096)
    .with_max_keys_per_scan(10_000);

let engine = VectorEngine::with_config(config)?;

// This will fail with DimensionMismatch
engine.store_embedding("too_big", vec![0.0; 5000])?; // Error!
}

Search Timeout

Configure a timeout for search operations to prevent runaway queries:

#![allow(unused)]
fn main() {
use std::time::Duration;
use vector_engine::{VectorEngine, VectorEngineConfig, VectorError};

let config = VectorEngineConfig::default()
    .with_search_timeout(Duration::from_secs(5));

let engine = VectorEngine::with_config(config)?;

match engine.search_similar(&query, 10) {
    Ok(results) => { /* process results */ },
    Err(VectorError::SearchTimeout { operation, timeout_ms }) => {
        eprintln!("Search '{}' timed out after {}ms", operation, timeout_ms);
    },
    Err(e) => { /* handle other errors */ },
}
}

The timeout applies to all search methods. When a timeout occurs, no partial results are returned: a truncated result set could silently omit better matches, so the operation fails outright.
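The deadline check can be sketched in plain Rust. This is an illustrative model of the behavior described above, not the engine's implementation; the function name `scan_with_deadline` is hypothetical.

```rust
use std::time::{Duration, Instant};

// Sketch of deadline checking inside a scan loop: once the deadline
// passes, the whole search fails rather than returning a partial top-k.
fn scan_with_deadline(n_items: usize, timeout: Duration) -> Result<usize, String> {
    let deadline = Instant::now() + timeout;
    let mut scanned = 0;
    for _ in 0..n_items {
        if Instant::now() >= deadline {
            return Err(format!("timed out after {}ms", timeout.as_millis()));
        }
        scanned += 1; // stand-in for scoring one candidate
    }
    Ok(scanned)
}

fn main() {
    // A generous deadline lets the scan finish.
    assert_eq!(scan_with_deadline(1000, Duration::from_secs(5)), Ok(1000));
    // A zero deadline fails immediately, with no partial results.
    assert!(scan_with_deadline(1000, Duration::ZERO).is_err());
}
```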

Distance Metrics

| Metric | Formula | Score Range | Use Case | HNSW Support |
|---|---|---|---|---|
| Cosine | a.b / (‖a‖ * ‖b‖) | -1.0 to 1.0 | Semantic similarity | Yes |
| Euclidean | 1 / (1 + sqrt(sum((a-b)^2))) | 0.0 to 1.0 | Spatial distance | No (brute-force) |
| DotProduct | sum(a * b) | unbounded | Magnitude-aware | No (brute-force) |

All metrics return higher scores for better matches. Euclidean distance is transformed into a similarity score.
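As a plain-Rust reference for these formulas (the engine's actual path is SIMD-accelerated; function names here are illustrative, not the crate's API):

```rust
// Scalar reference implementations of the three metric scores.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn magnitude(v: &[f32]) -> f32 {
    dot(v, v).sqrt()
}

fn cosine_score(a: &[f32], b: &[f32]) -> f32 {
    let (ma, mb) = (magnitude(a), magnitude(b));
    if ma == 0.0 || mb == 0.0 {
        return 0.0; // zero-magnitude vectors have no direction
    }
    dot(a, b) / (ma * mb)
}

fn euclidean_score(a: &[f32], b: &[f32]) -> f32 {
    let dist: f32 = a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt();
    1.0 / (1.0 + dist) // transform distance into a 0.0..=1.0 score
}

fn main() {
    let v = [1.0f32, 2.0, 3.0];
    assert_eq!(dot(&v, &v), 14.0);                          // 1 + 4 + 9
    assert!((cosine_score(&v, &v) - 1.0).abs() < 1e-6);     // identical -> ~1.0
    assert!((euclidean_score(&v, &v) - 1.0).abs() < 1e-6);  // distance 0 -> 1.0
}
```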

Extended Distance Metrics (HNSW)

The ExtendedDistanceMetric enum provides additional metrics for HNSW-based search via search_with_hnsw_and_metric():

| Metric | Description | Best For |
|---|---|---|
| Cosine | Angle-based similarity | Text embeddings, normalized vectors |
| Euclidean | L2 distance | Spatial data, absolute distances |
| Angular | Cosine converted to angular | When angle interpretation needed |
| Manhattan | L1 norm | Robust to outliers |
| Chebyshev | L-infinity (max diff) | When max deviation matters |
| Jaccard | Set similarity | Binary/sparse vectors, TF-IDF |
| Overlap | Minimum overlap coefficient | Partial matches |
| Geodesic | Spherical distance | Geographic coordinates |
| Composite | Weighted combination | Custom similarity functions |

#![allow(unused)]
fn main() {
use vector_engine::ExtendedDistanceMetric;

let (index, keys) = engine.build_hnsw_index_default()?;

// Search with Jaccard similarity for sparse vectors
let results = engine.search_with_hnsw_and_metric(
    &index,
    &keys,
    &query,
    10,
    ExtendedDistanceMetric::Jaccard,
)?;
}

Distance Metric Implementation Details

flowchart TD
    Query[Query Vector] --> MetricCheck{Which Metric?}

    MetricCheck -->|Cosine| CosMag[Pre-compute query magnitude]
    CosMag --> CosDot[SIMD dot product]
    CosDot --> CosDiv[Divide by magnitudes]
    CosDiv --> CosScore[Score: dot / (mag_a * mag_b)]

    MetricCheck -->|Euclidean| EucDiff[Compute differences]
    EucDiff --> EucSum[Sum of squares]
    EucSum --> EucSqrt[Square root]
    EucSqrt --> EucScore[Score: 1 / (1 + distance)]

    MetricCheck -->|DotProduct| DotSIMD[SIMD dot product]
    DotSIMD --> DotScore[Score: raw dot product]

Cosine Similarity Edge Cases

#![allow(unused)]
fn main() {
// Zero-magnitude vectors return 0.0 similarity
let zero = vec![0.0, 0.0, 0.0];
let normal = vec![1.0, 2.0, 3.0];
VectorEngine::compute_similarity(&zero, &normal)?; // Returns 0.0

// Identical vectors return 1.0
VectorEngine::compute_similarity(&normal, &normal)?; // Returns 1.0

// Opposite vectors return -1.0
let opposite = vec![-1.0, -2.0, -3.0];
VectorEngine::compute_similarity(&normal, &opposite)?; // Returns -1.0

// Orthogonal vectors return 0.0
let a = vec![1.0, 0.0];
let b = vec![0.0, 1.0];
VectorEngine::compute_similarity(&a, &b)?; // Returns 0.0
}

Euclidean Distance Transformation

The engine transforms Euclidean distance to similarity score using 1 / (1 + distance):

| Distance | Similarity Score |
|---|---|
| 0.0 | 1.0 (identical) |
| 1.0 | 0.5 |
| 2.0 | 0.333 |
| 9.0 | 0.1 |
| Infinity | 0.0 |
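The transform can be checked directly (the helper name `euclidean_to_score` is illustrative):

```rust
// Distance-to-similarity transform used for Euclidean: score = 1 / (1 + d).
fn euclidean_to_score(distance: f32) -> f32 {
    1.0 / (1.0 + distance)
}

fn main() {
    assert_eq!(euclidean_to_score(0.0), 1.0);               // identical vectors
    assert_eq!(euclidean_to_score(1.0), 0.5);
    assert!((euclidean_to_score(2.0) - 1.0 / 3.0).abs() < 1e-6);
    assert_eq!(euclidean_to_score(9.0), 0.1);
    assert_eq!(euclidean_to_score(f32::INFINITY), 0.0);     // unbounded distance
}
```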

SIMD Implementation

The Vector Engine uses 8-wide SIMD operations via the wide crate for accelerated distance computations.

SIMD Dot Product Algorithm

#![allow(unused)]
fn main() {
// Simplified view of the SIMD implementation
pub fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    let chunks = a.len() / 8;        // Process 8 floats at a time
    let remainder = a.len() % 8;

    let mut sum = f32x8::ZERO;

    // Process 8 elements at a time with SIMD
    for i in 0..chunks {
        let offset = i * 8;
        let va = f32x8::from(&a[offset..offset + 8]);
        let vb = f32x8::from(&b[offset..offset + 8]);
        sum += va * vb;  // Parallel multiply-add
    }

    // Sum SIMD lanes + handle remainder scalar
    let arr: [f32; 8] = sum.into();
    let mut result: f32 = arr.iter().sum();

    // Handle remainder with scalar operations
    let start = chunks * 8;
    for i in 0..remainder {
        result += a[start + i] * b[start + i];
    }

    result
}
}

SIMD Performance Characteristics

| Dimension | SIMD Speedup | Notes |
|---|---|---|
| 8 | 1x | Baseline (single SIMD operation) |
| 64 | 4-6x | Full pipeline utilization |
| 384 | 6-8x | Sentence Transformers size |
| 768 | 6-8x | BERT embedding size |
| 1536 | 6-8x | OpenAI ada-002 size |
| 3072 | 6-8x | OpenAI text-embedding-3-large |

SIMD operations are cache-friendly due to sequential memory access patterns.

API Reference

Basic Operations

#![allow(unused)]
fn main() {
let engine = VectorEngine::new();

// Store an embedding
engine.store_embedding("doc1", vec![0.1, 0.2, 0.3])?;

// Get an embedding
let vector = engine.get_embedding("doc1")?;

// Delete an embedding
engine.delete_embedding("doc1")?;

// Check existence
engine.exists("doc1");  // -> bool

// Count embeddings
engine.count();  // -> usize

// List all keys
let keys = engine.list_keys();

// Clear all embeddings
engine.clear()?;

// Get dimension (from first embedding)
engine.dimension();  // -> Option<usize>
}
#![allow(unused)]
fn main() {
// Find top-k most similar (cosine by default)
let query = vec![0.1, 0.2, 0.3];
let results = engine.search_similar(&query, 5)?;

for result in results {
    println!("Key: {}, Score: {}", result.key, result.score);
}

// Search with specific metric
let results = engine.search_similar_with_metric(
    &query,
    5,
    DistanceMetric::Euclidean
)?;

// Direct similarity computation
let similarity = VectorEngine::compute_similarity(&vec_a, &vec_b)?;
}

Search with metadata filters to narrow results without post-processing:

#![allow(unused)]
fn main() {
use vector_engine::{FilterCondition, FilterValue, FilteredSearchConfig, FilterStrategy};

// Build a filter condition
let filter = FilterCondition::Eq("category".to_string(), FilterValue::String("science".to_string()))
    .and(FilterCondition::Gt("year".to_string(), FilterValue::Int(2020)));

// Search with filter (auto strategy)
let results = engine.search_similar_filtered(&query, 10, &filter, None)?;

// Search with explicit pre-filter strategy (best for selective filters)
let config = FilteredSearchConfig::pre_filter();
let results = engine.search_similar_filtered(&query, 10, &filter, Some(config))?;

// Search with post-filter and custom oversample
let config = FilteredSearchConfig::post_filter().with_oversample(5);
let results = engine.search_similar_filtered(&query, 10, &filter, Some(config))?;
}

Filter Conditions

| Condition | Description | Example |
|---|---|---|
| Eq(field, value) | Equality | category = "science" |
| Ne(field, value) | Not equal | status != "deleted" |
| Lt(field, value) | Less than | price < 100 |
| Le(field, value) | Less than or equal | price <= 100 |
| Gt(field, value) | Greater than | year > 2020 |
| Ge(field, value) | Greater than or equal | year >= 2020 |
| And(a, b) | Logical AND | Combined conditions |
| Or(a, b) | Logical OR | Alternative conditions |
| In(field, values) | Value in list | status IN ["active", "pending"] |
| Contains(field, substr) | String contains | title CONTAINS "rust" |
| StartsWith(field, prefix) | String prefix | name STARTS WITH "doc:" |
| Exists(field) | Field exists | HAS embedding |
| True | Always matches | No filter |

Filter Strategies

| Strategy | When to Use | Behavior |
|---|---|---|
| Auto | Default | Estimates selectivity and chooses |
| PreFilter | < 10% matches | Filters first, then searches subset |
| PostFilter | > 10% matches | Searches with oversample, then filters |
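The Auto decision rule amounts to a threshold on estimated selectivity. A minimal sketch, assuming the documented 10% cutoff (the enum and function here are illustrative, not the crate's types):

```rust
// Hypothetical model of the Auto strategy: pre-filter when the filter
// is selective (matches < 10% of vectors), otherwise search with
// oversampling and filter the candidates afterwards.
#[derive(Debug, PartialEq)]
enum Strategy {
    PreFilter,
    PostFilter,
}

fn choose_strategy(selectivity: f64) -> Strategy {
    if selectivity < 0.10 {
        Strategy::PreFilter // few matches: scan the small filtered subset
    } else {
        Strategy::PostFilter // many matches: oversample, then filter
    }
}

fn main() {
    assert_eq!(choose_strategy(0.02), Strategy::PreFilter); // rare matches
    assert_eq!(choose_strategy(0.50), Strategy::PostFilter); // common matches
}
```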
flowchart TD
    Query[Query + Filter] --> Strategy{Which Strategy?}

    Strategy -->|Auto| Estimate[Estimate Selectivity]
    Estimate -->|< 10%| Pre[Pre-Filter]
    Estimate -->|>= 10%| Post[Post-Filter]

    Strategy -->|PreFilter| Pre
    Strategy -->|PostFilter| Post

    Pre --> Filter1[Filter all keys]
    Filter1 --> Search1[Search filtered subset]
    Search1 --> Result[Top-K Results]

    Post --> Search2[Search with oversample]
    Search2 --> Filter2[Filter candidates]
    Filter2 --> Result

Filter Helper Methods

Utilities for working with filters:

#![allow(unused)]
fn main() {
// Estimate how selective a filter is (0.0 = matches nothing, 1.0 = matches all)
let selectivity = engine.estimate_filter_selectivity(&filter);

// Count how many embeddings match a filter
let matching = engine.count_matching(&filter);

// Get keys of all matching embeddings
let keys = engine.list_keys_matching(&filter);
}

Metadata Storage

Store and retrieve metadata alongside embeddings:

#![allow(unused)]
fn main() {
use tensor_store::TensorValue;
use std::collections::HashMap;

// Store embedding with metadata
let mut metadata = HashMap::new();
metadata.insert("category".to_string(), TensorValue::from("science"));
metadata.insert("year".to_string(), TensorValue::from(2024i64));
metadata.insert("score".to_string(), TensorValue::from(0.95f64));

engine.store_embedding_with_metadata("doc1", vec![0.1, 0.2, 0.3], metadata)?;

// Get all metadata
let meta = engine.get_metadata("doc1")?;

// Get specific field
let category = engine.get_metadata_field("doc1", "category")?;

// Update metadata (merges with existing)
let mut updates = HashMap::new();
updates.insert("score".to_string(), TensorValue::from(0.98f64));
engine.update_metadata("doc1", updates)?;

// Check if metadata field exists
if engine.has_metadata_field("doc1", "category") {
    // Remove specific metadata field
    engine.remove_metadata_field("doc1", "category")?;
}
}

Batch Operations

For bulk insert and delete operations with parallel processing:

#![allow(unused)]
fn main() {
use vector_engine::EmbeddingInput;

// Batch store - validates all inputs first, then stores in parallel
let inputs = vec![
    EmbeddingInput::new("doc1", vec![0.1, 0.2, 0.3]),
    EmbeddingInput::new("doc2", vec![0.2, 0.3, 0.4]),
    EmbeddingInput::new("doc3", vec![0.3, 0.4, 0.5]),
];

let result = engine.batch_store_embeddings(inputs)?;
println!("Stored {} embeddings", result.count);  // -> 3

// Batch delete - returns count of successfully deleted
let keys = vec!["doc1".to_string(), "doc2".to_string()];
let deleted = engine.batch_delete_embeddings(keys)?;
println!("Deleted {} embeddings", deleted);  // -> 2
}

Batches larger than batch_parallel_threshold (default: 100) use parallel processing via rayon.

Pagination

For memory-efficient iteration over large datasets:

#![allow(unused)]
fn main() {
use vector_engine::Pagination;

// List keys with pagination
let page = Pagination::new(0, 100);  // skip=0, limit=100
let result = engine.list_keys_paginated(page);
println!("Items: {}, Has more: {}", result.items.len(), result.has_more);

// Get total count with pagination
let page = Pagination::new(0, 100).with_total();
let result = engine.list_keys_paginated(page);
println!("Total: {:?}", result.total_count);  // Some(total)

// Paginated similarity search
let page = Pagination::new(10, 5);  // skip first 10, return 5
let results = engine.search_similar_paginated(&query, 100, page)?;

// Paginated entity search
let results = engine.search_entities_paginated(&query, 100, page)?;
}

Use list_keys_bounded() for production to enforce max_keys_per_scan limits.
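The skip/limit mechanics behind `Pagination`/`PagedResult` can be modeled in a few lines of plain Rust (the `paginate` helper is illustrative, not the crate's API):

```rust
// Skip/limit pagination over a key list, returning the page plus a
// has_more flag, mirroring the PagedResult shape described above.
fn paginate<T: Clone>(items: &[T], skip: usize, limit: usize) -> (Vec<T>, bool) {
    let page: Vec<T> = items.iter().skip(skip).take(limit).cloned().collect();
    let has_more = skip + page.len() < items.len();
    (page, has_more)
}

fn main() {
    let keys: Vec<u32> = (0..25).collect();

    let (page, has_more) = paginate(&keys, 0, 10);
    assert_eq!(page.len(), 10);
    assert!(has_more); // 15 keys remain

    let (page, has_more) = paginate(&keys, 20, 10);
    assert_eq!(page, vec![20, 21, 22, 23, 24]); // short final page
    assert!(!has_more);
}
```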

Search Flow Diagram

sequenceDiagram
    participant Client
    participant VE as VectorEngine
    participant TS as TensorStore
    participant SIMD

    Client->>VE: search_similar(query, k)
    VE->>VE: Validate query (non-empty, k > 0)
    VE->>SIMD: Pre-compute query magnitude
    VE->>TS: scan("emb:")
    TS-->>VE: List of embedding keys

    alt Dataset < 5000 vectors
        VE->>VE: Sequential search
    else Dataset >= 5000 vectors
        VE->>VE: Parallel search (rayon)
    end

    loop For each embedding
        VE->>TS: get(key)
        TS-->>VE: TensorData
        VE->>VE: Extract vector (dense or sparse)
        VE->>SIMD: cosine_similarity(query, stored)
        VE->>VE: Collect SearchResult
    end

    VE->>VE: Sort by score descending
    VE->>VE: Truncate to top k
    VE-->>Client: Vec<SearchResult>

HNSW Index

For large datasets, build an HNSW index for O(log n) search:

#![allow(unused)]
fn main() {
// Build index with default config
let (index, key_mapping) = engine.build_hnsw_index_default()?;

// Search using the index
let results = engine.search_with_hnsw(&index, &key_mapping, &query, 10)?;

// Build with custom config
let config = HNSWConfig::high_recall();
let (index, key_mapping) = engine.build_hnsw_index(config)?;

// Direct HNSW operations
let index = HNSWIndex::new();
index.insert(vec![1.0, 2.0, 3.0]);
let results = index.search(&query, 10);

// Search with custom ef (recall/speed tradeoff)
let results = index.search_with_ef(&query, 10, 200);
}

HNSW Search Flow

flowchart TD
    Query[Query Vector] --> Entry[Entry Point at Max Layer]

    Entry --> Greedy1[Greedy Search Layer L]
    Greedy1 --> |Find closest| Greedy2[Greedy Search Layer L-1]
    Greedy2 --> |...| GreedyN[Greedy Search until Layer 1]

    GreedyN --> Layer0[Full ef-Search at Layer 0]

    Layer0 --> Candidates[Candidate Pool]
    Candidates --> |BinaryHeap min-heap| Visit[Visit Neighbors]
    Visit --> Distance[Compute Distances]
    Distance --> |Update| Results[Result Pool]
    Results --> |BinaryHeap max-heap| Prune[Keep top ef]

    Prune --> |More candidates?| Visit
    Prune --> |Done| TopK[Return Top K]

Unified Entity Mode

Attach embeddings directly to entities for cross-engine queries:

#![allow(unused)]
fn main() {
let store = TensorStore::new();
let engine = VectorEngine::with_store(store.clone());

// Set embedding on an entity
engine.set_entity_embedding("user:1", vec![0.1, 0.2, 0.3])?;

// Get embedding from an entity
let embedding = engine.get_entity_embedding("user:1")?;

// Check if entity has embedding
engine.entity_has_embedding("user:1");  // -> bool

// Remove embedding (preserves other entity data)
engine.remove_entity_embedding("user:1")?;

// Search entities with embeddings
let results = engine.search_entities(&query, 5)?;

// Scan all entities with embeddings
let entity_keys = engine.scan_entities_with_embeddings();

// Count entities with embeddings
let count = engine.count_entities_with_embeddings();
}

Unified entity embeddings are stored in the _embedding field of the entity’s TensorData.

Collections

Collections provide isolated namespaces for organizing embeddings by type or purpose. Each collection can have its own dimension constraints and distance metric configuration.

Creating and Managing Collections

#![allow(unused)]
fn main() {
use vector_engine::{VectorEngine, VectorCollectionConfig, DistanceMetric};

let engine = VectorEngine::new();

// Create collection with custom config
let config = VectorCollectionConfig::default()
    .with_dimension(768)
    .with_metric(DistanceMetric::Cosine)
    .with_auto_index(5000);  // Auto-build HNSW at 5000 vectors

engine.create_collection("documents", config)?;

// List collections
let collections = engine.list_collections();

// Check if collection exists
engine.collection_exists("documents");  // -> true

// Get collection config
let config = engine.get_collection_config("documents");

// Delete collection (removes all vectors in it)
engine.delete_collection("documents")?;
}

Storing in Collections

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use tensor_store::TensorValue;

// Store vector in collection
engine.store_in_collection("documents", "doc1", vec![0.1, 0.2, 0.3])?;

// Store with metadata
let mut metadata = HashMap::new();
metadata.insert("title".to_string(), TensorValue::from("Introduction to Rust"));
metadata.insert("author".to_string(), TensorValue::from("Alice"));

engine.store_in_collection_with_metadata(
    "documents",
    "doc1",
    vec![0.1, 0.2, 0.3],
    metadata
)?;
}

Searching in Collections

#![allow(unused)]
fn main() {
use vector_engine::{FilterCondition, FilterValue};

// Basic search in collection
let results = engine.search_in_collection("documents", &query, 10)?;

// Filtered search in collection
let filter = FilterCondition::Eq("author".to_string(), FilterValue::String("Alice".to_string()));
let results = engine.search_filtered_in_collection(
    "documents",
    &query,
    10,
    &filter,
    None
)?;
}

Collection Key Isolation

Collections use prefixed storage keys to ensure isolation:

| Operation | Storage Key Pattern |
|---|---|
| Default embeddings | emb:{key} |
| Collection embeddings | coll:{collection}:emb:{key} |
| Entity embeddings | {entity_key}._embedding |
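Key isolation follows from the prefixes alone; a quick sketch of the patterns above (the formatting helpers themselves are illustrative):

```rust
// Storage-key prefixes keep the default and collection namespaces apart.
fn default_key(key: &str) -> String {
    format!("emb:{}", key)
}

fn collection_key(coll: &str, key: &str) -> String {
    format!("coll:{}:emb:{}", coll, key)
}

fn main() {
    assert_eq!(default_key("doc1"), "emb:doc1");
    assert_eq!(collection_key("documents", "doc1"), "coll:documents:emb:doc1");
    // A scan over the "emb:" prefix can never pick up collection keys,
    // because those all start with "coll:".
    assert!(!collection_key("documents", "doc1").starts_with("emb:"));
}
```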

VectorCollectionConfig

| Field | Type | Default | Description |
|---|---|---|---|
| dimension | Option\<usize\> | None | Enforced dimension (rejects mismatches) |
| distance_metric | DistanceMetric | Cosine | Default metric for this collection |
| auto_index | bool | false | Auto-build HNSW on threshold |
| auto_index_threshold | usize | 1000 | Vector count to trigger auto-index |

Index Persistence

Save and restore vector indices for fast startup:

#![allow(unused)]
fn main() {
use std::path::Path;

// Save all collections to directory (one JSON file per collection)
let saved = engine.save_all_indices(Path::new("./vector_index"))?;

// Load all indices from directory
let loaded = engine.load_all_indices(Path::new("./vector_index"))?;

// Save single collection to JSON
engine.save_index("documents", Path::new("./documents.json"))?;

// Save single collection to compact binary format
engine.save_index_binary("documents", Path::new("./documents.bin"))?;

// Load single collection from JSON (returns collection name)
let collection = engine.load_index(Path::new("./documents.json"))?;

// Load single collection from binary
let collection = engine.load_index_binary(Path::new("./documents.bin"))?;

// Get a snapshot for manual serialization
let index: PersistentVectorIndex = engine.snapshot_collection("documents");
}

PersistentVectorIndex Format

| Field | Type | Description |
|---|---|---|
| collection | String | Collection name |
| config | VectorCollectionConfig | Collection configuration |
| vectors | Vec\<VectorEntry\> | All vectors with metadata |
| created_at | u64 | Unix timestamp |
| version | u32 | Format version (currently 1) |

Storage Model

| Key Pattern | Content | Use Case |
|---|---|---|
| emb:{key} | TensorData with "vector" field | Default collection embeddings |
| coll:{collection}:emb:{key} | TensorData with "vector" field | Named collection embeddings |
| {entity_key} | TensorData with "_embedding" field | Unified entities |

Automatic Sparse Storage

Vectors with at least 50% zeros are automatically stored as sparse vectors:

#![allow(unused)]
fn main() {
// Detection threshold: nnz * 2 <= len (i.e., sparsity >= 50%)
fn should_use_sparse(vector: &[f32]) -> bool {
    let nnz = vector.iter().filter(|&&v| v.abs() > 1e-6).count();
    nnz * 2 <= vector.len()
}

// 97% sparse vector (3 non-zeros in 100 elements)
let mut sparse = vec![0.0f32; 100];
sparse[0] = 1.0;
sparse[50] = 2.0;
sparse[99] = 3.0;

// Stored efficiently as SparseVector
engine.store_embedding("sparse_doc", sparse)?;

// Retrieved as dense for computation
let dense = engine.get_embedding("sparse_doc")?;
}

Storage Format Comparison

| Format | Memory per Element | Best For |
|---|---|---|
| Dense | 4 bytes | Sparsity < 50% |
| Sparse | 8 bytes per non-zero (4 pos + 4 val) | Sparsity ≥ 50% |

Example: 1000-dim vector with 100 non-zeros:

  • Dense: 4000 bytes
  • Sparse: 800 bytes (5x compression)
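The back-of-envelope math behind these numbers (helper names illustrative):

```rust
// Memory math for the dense/sparse comparison above.
fn dense_bytes(dim: usize) -> usize {
    dim * 4 // 4 bytes per f32
}

fn sparse_bytes(nnz: usize) -> usize {
    nnz * (4 + 4) // u32 position + f32 value per non-zero
}

fn main() {
    let (dim, nnz) = (1000, 100); // 1000-dim vector, 100 non-zeros
    assert_eq!(dense_bytes(dim), 4000);
    assert_eq!(sparse_bytes(nnz), 800);
    assert_eq!(dense_bytes(dim) / sparse_bytes(nnz), 5); // 5x compression
}
```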

Sparse Vector Operations

Memory Layout

SparseVector {
    dimension: usize,        // Total vector dimension
    positions: Vec<u32>,     // Sorted indices of non-zeros
    values: Vec<f32>,        // Corresponding values
}

Sparse Dot Product Algorithm

#![allow(unused)]
fn main() {
// O(min(nnz_a, nnz_b)) - only overlapping positions contribute
pub fn dot(&self, other: &SparseVector) -> f32 {
    let mut result = 0.0;
    let mut i = 0;
    let mut j = 0;

    // Merge-sort style traversal
    while i < self.positions.len() && j < other.positions.len() {
        match self.positions[i].cmp(&other.positions[j]) {
            Equal => {
                result += self.values[i] * other.values[j];
                i += 1; j += 1;
            },
            Less => i += 1,
            Greater => j += 1,
        }
    }
    result
}
}

Sparse-Dense Dot Product

#![allow(unused)]
fn main() {
// O(nnz) - only iterate over sparse non-zeros
pub fn dot_dense(&self, dense: &[f32]) -> f32 {
    self.positions.iter()
        .zip(&self.values)
        .map(|(&pos, &val)| val * dense[pos as usize])
        .sum()
}
}

Sparse Distance Metrics

| Metric | Complexity | Description |
|---|---|---|
| dot | O(min(nnz_a, nnz_b)) | Sparse-sparse dot product |
| dot_dense | O(nnz) | Sparse-dense dot product |
| cosine_similarity | O(min(nnz_a, nnz_b)) | Angle-based similarity |
| euclidean_distance | O(nnz_a + nnz_b) | L2 distance |
| manhattan_distance | O(nnz_a + nnz_b) | L1 distance |
| jaccard_index | O(min(nnz_a, nnz_b)) | Position overlap |
| angular_distance | O(min(nnz_a, nnz_b)) | Arc-cosine |
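The Jaccard metric in this table is position overlap, and it follows the same merge pattern as the sparse dot product shown earlier. A standalone sketch over sorted position lists (illustrative, not the crate's SparseVector method):

```rust
use std::cmp::Ordering;

// Jaccard index over the non-zero positions of two sparse vectors:
// |intersection| / |union|, computed with a merge over sorted lists.
fn jaccard(a: &[u32], b: &[u32]) -> f32 {
    let (mut i, mut j, mut inter) = (0usize, 0usize, 0usize);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Equal => {
                inter += 1;
                i += 1;
                j += 1;
            }
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
        }
    }
    let union = a.len() + b.len() - inter;
    if union == 0 { 0.0 } else { inter as f32 / union as f32 }
}

fn main() {
    // Positions {0, 5, 9} vs {5, 9, 12}: 2 shared out of 4 distinct.
    assert_eq!(jaccard(&[0, 5, 9], &[5, 9, 12]), 0.5);
    assert_eq!(jaccard(&[1, 2], &[1, 2]), 1.0); // identical support
}
```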

HNSW Configuration

Configuration Parameters

| Parameter | Default | Description |
|---|---|---|
| m | 16 | Max connections per node per layer |
| m0 | 32 | Max connections at layer 0 (2*m) |
| ef_construction | 200 | Candidates during index building |
| ef_search | 50 | Candidates during search |
| ml | 1/ln(m) | Level multiplier for layer selection |
| sparsity_threshold | 0.5 | Auto-sparse threshold |
| max_nodes | 10,000,000 | Capacity limit |
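The derived defaults follow directly from m. A small check of the two formulas, with illustrative helper names:

```rust
// m0 doubles m at layer 0; the level multiplier ml = 1/ln(m) controls
// how quickly layer membership thins out as the level rises.
fn layer0_connections(m: usize) -> usize {
    2 * m
}

fn level_multiplier(m: f64) -> f64 {
    1.0 / m.ln()
}

fn main() {
    assert_eq!(layer0_connections(16), 32); // default preset: m=16 -> m0=32
    assert!((level_multiplier(16.0) - 0.3607).abs() < 1e-3); // ~0.36 for m=16
}
```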

Presets

| Preset | m | m0 | ef_construction | ef_search | Use Case |
|---|---|---|---|---|---|
| default() | 16 | 32 | 200 | 50 | Balanced |
| high_recall() | 32 | 64 | 400 | 200 | Accuracy over speed |
| high_speed() | 8 | 16 | 100 | 20 | Speed over accuracy |

Tuning Guidelines

graph TD
    subgraph "Higher m / ef"
        A[More connections per node]
        B[Better recall]
        C[More memory]
        D[Slower insert]
    end

    subgraph "Lower m / ef"
        E[Fewer connections]
        F[Lower recall]
        G[Less memory]
        H[Faster insert]
    end

    A --> B
    A --> C
    A --> D

    E --> F
    E --> G
    E --> H

Workload-Specific Tuning

| Workload | Recommended Config | Rationale |
|---|---|---|
| RAG/Semantic Search | high_recall() | Accuracy critical |
| Real-time recommendations | high_speed() | Latency critical |
| Batch processing | default() | Balanced |
| Small dataset (<10K) | Brute-force | HNSW overhead not worth it |
| Large dataset (>100K) | default() with higher ef_search | Scale benefits |

Memory vs Recall Tradeoff

| Config | Memory/Node | Recall@10 | Search Time |
|---|---|---|---|
| high_speed | ~128 bytes | ~85% | 0.1ms |
| default | ~256 bytes | ~95% | 0.3ms |
| high_recall | ~512 bytes | ~99% | 1.0ms |

Performance Characteristics

| Operation | Complexity | Notes |
|---|---|---|
| store_embedding | O(1) | Single store put |
| get_embedding | O(1) | Single store get |
| delete_embedding | O(1) | Single store delete |
| search_similar | O(n·d) | Brute-force, n=count, d=dimension |
| search_with_hnsw | O(ef · m · log n) | Approximate nearest neighbor |
| build_hnsw_index | O(n · log n · ef_construction · m) | Index construction |
| count | O(n) | Scans all embeddings |
| list_keys | O(n) | Scans all embeddings |

Parallel Search Threshold

Automatic parallel iteration for datasets >5000 vectors:

#![allow(unused)]
fn main() {
const PARALLEL_THRESHOLD: usize = 5000;

if keys.len() >= PARALLEL_THRESHOLD {
    // Use rayon parallel iterator
    keys.par_iter().filter_map(...)
} else {
    // Use sequential iterator
    keys.iter().filter_map(...)
}
}

Benchmark Results

| Dataset Size | Brute-Force | With HNSW | Speedup |
|---|---|---|---|
| 200 vectors | 4.17s | 9.3us | 448,000x |
| 1,000 vectors | ~5ms | ~20us | 250x |
| 10,000 vectors | ~50ms | ~50us | 1000x |
| 100,000 vectors | ~500ms | ~100us | 5000x |

Supported Embedding Dimensions

| Model | Dimensions | Recommended Config |
|---|---|---|
| OpenAI text-embedding-ada-002 | 1536 | default |
| OpenAI text-embedding-3-small | 1536 | default |
| OpenAI text-embedding-3-large | 3072 | high_recall |
| BERT base | 768 | default |
| Sentence Transformers | 384-768 | default |
| Cohere embed-v3 | 1024 | default |
| Custom/small | <256 | high_speed |

Edge Cases and Gotchas

Zero-Magnitude Vectors

| Metric | Behavior | Rationale |
|---|---|---|
| Cosine | Returns empty results | Division by zero undefined |
| DotProduct | Returns empty results | Undefined direction |
| Euclidean | Works correctly | Finds vectors closest to origin |
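Unlike compute_similarity (which returns 0.0 for a zero-magnitude input, as shown earlier), the search path simply drops pairs whose score is undefined, which is why a zero-magnitude cosine query yields empty results. An illustrative guard, not the crate's code:

```rust
// Search-side guard: a zero-magnitude vector has no direction, so a
// cosine score cannot be computed; the pair is skipped, not scored.
fn cosine_score(a: &[f32], b: &[f32]) -> Option<f32> {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let ma = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let mb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if ma == 0.0 || mb == 0.0 {
        None // undefined: candidate is dropped from the result set
    } else {
        Some(dot / (ma * mb))
    }
}

fn main() {
    let zero = [0.0f32; 3];
    let v = [1.0f32, 2.0, 3.0];
    assert_eq!(cosine_score(&zero, &v), None); // dropped -> empty results
    assert!(cosine_score(&v, &v).unwrap() > 0.999); // identical -> ~1.0
}
```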

Dimension Mismatch Handling

#![allow(unused)]
fn main() {
// Mismatched dimensions are silently skipped during search
engine.store_embedding("2d", vec![1.0, 0.0])?;
engine.store_embedding("3d", vec![1.0, 0.0, 0.0])?;

// Search with 2D query only matches 2D vectors
let results = engine.search_similar(&[1.0, 0.0], 10)?;
assert_eq!(results.len(), 1);  // Only "2d" matched
}

HNSW Limitations

| Limitation | Details | Workaround |
|---|---|---|
| Only cosine similarity | HNSW uses cosine distance internally | Use brute-force for other metrics |
| No deletion | Cannot remove vectors | Rebuild index |
| Static after build | Index doesn't update with new vectors | Rebuild periodically |
| Memory overhead | Graph structure adds ~2-4x | Use for large datasets only |

NaN/Infinity Handling

Sparse vector operations sanitize NaN/Inf results:

#![allow(unused)]
fn main() {
// cosine_similarity returns 0.0 for NaN/Inf
if result.is_nan() || result.is_infinite() {
    0.0
} else {
    result.clamp(-1.0, 1.0)
}

// cosine_distance_dense returns 1.0 (max distance) for NaN/Inf
if similarity.is_nan() || similarity.is_infinite() {
    1.0  // Maximum distance
} else {
    1.0 - similarity.clamp(-1.0, 1.0)
}
}

Best Practices

Memory Optimization

  1. Use sparse vectors for high-sparsity data: Automatic at >50% zeros
  2. Batch insert for HNSW: Build index once after all data loaded
  3. Choose appropriate HNSW config: Don’t over-provision m/ef
  4. Monitor memory with HNSWMemoryStats: Track dense vs sparse counts
#![allow(unused)]
fn main() {
let stats = index.memory_stats();
println!("Dense: {}, Sparse: {}, Total bytes: {}",
    stats.dense_count, stats.sparse_count, stats.embedding_bytes);
}

Search Performance

  1. Pre-compute query magnitude: Done automatically in search
  2. Use HNSW for >10K vectors: Brute-force for smaller sets
  3. Tune ef_search: Higher for recall, lower for speed
  4. Parallel threshold: Automatic at 5000 vectors

Unified Entity Best Practices

  1. Use for cross-engine queries: When embeddings relate to graph/relational data
  2. Entity key conventions: Use prefixes like user:, doc:, item:
  3. Separate embedding namespace: Use store_embedding for isolated vectors

Dependencies

| Crate | Purpose |
|---|---|
| tensor_store | Persistence, SparseVector, HNSWIndex, SIMD |
| rayon | Parallel iteration for large datasets |
| serde | Serialization of types |
| tracing | Instrumentation and observability |

Note: wide (SIMD f32x8 operations) is a transitive dependency via tensor_store.