Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Sparse Vectors

Sparse vectors are a memory-efficient representation for high-dimensional data where most values are zero.

When to Use Sparse Vectors

Use CaseDenseSparse
Low dimensions (<100)PreferredOverhead
High dimensions (>1000)Memory intensivePreferred
Most values non-zeroPreferredOverhead
<10% values non-zeroWastefulPreferred

SparseVector Type

#![allow(unused)]
fn main() {
pub struct SparseVector {
    dimension: usize,
    indices: Vec<usize>,
    values: Vec<f32>,
}
}

Memory Comparison

For a 10,000-dimensional vector with 100 non-zero values:

RepresentationMemory
Dense Vec<f32>40,000 bytes
Sparse~800 bytes
Savings98%

Operations

Creation

#![allow(unused)]
fn main() {
// From dense
let sparse = SparseVector::from_dense(&[0.0, 0.5, 0.0, 0.3, 0.0]);

// Incremental
let mut sparse = SparseVector::new(1000);
sparse.set(42, 0.5);
sparse.set(100, 0.3);
}

Arithmetic

#![allow(unused)]
fn main() {
// Subtraction (for deltas)
let delta = new_state.sub(&old_state);

// Weighted average
let blended = SparseVector::weighted_average(&[
    (&vec_a, 0.7),
    (&vec_b, 0.3),
]);

// Orthogonal projection
let residual = vec.project_orthogonal(&basis);
}

Similarity Metrics

MetricFormulaRange
Cosinea.b / (‖a‖ * ‖b‖)-1 to 1
Euclideansqrt(sum((a-b)^2))0 to inf
Jaccard‖A ∩ B‖ / ‖A ∪ B‖0 to 1
Angularacos(cosine) / pi0 to 1
#![allow(unused)]
fn main() {
let sim = vec_a.cosine_similarity(&vec_b);
let dist = vec_a.euclidean_distance(&vec_b);
let jacc = vec_a.jaccard_index(&vec_b);
}

HNSW Index

Hierarchical Navigable Small World for approximate nearest neighbor search:

#![allow(unused)]
fn main() {
let mut index = HNSWIndex::new(HNSWConfig::default());

// Insert
index.insert("doc_1", sparse_vec_1);
index.insert("doc_2", sparse_vec_2);

// Search
let results = index.search(&query_vec, 10); // top 10
}

Configuration

ParameterDefaultDescription
m16Max connections per layer
ef_construction200Build-time search width
ef_search50Query-time search width

Delta Encoding

For tracking state changes:

#![allow(unused)]
fn main() {
// Compute delta between states
let delta = DeltaVector::from_diff(&old_embedding, &new_embedding);

// Apply delta
let new_state = old_state.add(&delta.to_sparse());

// Check if orthogonal (non-conflicting)
if delta_a.is_orthogonal(&delta_b) {
    // Can merge automatically
}
}

Compression

Sparse vectors compress well:

MethodRatioSpeed
Varint indices2-4xFast
Quantization (int8)4xFast
Binary quantization32xVery fast