# tensor_cache Benchmarks

The tensor_cache crate provides LLM response caching through three layers: an exact (hash-based) cache, a semantic (HNSW-indexed) cache, and an embedding cache.

## Exact Cache (Hash-based O(1))

| Operation   | Time   |
|-------------|--------|
| lookup_hit  | 208 ns |
| lookup_miss | 102 ns |
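
The exact layer is essentially a hash map keyed by the prompt, which is why both hit and miss land in the 100–200 ns range. A minimal sketch of that contract (illustrative only; the real tensor_cache API may differ):

```rust
use std::collections::HashMap;

/// Sketch of the exact layer: an O(1) hash-map lookup keyed by the raw
/// prompt. Illustrative only -- not the crate's actual types.
struct ExactCache {
    map: HashMap<String, String>,
}

impl ExactCache {
    fn new() -> Self {
        ExactCache { map: HashMap::new() }
    }

    fn put(&mut self, prompt: &str, response: &str) {
        self.map.insert(prompt.to_string(), response.to_string());
    }

    /// Hit or miss resolves in one hash plus probe.
    fn lookup(&self, prompt: &str) -> Option<&str> {
        self.map.get(prompt).map(String::as_str)
    }
}

fn main() {
    let mut cache = ExactCache::new();
    cache.put("What is Rust?", "A systems programming language.");
    assert_eq!(cache.lookup("What is Rust?"), Some("A systems programming language."));
    assert_eq!(cache.lookup("What is Go?"), None);
}
```

Only byte-identical prompts hit this layer; anything paraphrased falls through to the semantic cache below.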

## Semantic Cache (HNSW-based O(log n))

| Operation  | Time  |
|------------|-------|
| lookup_hit | 21 us |
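
A semantic hit means "some cached entry's embedding is close enough to the query's." The crate uses an HNSW index for this; the sketch below shows the same contract with a brute-force scan, which is easier to read. The cosine-distance metric and the threshold parameter are assumptions for illustration, not the crate's actual defaults:

```rust
/// Cosine distance: 0.0 for identical directions, up to 2.0 for opposite.
fn cosine_dist(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

/// Return the cached response whose embedding is nearest to `query`,
/// provided it falls within `threshold`. A real HNSW index answers the
/// same question in O(log n) instead of O(n).
fn semantic_lookup<'a>(
    entries: &'a [(Vec<f32>, String)],
    query: &[f32],
    threshold: f32,
) -> Option<&'a str> {
    entries
        .iter()
        .map(|(emb, resp)| (cosine_dist(emb, query), resp))
        .filter(|(d, _)| *d <= threshold)
        .min_by(|a, b| a.0.partial_cmp(&b.0).unwrap())
        .map(|(_, resp)| resp.as_str())
}

fn main() {
    let entries = vec![
        (vec![1.0, 0.0], "cached answer A".to_string()),
        (vec![0.0, 1.0], "cached answer B".to_string()),
    ];
    // A near-duplicate query hits; an ambiguous one misses.
    assert_eq!(semantic_lookup(&entries, &[0.9, 0.1], 0.2), Some("cached answer A"));
    assert_eq!(semantic_lookup(&entries, &[0.5, 0.5], 0.05), None);
}
```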

## Put (Exact + Semantic + HNSW insert)

| Entries | Time  |
|---------|-------|
| 100     | 49 us |
| 1,000   | 47 us |
| 10,000  | 53 us |

## Embedding Cache

| Operation   | Time   |
|-------------|--------|
| lookup_hit  | 230 ns |
| lookup_miss | 110 ns |

## Eviction (batch processing)

| Entries in Cache | Time   |
|------------------|--------|
| 1,000            | 3.3 us |
| 5,000            | 4.0 us |
| 10,000           | 8.4 us |

## Distance Metrics (raw computation, 128d)

| Metric    | Time   | Notes                    |
|-----------|--------|--------------------------|
| Jaccard   | 73 ns  | Fastest, best for sparse |
| Euclidean | 105 ns | Good for spatial data    |
| Cosine    | 186 ns | Default, best for dense  |
| Angular   | 193 ns | Alternative to cosine    |
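
For reference, the three main metrics can be sketched as below. The Jaccard variant over non-zero components is one common generalization to real-valued sparse vectors; it is an assumption here, not taken from the crate:

```rust
/// Cosine distance: direction-only comparison, the default for dense embeddings.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

/// Euclidean (L2) distance: magnitude-sensitive, suited to spatial data.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Jaccard distance over the sets of non-zero components: cheap because it
/// only compares support, which is why it wins on sparse vectors.
fn jaccard(a: &[f32], b: &[f32]) -> f32 {
    let mut inter = 0usize;
    let mut union = 0usize;
    for (x, y) in a.iter().zip(b) {
        let (xa, ya) = (*x != 0.0, *y != 0.0);
        if xa && ya { inter += 1; }
        if xa || ya { union += 1; }
    }
    if union == 0 { 0.0 } else { 1.0 - inter as f32 / union as f32 }
}

fn main() {
    assert!((euclidean(&[0.0, 0.0], &[3.0, 4.0]) - 5.0).abs() < 1e-5);
    // Parallel vectors are at cosine distance ~0 regardless of magnitude.
    assert!(cosine(&[1.0, 2.0], &[2.0, 4.0]).abs() < 1e-5);
    // Supports {0} vs {0, 1}: intersection 1, union 2 -> distance 0.5.
    assert!((jaccard(&[1.0, 0.0], &[1.0, 1.0]) - 0.5).abs() < 1e-5);
}
```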

## Semantic Lookup by Metric (1,000 entries)

| Metric    | Time    |
|-----------|---------|
| Jaccard   | 28.6 us |
| Euclidean | 27.8 us |
| Cosine    | 28.4 us |

## Sparse vs Dense (80% sparsity)

| Configuration | Time    | Improvement |
|---------------|---------|-------------|
| Dense lookup  | 28.8 us | baseline    |
| Sparse lookup | 24.1 us | 16% faster  |

## Auto-Metric Selection

| Operation          | Time    |
|--------------------|---------|
| Sparsity check     | 0.66 ns |
| Auto-select dense  | 13.4 us |
| Auto-select sparse | 16.5 us |
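
The selection rule itself is simple: measure the fraction of zero components and switch metrics past a sparsity threshold (the analysis section notes >=70% sparse selects Jaccard). A sketch, with names and structure assumed for illustration:

```rust
/// Which metric the selector picks. Names are illustrative, not the crate's.
#[derive(Debug, PartialEq)]
enum Metric {
    Cosine,
    Jaccard,
}

/// Fraction of components that are exactly zero.
fn sparsity(v: &[f32]) -> f32 {
    let zeros = v.iter().filter(|x| **x == 0.0).count();
    zeros as f32 / v.len() as f32
}

/// Pick Jaccard for mostly-zero vectors, Cosine otherwise.
fn auto_select(v: &[f32]) -> Metric {
    if sparsity(v) >= 0.70 { Metric::Jaccard } else { Metric::Cosine }
}

fn main() {
    assert_eq!(auto_select(&[0.0, 0.0, 0.0, 1.0]), Metric::Jaccard); // 75% sparse
    assert_eq!(auto_select(&[1.0, 2.0, 3.0, 0.0]), Metric::Cosine);  // 25% sparse
}
```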

## Redis Comparison

| System                  | In-Process | Over TCP |
|-------------------------|------------|----------|
| Redis                   | ~60 ns     | ~143 us  |
| tensor_cache (exact)    | 208 ns     | ~143 us* |
| tensor_cache (semantic) | 21 us      | N/A      |

\*Estimated: network latency dominates (~99.9% of total time).

Key Insight: For embedded use (no network), Redis is ~3.5x faster for exact lookups. Over TCP (the typical deployment), both are network-bound at ~143 us. The differentiator is semantic search (21 us), which Redis cannot provide.

## Analysis

- Exact cache: Hash-based O(1) lookup provides sub-microsecond hit/miss detection
- Semantic cache: HNSW index provides O(log n) similarity search (~21 us for a hit)
- Embedding cache: Fast O(1) lookup for precomputed embeddings
- Put performance: Consistent ~50 us regardless of cache size (HNSW insert is O(log n))
- Eviction: Efficient batch eviction with LRU/LFU/Cost/Hybrid strategies
- Distance metrics: Auto-selection based on sparsity (>=70% sparse uses Jaccard)
- Token counting: tiktoken cl100k_base encoding for accurate GPT-4 token counts
- Cost tracking: Estimates cost savings based on model pricing tables

## Cache Layers

| Layer     | Complexity | Use Case               |
|-----------|------------|------------------------|
| Exact     | O(1)       | Identical prompts      |
| Semantic  | O(log n)   | Similar prompts        |
| Embedding | O(1)       | Precomputed embeddings |

## Eviction Strategies

| Strategy  | Description                        |
|-----------|------------------------------------|
| LRU       | Evict least recently accessed      |
| LFU       | Evict least frequently accessed    |
| CostBased | Evict lowest cost efficiency       |
| Hybrid    | Weighted combination (recommended) |
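
A hybrid strategy typically scores each entry on several signals at once and evicts the lowest scorers. The signals and weights below are illustrative assumptions, not the crate's actual defaults:

```rust
/// Per-entry bookkeeping a hybrid scorer might consult. Field names and
/// weights are hypothetical, chosen for illustration.
struct Entry {
    age_secs: f32,   // time since last access (recency)
    hits: u32,       // access count (frequency)
    cost_saved: f32, // estimated cost avoided per hit (cost efficiency)
}

/// Weighted blend of recency, frequency, and cost; the batch evictor
/// would drop the entries with the lowest scores.
fn hybrid_score(e: &Entry) -> f32 {
    let recency = 1.0 / (1.0 + e.age_secs);     // newer -> higher
    let frequency = (1.0 + e.hits as f32).ln(); // more hits -> higher
    let cost = e.cost_saved;                    // pricier to recompute -> higher
    0.4 * recency + 0.3 * frequency + 0.3 * cost
}

fn main() {
    let hot = Entry { age_secs: 1.0, hits: 50, cost_saved: 1.0 };
    let stale = Entry { age_secs: 10_000.0, hits: 1, cost_saved: 0.1 };
    // The stale, rarely-used entry scores lower, so it is evicted first.
    assert!(hybrid_score(&hot) > hybrid_score(&stale));
}
```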

## Metric Selection Guide

| Embedding Type        | Recommended Metric      |
|-----------------------|-------------------------|
| OpenAI/Cohere (dense) | Cosine (default)        |
| Sparse (>=70% zeros)  | Jaccard (auto-selected) |
| Spatial/geographic    | Euclidean               |
| Custom binary         | Jaccard                 |