
Tensor Compress

Module 8 of Neumann. Provides tensor-native compression exploiting the mathematical structure of high-dimensional embeddings.

The primary compression method is Tensor Train (TT) decomposition, which decomposes vectors reshaped as tensors into a chain of smaller 3D cores using successive SVD truncations. This achieves 10-40x compression for 1024+ dimension vectors while enabling similarity computations directly in compressed space.

Design Principles

  1. Tensor Mathematics: Uses Tensor Train decomposition to exploit low-rank structure
  2. Higher Dimensions Are Lower: Decomposes vectors into products of smaller tensors
  3. Streaming I/O: Process large snapshots without loading entire dataset
  4. Incremental Updates: Delta snapshots for efficient replication
  5. Pure Rust: No external LAPACK/BLAS dependencies - fully portable

Key Types

| Type | Description |
|------|-------------|
| TTVector | Complete TT-decomposition of a vector with cores, shape, and ranks |
| TTCore | Single 3D tensor core (left_rank x mode_size x right_rank) |
| TTConfig | Configuration for TT decomposition (shape, max_rank, tolerance) |
| CompressionConfig | Snapshot compression settings (tensor mode, delta, RLE) |
| TensorMode | Compression mode enum (currently TensorTrain variant) |
| RleEncoded&lt;T&gt; | Run-length encoded data with values and run lengths |
| DeltaSnapshot | Snapshot containing only changes since a base snapshot |
| DeltaChain | Chain of deltas with efficient lookup and compaction |
| StreamingWriter | Memory-bounded incremental snapshot writer |
| StreamingReader | Iterator-based snapshot reader |
| StreamingTTWriter | Streaming TT-compressed vector writer |
| StreamingTTReader | Streaming TT-compressed vector reader |
| Matrix | Row-major matrix for SVD operations |
| SvdResult | Truncated SVD result (U, S, Vt matrices) |
| TensorView | Zero-copy logical view of tensor data |
| DeltaBuilder | Builder for creating delta snapshots |

Error Types

| Error | Description |
|-------|-------------|
| TTError::ShapeMismatch | Vector dimension doesn’t match reshape target |
| TTError::EmptyVector | Cannot decompose empty vector |
| TTError::InvalidRank | TT-rank must be >= 1 |
| TTError::IncompatibleShapes | TT vectors have different shapes for operation |
| TTError::InvalidShape | Shape contains zero or is empty |
| TTError::InvalidTolerance | Tolerance must be 0 < tol <= 1 |
| TTError::Decompose | SVD decomposition failed |
| FormatError::InvalidMagic | File magic bytes don’t match expected |
| FormatError::UnsupportedVersion | Format version is newer than supported |
| FormatError::Serialization | Bincode serialization/deserialization error |
| DeltaError::BaseNotFound | Referenced base snapshot doesn’t exist |
| DeltaError::SequenceGap | Delta sequence numbers have gaps |
| DeltaError::ChainTooLong | Delta chain exceeds maximum length |
| DecomposeError::EmptyMatrix | Cannot decompose empty matrix |
| DecomposeError::DimensionMismatch | Matrix dimensions don’t match for operation |
| DecomposeError::SvdNotConverged | SVD iteration didn’t converge |

Architecture

graph TD
    subgraph tensor_compress
        TT[tensor_train.rs<br/>TT-SVD decomposition]
        DC[decompose.rs<br/>SVD implementation]
        FMT[format.rs<br/>Snapshot format]
        STR[streaming.rs<br/>Streaming I/O]
        STT[streaming_tt.rs<br/>Streaming TT]
        INC[incremental.rs<br/>Delta snapshots]
        DLT[delta.rs<br/>Delta + varint encoding]
        RLE[rle.rs<br/>Run-length encoding]
    end

    TT --> DC
    FMT --> TT
    FMT --> DLT
    FMT --> RLE
    STR --> FMT
    STT --> TT
    INC --> FMT

Tensor Train Decomposition

Algorithm Overview

The TT-SVD algorithm (Oseledets 2011) decomposes a vector by:

  1. Reshape: Convert 1D vector to multi-dimensional tensor
  2. Left-to-right sweep: For each mode k from 1 to n-1:
    • Left-unfold the current tensor into a matrix
    • Compute truncated SVD: A = U S Vt
    • Store U as the k-th core
    • Multiply S * Vt to get the remainder for next iteration
  3. Final core: The last remainder becomes the final core
graph LR
    subgraph "TT-SVD Algorithm"
        V[Vector 4096-dim] --> R[Reshape to 8x8x8x8]
        R --> U1[Unfold mode 1<br/>64 x 64]
        U1 --> SVD1[SVD truncate<br/>rank=8]
        SVD1 --> C1[Core 1<br/>1x8x8]
        SVD1 --> R2[Remainder<br/>8x512]
        R2 --> SVD2[SVD truncate]
        SVD2 --> C2[Core 2<br/>8x8x8]
        SVD2 --> R3[Remainder]
        R3 --> SVD3[SVD truncate]
        SVD3 --> C3[Core 3<br/>8x8x8]
        SVD3 --> C4[Core 4<br/>8x8x1]
    end

Compression Example

For a 4096-dim embedding reshaped to (8, 8, 8, 8):

Original: 4096 floats = 16 KB
TT-cores (max_rank=8): 1x8x8 + 8x8x8 + 8x8x8 + 8x8x1 = 64 + 512 + 512 + 64 = 1152 floats
With max_rank=4: 1x8x4 + 4x8x4 + 4x8x4 + 4x8x1 = 32 + 128 + 128 + 32 = 320 floats = 1.25 KB
Compression: 12.8x
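
The arithmetic above can be checked with a small helper that computes TT storage for a given shape and rank cap (a sketch, not part of the module's API). The rank between cores k-1 and k is the rank cap, further bounded by the sizes of the left and right unfoldings:

```rust
/// Storage (in floats) of a TT decomposition for the given mode sizes,
/// with every TT-rank capped at `max_rank`.
fn tt_storage(shape: &[usize], max_rank: usize) -> usize {
    let n = shape.len();
    let mut ranks = vec![1usize; n + 1]; // boundary ranks are always 1
    for k in 1..n {
        // Rank between cores k-1 and k is bounded by both unfolding sizes
        let left: usize = shape[..k].iter().product();
        let right: usize = shape[k..].iter().product();
        ranks[k] = max_rank.min(left).min(right);
    }
    // Core k is a (ranks[k] x shape[k] x ranks[k+1]) tensor
    (0..n).map(|k| ranks[k] * shape[k] * ranks[k + 1]).sum()
}
```

For the (8, 8, 8, 8) reshape this reproduces the 1152-float and 320-float figures above.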

SVD Implementation Details

The module implements two SVD algorithms:

1. Power Iteration with Deflation (small matrices)

Used when matrix dimensions are <= 32 or rank is close to matrix size:

#![allow(unused)]
fn main() {
// Simplified power iteration for the dominant singular triplet
fn power_iteration(a: &Matrix, max_iter: usize, tol: f32) -> (f32, Vec<f32>, Vec<f32>) {
    // Initialize v deterministically (fixed pseudo-random pattern, portable)
    let mut v: Vec<f32> = (0..a.cols)
        .map(|i| ((i * 7 + 3) % 13) as f32 / 13.0 - 0.5)
        .collect();
    normalize(&mut v);

    let mut sigma = 0.0f32;
    let mut u = Vec::new();
    for _ in 0..max_iter {
        // u = A * v, then normalize; the norm is the current sigma estimate
        u = matmul(a, &v);
        let new_sigma = normalize(&mut u);

        // v = A^T * u, then normalize
        v = matmul_transpose(a, &u);
        normalize(&mut v);

        // Converged when the sigma estimate stops changing (relative tolerance)
        if (new_sigma - sigma).abs() < tol * sigma.max(1.0) {
            return (new_sigma, u, v);
        }
        sigma = new_sigma;
    }
    (sigma, u, v)
}
}

After finding each singular triplet, the algorithm deflates: A = A - sigma * u * v^T
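
A minimal sketch of that deflation step (a standalone helper, not the crate's internal code): subtracting the rank-1 term removes the found component so the next power iteration converges to the next-largest singular triplet.

```rust
/// Rank-1 deflation: subtract sigma * u * v^T from a row-major matrix in place.
fn deflate(a: &mut [f32], rows: usize, cols: usize, sigma: f32, u: &[f32], v: &[f32]) {
    for i in 0..rows {
        for j in 0..cols {
            a[i * cols + j] -= sigma * u[i] * v[j];
        }
    }
}
```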

2. Randomized SVD (large matrices)

Uses the Halko-Martinsson-Tropp 2011 algorithm for matrices > 32 dimensions:

graph TD
    subgraph "Randomized SVD Pipeline"
        A[Input Matrix A<br/>m x n] --> OMEGA[Generate Gaussian<br/>Omega n x k+p]
        A --> SAMPLE[Y = A * Omega<br/>m x k+p]
        SAMPLE --> QR[QR decompose Y<br/>Q = orth basis]
        QR --> PROJECT[B = Q^T * A<br/>k+p x n]
        PROJECT --> SMALL_SVD[SVD of small B<br/>power iteration]
        SMALL_SVD --> RECONSTRUCT[U = Q * U_small]
    end

Key implementation details:

  • Gaussian matrix generation: Uses a Linear Congruential Generator (LCG) with Box-Muller transform for deterministic, portable random numbers
  • QR orthonormalization: Modified Gram-Schmidt for numerical stability
  • Oversampling: Adds 5 extra columns to improve accuracy
  • Convergence: 20 iterations max (sufficient for embedding vectors)
#![allow(unused)]
fn main() {
// LCG parameters from Numerical Recipes
fn lcg_next(state: &mut u64) -> u64 {
    *state = state.wrapping_mul(6_364_136_223_846_793_005)
                  .wrapping_add(1_442_695_040_888_963_407);
    *state
}

// Box-Muller transform for Gaussian
fn box_muller(u1: f32, u2: f32) -> (f32, f32) {
    let r = (-2.0 * u1.ln()).sqrt();
    let theta = 2.0 * PI * u2;
    (r * theta.cos(), r * theta.sin())
}
}

Optimal Shape Selection

The module includes hardcoded optimal shapes for common embedding dimensions:

| Dimension | Shape | Why |
|-----------|-------|-----|
| 64 | [4, 4, 4] | 3 balanced factors |
| 128 | [4, 4, 8] | Near-balanced |
| 256 | [4, 8, 8] | Near-balanced |
| 384 | [4, 8, 12] | all-MiniLM-L6-v2 |
| 512 | [8, 8, 8] | Perfect cube |
| 768 | [8, 8, 12] | BERT dimension |
| 1024 | [8, 8, 16] | Common LLM size |
| 1536 | [8, 12, 16] | OpenAI ada-002 |
| 2048 | [8, 16, 16] | Near-balanced |
| 3072 | [8, 16, 24] | Large models |
| 4096 | [8, 8, 8, 8] | 4D balanced |
| 8192 | [8, 8, 8, 16] | Extra large |

For non-standard dimensions, factorize_balanced finds factors close to the nth root:

#![allow(unused)]
fn main() {
fn factorize_balanced(n: usize) -> Vec<usize> {
    // Target 2-6 factors based on log2(n)
    let target_factors = ((n as f64).ln() / 2f64.ln()).ceil().clamp(2.0, 6.0) as u32;
    let target_size = (n as f64).powf(1.0 / target_factors as f64);

    // Greedily find factors close to target_size
    // ...
}
}

TT Operations

| Function | Description | Complexity |
|----------|-------------|------------|
| tt_decompose | Decompose vector to TT format | O(n d r^2) |
| tt_decompose_batch | Parallel batch decomposition (4+ vectors) | O(batch n d r^2 / threads) |
| tt_reconstruct | Reconstruct vector from TT | O(d^n r^2) |
| tt_dot_product | Dot product in TT space | O(n d r^4) |
| tt_dot_product_batch | Batch dot products | Parallel when >= 4 targets |
| tt_cosine_similarity | Cosine similarity in TT space | O(n d r^4) |
| tt_cosine_similarity_batch | Batch cosine similarities | Parallel when >= 4 targets |
| tt_euclidean_distance | Euclidean distance in TT space | O(n d r^4) |
| tt_euclidean_distance_batch | Batch Euclidean distances | Parallel when >= 4 targets |
| tt_norm | L2 norm of TT vector | O(n d r^4) |
| tt_scale | Scale TT vector by constant | O(cores[0].size) |

Where: n = number of modes, d = mode size, r = TT-rank

TT Gram Matrix Computation

Computing dot products and norms in TT space uses the Gram matrix approach:

#![allow(unused)]
fn main() {
// Gram matrix propagation for dot product
fn tt_dot_product(a: &TTVector, b: &TTVector) -> f32 {
    let mut gram = vec![1.0f32];  // Start with 1x1 identity

    for (core_a, core_b) in a.cores.iter().zip(b.cores.iter()) {
        let (r1a, n, r2a) = core_a.shape;
        let (r1b, _, r2b) = core_b.shape;
        let mut new_gram = vec![0.0; r2a * r2b];

        // Contract: new_gram[a,b] = sum_{k,i,j} gram[i,j] * A[i,k,a] * B[j,k,b]
        for a_idx in 0..r2a {
            for b_idx in 0..r2b {
                for k in 0..n {
                    for ia in 0..r1a {
                        for ib in 0..r1b {
                            let g = gram[ia * r1b + ib];
                            new_gram[a_idx * r2b + b_idx] +=
                                g * core_a.get(ia, k, a_idx) * core_b.get(ib, k, b_idx);
                        }
                    }
                }
            }
        }
        gram = new_gram;
    }

    gram[0]  // Final 1x1 Gram matrix
}
}

Usage

#![allow(unused)]
fn main() {
use tensor_compress::{tt_decompose, tt_reconstruct, tt_cosine_similarity, TTConfig};

let embedding: Vec<f32> = get_embedding();  // 4096-dim
let config = TTConfig::for_dim(4096)?;

// Decompose
let tt = tt_decompose(&embedding, &config)?;
println!("Compression: {:.1}x", tt.compression_ratio());
println!("Storage: {} floats", tt.storage_size());
println!("Max rank: {}", tt.max_rank());

// Reconstruct
let restored = tt_reconstruct(&tt);

// Compute similarity without reconstruction
let tt2 = tt_decompose(&other_embedding, &config)?;
let sim = tt_cosine_similarity(&tt, &tt2)?;
}

Batch Operations

Batch operations use rayon for parallel processing when handling 4+ vectors:

#![allow(unused)]
fn main() {
use tensor_compress::{tt_decompose_batch, tt_cosine_similarity_batch, TTConfig};

let vectors: Vec<Vec<f32>> = load_embeddings();
let config = TTConfig::for_dim(4096)?;

// Batch decompose (parallel for 4+ vectors)
let refs: Vec<&[f32]> = vectors.iter().map(|v| v.as_slice()).collect();
let tts = tt_decompose_batch(&refs, &config)?;

// Batch similarity search
let query_tt = &tts[0];
let similarities = tt_cosine_similarity_batch(query_tt, &tts[1..])?;

// Find top-k
let mut indexed: Vec<_> = similarities.iter().enumerate().collect();
indexed.sort_by(|a, b| b.1.partial_cmp(a.1).unwrap());
let top_5: Vec<_> = indexed.iter().take(5).collect();
}

The parallel threshold constant is:

#![allow(unused)]
fn main() {
const PARALLEL_THRESHOLD: usize = 4;
}

Configuration

TTConfig Presets

| Preset | max_rank | tolerance | Use Case |
|--------|----------|-----------|----------|
| for_dim(d) | 8 | 1e-4 | Balanced compression/accuracy |
| high_compression(d) | 4 | 1e-2 | Maximize compression (2-3x more) |
| high_accuracy(d) | 16 | 1e-6 | Maximize accuracy (<0.1% error) |

TTConfig Validation

#![allow(unused)]
fn main() {
impl TTConfig {
    pub fn validate(&self) -> Result<(), TTError> {
        if self.shape.is_empty() {
            return Err(TTError::InvalidShape("empty shape".into()));
        }
        if self.shape.contains(&0) {
            return Err(TTError::InvalidShape("shape contains zero".into()));
        }
        if self.max_rank < 1 {
            return Err(TTError::InvalidRank);
        }
        if self.tolerance <= 0.0 || self.tolerance > 1.0 || !self.tolerance.is_finite() {
            return Err(TTError::InvalidTolerance(self.tolerance));
        }
        Ok(())
    }
}
}

CompressionConfig

#![allow(unused)]
fn main() {
pub struct CompressionConfig {
    pub tensor_mode: Option<TensorMode>,  // TT compression for vectors
    pub delta_encoding: bool,             // For sorted ID lists
    pub rle_encoding: bool,               // For repeated values
}

// Presets
CompressionConfig::high_compression()  // max_rank=4, all encodings enabled
CompressionConfig::balanced(dim)       // max_rank=8, all encodings enabled
CompressionConfig::high_accuracy(dim)  // max_rank=16, all encodings enabled
}

Dimension Presets

| Constant | Value | Model |
|----------|-------|-------|
| SMALL | 64 | MiniLM and small models |
| MEDIUM | 384 | all-MiniLM-L6-v2 |
| STANDARD | 768 | BERT, sentence-transformers |
| LARGE | 1536 | OpenAI text-embedding-ada-002 |
| XLARGE | 4096 | LLaMA and large models |

Streaming Operations

State Machine

stateDiagram-v2
    [*] --> Created: new()
    Created --> Writing: write_entry() / write_vector()
    Writing --> Writing: write_entry() / write_vector()
    Writing --> Finishing: finish()
    Finishing --> [*]: success

    note right of Created
        Magic bytes written
        entry_count = 0
    end note

    note right of Writing
        Length-prefixed entries
        entry_count incremented
    end note

    note right of Finishing
        Trailer written with:
        - entry_count
        - config
        - data_start offset
    end note

File Format

Uses a trailer-based layout: the entry count is only known once writing finishes, so the header is stored as a trailer at the end of the file:

+------------------------+
| Magic (NEUS/NEUT)  4B  |  Identifies streaming snapshot/TT
+------------------------+
| Entry 1 length     4B  |  Little-endian u32
+------------------------+
| Entry 1 data      var  |  Bincode-serialized entry
+------------------------+
| Entry 2 length     4B  |
+------------------------+
| Entry 2 data      var  |
+------------------------+
| ...                    |
+------------------------+
| Trailer           var  |  Bincode-serialized header
+------------------------+
| Trailer size       8B  |  Little-endian u64
+------------------------+

Security limits:

  • Maximum trailer size: 1 MB (MAX_TRAILER_SIZE)
  • Maximum entry size: 100 MB (MAX_ENTRY_SIZE)
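
A reader locates the trailer by seeking from the end of the file; a minimal sketch of that logic under the layout above (illustrative helper, not the crate's actual signature):

```rust
use std::io::{Error, ErrorKind, Read, Seek, SeekFrom};

const MAX_TRAILER_SIZE: u64 = 1024 * 1024; // 1 MB security limit

/// Read the trailer: the file ends with the trailer bytes followed by
/// their length as a little-endian u64.
fn read_trailer<R: Read + Seek>(f: &mut R) -> std::io::Result<Vec<u8>> {
    let file_len = f.seek(SeekFrom::End(0))?;
    f.seek(SeekFrom::End(-8))?;
    let mut len_buf = [0u8; 8];
    f.read_exact(&mut len_buf)?;
    let trailer_len = u64::from_le_bytes(len_buf);

    // Reject absurd sizes before allocating (allocation-attack guard)
    if trailer_len > MAX_TRAILER_SIZE || trailer_len + 8 > file_len {
        return Err(Error::new(ErrorKind::InvalidData, "bad trailer size"));
    }

    f.seek(SeekFrom::End(-8 - trailer_len as i64))?;
    let mut trailer = vec![0u8; trailer_len as usize];
    f.read_exact(&mut trailer)?;
    Ok(trailer)
}
```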

Usage

#![allow(unused)]
fn main() {
use tensor_compress::streaming::{StreamingWriter, StreamingReader};

// Write entries one at a time
let mut writer = StreamingWriter::new(file, config)?;
for entry in entries {
    writer.write_entry(&entry)?;
}
writer.finish()?;

// Read entries one at a time (iterator-based)
let reader = StreamingReader::open(file)?;
println!("Entry count: {}", reader.entry_count());
for entry in reader {
    process(entry?);
}
}

Streaming TT Operations

| Function | Description |
|----------|-------------|
| StreamingTTWriter::new | Create TT streaming writer |
| StreamingTTWriter::write_vector | Decompose and write vector |
| StreamingTTWriter::write_tt | Write pre-decomposed TT |
| StreamingTTWriter::finish | Finalize with trailer |
| StreamingTTReader::open | Open TT streaming file |
| streaming_tt_similarity_search | Search streaming TT file |
| convert_vectors_to_streaming_tt | Batch convert vectors |
| read_streaming_tt_all | Load all TT vectors |

#![allow(unused)]
fn main() {
use tensor_compress::streaming_tt::{StreamingTTWriter, StreamingTTReader,
    streaming_tt_similarity_search};

// Create streaming TT file
let config = TTConfig::for_dim(768)?;
let mut writer = StreamingTTWriter::new(file, config.clone())?;

for vector in vectors {
    writer.write_vector(&vector)?;  // Decompose on-the-fly
}
writer.finish()?;

// Similarity search without loading all into memory
let query_tt = tt_decompose(&query, &config)?;
let top_10 = streaming_tt_similarity_search(file, &query_tt, 10)?;
// Returns Vec<(index, similarity)> sorted by descending similarity
}

Merge and Convert Operations

#![allow(unused)]
fn main() {
use tensor_compress::streaming::{convert_to_streaming, read_streaming_to_snapshot,
    merge_streaming};

// Convert non-streaming snapshot to streaming format
let count = convert_to_streaming(&snapshot, output_file)?;

// Read streaming format into full snapshot (for compatibility)
let snapshot = read_streaming_to_snapshot(file)?;

// Merge multiple streaming snapshots
let count = merge_streaming(vec![file1, file2, file3], output, config)?;
}

Incremental Updates

Delta Snapshot Architecture

graph TD
    subgraph "Delta Chain"
        BASE[Base Snapshot<br/>Seq 0] --> D1[Delta 1<br/>Seq 1-10]
        D1 --> D2[Delta 2<br/>Seq 11-25]
        D2 --> D3[Delta 3<br/>Seq 26-30]
    end

    subgraph "Compaction"
        BASE2[Base] --> COMPACT[Compacted<br/>Snapshot]
        D1_2[Delta 1] --> COMPACT
        D2_2[Delta 2] --> COMPACT
        D3_2[Delta 3] --> COMPACT
    end

Delta Entry Types

#![allow(unused)]
fn main() {
pub enum ChangeType {
    Put,    // Entry was added or updated
    Delete, // Entry was deleted
}

pub struct DeltaEntry {
    pub key: String,
    pub change: ChangeType,
    pub value: Option<CompressedEntry>,  // None for Delete
    pub sequence: u64,
}
}

Usage

#![allow(unused)]
fn main() {
use tensor_compress::incremental::{DeltaBuilder, DeltaChain, apply_delta,
    merge_deltas, diff_snapshots};

// Create delta
let mut builder = DeltaBuilder::new("base_snapshot_id", sequence);
builder.put("key1", entry1);
builder.delete("key2");
let delta = builder.build();

// Apply delta
let new_snapshot = apply_delta(&base, &delta)?;

// Chain management
let mut chain = DeltaChain::new(base_snapshot);
chain.push(delta1)?;
chain.push(delta2)?;
let value = chain.get("key1");  // Checks chain then base

// Compact when chain grows long
if chain.should_compact(10) {
    let compacted = chain.compact()?;
}

// Compare two snapshots
let delta = diff_snapshots(&old_snapshot, &new_snapshot, "old_id")?;

// Merge multiple deltas into one
let merged = merge_deltas(&[delta1, delta2, delta3])?;
}

Delta Operations

| Function | Description |
|----------|-------------|
| DeltaBuilder::new | Create delta builder with base ID and start sequence |
| DeltaBuilder::put | Record a put (add/update) change |
| DeltaBuilder::delete | Record a delete change |
| DeltaBuilder::build | Build the delta snapshot |
| apply_delta | Apply delta to base snapshot |
| merge_deltas | Merge multiple deltas (keeps latest state per key) |
| diff_snapshots | Compute delta between two snapshots |
| DeltaChain::get | Get current state of key (checks chain then base) |
| DeltaChain::compact | Compact all deltas into new base |
| DeltaChain::should_compact | Check if compaction is recommended |
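
The "keeps latest state per key" rule of merge_deltas can be sketched with a plain map keyed by sequence number (types simplified to strings for illustration; the real entries are DeltaEntry values, with None standing in for a delete):

```rust
use std::collections::HashMap;

// Later sequence numbers win; a None value records a delete.
fn merge_latest(
    entries: Vec<(String, u64, Option<String>)>,
) -> HashMap<String, Option<String>> {
    let mut latest: HashMap<String, (u64, Option<String>)> = HashMap::new();
    for (key, seq, value) in entries {
        // Only replace an existing entry if this change is newer
        let keep = match latest.get(&key) {
            Some((existing_seq, _)) => seq > *existing_seq,
            None => true,
        };
        if keep {
            latest.insert(key, (seq, value));
        }
    }
    latest.into_iter().map(|(k, (_, v))| (k, v)).collect()
}
```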

Delta Format

+------------------------+
| Magic (NEUD)       4B  |
+------------------------+
| Version            2B  |
+------------------------+
| Base ID           var  |  String (length-prefixed)
+------------------------+
| Sequence Range     16B |  (start, end) u64 pair
+------------------------+
| Change Count        8B |
+------------------------+
| Created At          8B |  Unix timestamp
+------------------------+
| Entries           var  |  Bincode-serialized Vec<DeltaEntry>
+------------------------+

Lossless Compression

Delta + Varint Encoding

For sorted integer sequences (node IDs, timestamps):

graph LR
    subgraph "Delta + Varint Pipeline"
        IDS[IDs: 100, 101, 102, 105, 110] --> DELTA[Delta encode:<br/>100, 1, 1, 3, 5]
        DELTA --> VARINT[Varint encode]
        VARINT --> OUT[Bytes: ~7 bytes<br/>vs 40 bytes raw]
    end

Algorithm:

#![allow(unused)]
fn main() {
// Delta encoding: store first value, then differences
pub fn delta_encode(ids: &[u64]) -> Vec<u64> {
    if ids.is_empty() {
        return Vec::new();
    }
    let mut result = vec![ids[0]];
    for window in ids.windows(2) {
        result.push(window[1].saturating_sub(window[0]));
    }
    result
}

// Varint encoding: 7 bits per byte, high bit = continuation
pub fn varint_encode(values: &[u64]) -> Vec<u8> {
    let mut result = Vec::with_capacity(values.len() * 2);
    for &value in values {
        let mut v = value;
        loop {
            let byte = (v & 0x7f) as u8;
            v >>= 7;
            if v == 0 {
                result.push(byte);  // Final byte (no continuation)
                break;
            }
            result.push(byte | 0x80);  // Continuation bit set
        }
    }
    result
}
}
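
Decoding inverts the two stages (a sketch; the crate exposes the round trip through decompress_ids):

```rust
// Inverse of varint encoding: accumulate 7-bit groups until a byte with
// the high bit clear terminates the value.
fn varint_decode(bytes: &[u8]) -> Vec<u64> {
    let mut result = Vec::new();
    let (mut value, mut shift) = (0u64, 0u32);
    for &b in bytes {
        value |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            result.push(value); // final byte of this value
            value = 0;
            shift = 0;
        } else {
            shift += 7; // continuation: next byte holds higher bits
        }
    }
    result
}

// Inverse of delta encoding: a running sum restores the sorted IDs.
fn delta_decode(deltas: &[u64]) -> Vec<u64> {
    let mut acc = 0u64;
    deltas.iter().map(|&d| { acc += d; acc }).collect()
}
```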

Usage:

#![allow(unused)]
fn main() {
use tensor_compress::{compress_ids, decompress_ids};

let ids: Vec<u64> = (1000..2000).collect();
let compressed = compress_ids(&ids);  // ~100 bytes vs 8000

let restored = decompress_ids(&compressed);
assert_eq!(ids, restored);
}

Varint byte sizes:

| Value Range | Bytes |
|-------------|-------|
| 0 - 127 | 1 |
| 128 - 16,383 | 2 |
| 16,384 - 2,097,151 | 3 |
| 2,097,152 - 268,435,455 | 4 |
| … up to u64::MAX | 10 |

Run-Length Encoding

For repeated values:

#![allow(unused)]
fn main() {
use tensor_compress::{rle_encode, rle_decode};

let statuses = vec!["active"; 1000];
let encoded = rle_encode(&statuses);
assert_eq!(encoded.runs(), 1);  // Single run

// Storage: 1 string + 1 u32 = ~12 bytes vs 6000+ bytes
}

Internal representation:

#![allow(unused)]
fn main() {
pub struct RleEncoded<T: Eq> {
    pub values: Vec<T>,      // Unique values in order
    pub run_lengths: Vec<u32>, // Count for each value
}
}
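
The encoder itself is a single pass that either extends the current run or starts a new one; a generic sketch of the idea (the crate's RleEncoded&lt;T&gt; stores the same two parallel vectors):

```rust
/// Run-length encode a slice into (values, run_lengths) pairs.
fn rle_encode<T: Eq + Clone>(data: &[T]) -> (Vec<T>, Vec<u32>) {
    let mut values: Vec<T> = Vec::new();
    let mut runs: Vec<u32> = Vec::new();
    for item in data {
        if values.last() == Some(item) {
            *runs.last_mut().unwrap() += 1; // extend the current run
        } else {
            values.push(item.clone()); // start a new run
            runs.push(1);
        }
    }
    (values, runs)
}
```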

Compression scenarios:

| Data Pattern | Runs | Compression |
|--------------|------|-------------|
| [5, 5, 5, 5, 5] (1000x) | 1 | 500x |
| [1, 2, 3, 4, 5] (all different) | 5 | 0.8x (overhead) |
| [1, 1, 2, 2, 2, 3, 1, 1, 1, 1] | 4 | 2.5x |
| Status column (pending/active/done) | ~300 per 10000 | ~33x |

Sparse Vector Format

For vectors with >50% zeros:

#![allow(unused)]
fn main() {
use tensor_compress::{compress_sparse, compress_dense_as_sparse,
    should_use_sparse, should_use_sparse_threshold};

// Direct sparse compression
let positions = vec![0, 50, 99];
let values = vec![1.0, 2.0, 3.0];
let compressed = compress_sparse(100, &positions, &values);

// Auto-detect and compress
if should_use_sparse_threshold(&vector, 0.5) {
    let compressed = compress_dense_as_sparse(&vector);
}

// Check if sparse is beneficial
if should_use_sparse(dimension, non_zero_count) {
    // Use sparse format
}
}

Storage calculation:

#![allow(unused)]
fn main() {
// sparse_storage_size = 8 + 8 + nnz*2 + nnz*4 = 16 + nnz*6
// Dense storage = dimension * 4

// Sparse is better when: 16 + nnz*6 < dimension*4
// Solving: nnz < (dimension*4 - 16) / 6 = dimension*0.67 - 2.67
}
DimensionMax NNZ for SparseSparsity Threshold
1006464%
100066466.4%
4096272866.6%
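
These thresholds follow directly from the size formulas; a sketch of the break-even check (illustrative helpers, not the crate's should_use_sparse signature):

```rust
// Sparse layout: 8B dimension + 8B count, then 2B position + 4B value
// per non-zero entry.
fn sparse_size(nnz: usize) -> usize { 16 + nnz * 6 }
fn dense_size(dim: usize) -> usize { dim * 4 }

/// Largest non-zero count for which the sparse form is no larger than dense.
fn max_nnz_for_sparse(dim: usize) -> usize { (dim * 4 - 16) / 6 }
```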

Compressed Value Types

#![allow(unused)]
fn main() {
pub enum CompressedValue {
    Scalar(CompressedScalar),           // Int, Float, String, Bool, Null
    VectorRaw(Vec<f32>),                // Uncompressed
    VectorTT { cores, original_dim, shape, ranks },  // TT-compressed
    VectorSparse { dimension, positions, values },   // Sparse
    IdList(Vec<u8>),                    // Delta + varint encoded
    RleInt(RleEncoded<i64>),            // RLE encoded integers
    Pointer(String),                    // Single pointer
    Pointers(Vec<String>),              // Multiple pointers
}
}

Automatic Format Selection

#![allow(unused)]
fn main() {
pub fn compress_vector(vector: &[f32], key: &str, field_name: &str,
    config: &CompressionConfig) -> Result<CompressedValue, FormatError> {

    // 1. Check for embedding-like keys
    let is_embedding = key.starts_with("emb:") ||
                       field_name == "_embedding" ||
                       field_name == "vector";

    if is_embedding {
        if let Some(TensorMode::TensorTrain(tt_config)) = &config.tensor_mode {
            return Ok(CompressedValue::VectorTT { ... });
        }
    }

    // 2. Check for ID list pattern
    if config.delta_encoding && looks_like_id_list(vector, field_name) {
        return Ok(CompressedValue::IdList(...));
    }

    // 3. Fall back to raw
    Ok(CompressedValue::VectorRaw(vector.to_vec()))
}
}

Performance

Benchmarks on Apple M4 (aarch64, MacBook Air 24GB), release build:

| Dimension | Decompose | Reconstruct | Similarity | Compression |
|-----------|-----------|-------------|------------|-------------|
| 64 | 6.2 us | 29.5 us | 1.1 us | 2.0x |
| 256 | 13.4 us | 113.0 us | 1.5 us | 4.6x |
| 768 | 26.9 us | 431.7 us | 2.4 us | 10.7x |
| 1536 | 62.0 us | 709.8 us | 2.0 us | 16.0x |
| 4096 | 464.5 us | 2142.2 us | 2.4 us | 42.7x |

Batch operations (768-dim, 1000 vectors):

| Operation | Time | Per-vector |
|-----------|------|------------|
| tt_decompose_batch | 21 ms | 21.0 us |
| tt_cosine_similarity_batch | 11.3 ms | 11.4 us |

Throughput: 39,318 vectors/sec (768-dim decomposition)

Industry Comparison

| Method | Compression | Recall | Notes |
|--------|-------------|--------|-------|
| Tensor Train (this) | 10-42x | ~99% | Similarity in compressed space |
| Scalar Quantization | 4x | 99%+ | Industry default |
| Product Quantization | 16-64x | 56-90% | Requires training |
| Binary Quantization | 32x | 80-95% | Speed-optimized |

Edge Cases and Gotchas

Vector Content Patterns

| Pattern | Compression | Reconstruction | Notes |
|---------|-------------|----------------|-------|
| Constant (all same) | Excellent (>5x) | Accurate | Rank-1 structure |
| All zeros | Good | Accurate | Degenerate case |
| Single spike | Poor | Moderate | No low-rank structure |
| Linear ramp | Good (>2x) | Good | Low-rank |
| Alternating +1/-1 | Poor | Moderate | High-frequency needs high rank |
| Random dense | Good | Good (>0.9 cosine) | Typical embeddings |
| 90% zeros | Consider sparse instead | n/a | Use compress_dense_as_sparse |

Numerical Edge Cases

#![allow(unused)]
fn main() {
// Very small values (denormalized floats)
let tiny: Vec<f32> = (0..64).map(|i| (i as f32) * 1e-38).collect();
// Works, but may lose precision

// Large values (1e6 range)
let large: Vec<f32> = (0..64).map(|i| (i as f32) * 1e6).collect();
// Works, no overflow

// Prime dimensions
let prime_127: Vec<f32> = (0..127).map(|i| (i as f32 * 0.1).sin()).collect();
// Works but may have poor compression
}

Streaming Gotchas

  1. Incomplete files: Magic bytes are written first, but entry count is in trailer. If writer crashes before finish(), the file is corrupt.

  2. Memory limits: MAX_ENTRY_SIZE = 100MB and MAX_TRAILER_SIZE = 1MB prevent allocation attacks. Exceeding these returns an error.

  3. Seek requirement: StreamingReader::open requires Seek to read the trailer. For non-seekable streams, use read_streaming_to_snapshot which buffers.

Delta Chain Gotchas

  1. Chain length: Default max_chain_len = 100. After this, push() returns ChainTooLong error. Call compact() periodically.

  2. Sequence gaps: Deltas should have contiguous sequences. The merge_deltas function only keeps the latest state per key.

  3. Base reference: Deltas store a base_id string but don’t validate it exists. Your application must track base snapshots.

Performance Tips and Best Practices

Choosing Configuration

#![allow(unused)]
fn main() {
// For search/retrieval (similarity queries)
let config = TTConfig::for_dim(dim)?;  // Balanced

// For archival/cold storage
let config = TTConfig::high_compression(dim)?;  // Smaller, slower queries

// For real-time applications
let config = TTConfig::high_accuracy(dim)?;  // Larger, faster queries
}

Batch Size Optimization

#![allow(unused)]
fn main() {
// Below parallel threshold (4), sequential is faster
// due to thread spawn overhead
let small_batch = tt_decompose_batch(&vectors[..3], &config);  // Sequential

// At threshold, parallel kicks in
let large_batch = tt_decompose_batch(&vectors, &config);  // Parallel if >= 4
}

Memory Efficiency

#![allow(unused)]
fn main() {
// Bad: Load all, then process
let all_vectors = read_streaming_tt_all(file)?;  // Loads all into memory

// Good: Stream process
for tt in StreamingTTReader::open(file)? {
    process(tt?);  // One at a time
}

// Best: Use streaming search
let results = streaming_tt_similarity_search(file, &query_tt, 10)?;
}

Delta Compaction Strategy

#![allow(unused)]
fn main() {
let mut chain = DeltaChain::new(base);

// After N deltas or M total changes
if chain.len() >= 10 || total_changes >= 10000 {
    let new_base = chain.compact()?;
    chain = DeltaChain::new(new_base);
}
}

Dependencies

  • serde: Serialization traits
  • bincode: Binary format
  • thiserror: Error types
  • rayon: Parallel batch operations

No external LAPACK/BLAS - pure Rust SVD implementation.

| Module | Relationship |
|--------|--------------|
| tensor_store | Uses compression for snapshot I/O |
| tensor_chain | Delta compression for state replication |
| tensor_checkpoint | Snapshot format integration |