
tensor_blob Benchmarks

The tensor_blob crate provides S3-style chunked blob storage with content-addressable chunks, garbage collection, and integrity verification.

Overview

tensor_blob focuses on correctness and durability over raw throughput. Performance characteristics depend heavily on:

  • Chunk size configuration
  • Storage backend (memory vs disk)
  • Network conditions for streaming operations

Expected Performance Characteristics

| Operation | Complexity | Notes |
|---|---|---|
| Put (upload) | O(size / chunk_size) | Linear in data size |
| Get (download) | O(size / chunk_size) | Linear in data size |
| Delete | O(chunk_count) | Removes metadata and detects orphaned chunks |
| GC | O(total_chunks) | Full scan of all stored chunks |
| Verify | O(size) | Re-hashes the entire blob |
| Repair | O(corrupted_chunks) | Processes only damaged chunks |
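The Put/Get and Verify rows can be illustrated with a small, dependency-free Rust sketch. It assumes fixed-size chunking (the chunk_size setting suggests this, but the crate's actual chunker is not described here), and it uses std's `DefaultHasher` as a stand-in for SHA-256 only to avoid external crates. `chunk_count`, `split_into_chunks`, `checksum`, and `verify` are hypothetical helpers, not tensor_blob APIs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Number of chunks a blob of `size` bytes occupies (ceiling division).
/// Put and Get visit each chunk once, hence O(size / chunk_size).
fn chunk_count(size: usize, chunk_size: usize) -> usize {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    (size + chunk_size - 1) / chunk_size
}

/// Split a blob into fixed-size chunks; the last chunk may be shorter.
fn split_into_chunks(data: &[u8], chunk_size: usize) -> Vec<&[u8]> {
    data.chunks(chunk_size).collect()
}

/// Whole-blob checksum. Stand-in only: DefaultHasher (SipHash) is NOT SHA-256.
fn checksum(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Verify re-reads every byte of every chunk and re-hashes the result,
/// which is why it is O(size) rather than O(chunk count).
fn verify(chunks: &[&[u8]], expected: u64) -> bool {
    let joined: Vec<u8> = chunks.concat();
    checksum(&joined) == expected
}
```

For example, a 10-byte blob with chunk_size = 4 splits into chunks of 4, 4 and 2 bytes, and verifying it still hashes all 10 bytes.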

Chunk Deduplication

Identical content shares chunks via SHA-256 content addressing:

  • Duplicate blobs: Store once, reference count tracked
  • Partial overlap: Shared chunks deduplicated at chunk boundaries
  • Storage savings: Depends on data redundancy
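Content-addressed deduplication with reference counts can be sketched as a map from chunk hash to (bytes, refcount). `ChunkStore` and its methods are hypothetical. The 64-bit `DefaultHasher` again stands in for SHA-256; unlike a 256-bit digest it is short enough that a real store could not treat it as collision-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Content address of a chunk (stand-in for a SHA-256 digest).
type ChunkId = u64;

fn chunk_id(bytes: &[u8]) -> ChunkId {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Hypothetical in-memory chunk store: id -> (chunk bytes, reference count).
#[derive(Default)]
struct ChunkStore {
    chunks: HashMap<ChunkId, (Vec<u8>, usize)>,
}

impl ChunkStore {
    /// Store a blob and return its ordered list of chunk ids.
    /// A chunk already present is not copied again; only its refcount grows.
    fn put(&mut self, data: &[u8], chunk_size: usize) -> Vec<ChunkId> {
        data.chunks(chunk_size)
            .map(|c| {
                let id = chunk_id(c);
                let entry = self.chunks.entry(id).or_insert_with(|| (c.to_vec(), 0));
                entry.1 += 1;
                id
            })
            .collect()
    }

    /// Physical bytes held, counting each distinct chunk once.
    fn stored_bytes(&self) -> usize {
        self.chunks.values().map(|(bytes, _)| bytes.len()).sum()
    }

    fn refcount(&self, id: ChunkId) -> usize {
        self.chunks.get(&id).map_or(0, |(_, rc)| *rc)
    }
}
```

Storing `[1,1,1,1,2,2,2,2]` and then `[1,1,1,1,3,3,3,3]` with 4-byte chunks keeps 12 bytes instead of 16, because the first chunk is shared; storing the first blob a second time adds no bytes at all.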

Garbage Collection

| Operation | Behavior |
|---|---|
| gc() | Returns GcStats { deleted, freed_bytes } |
| Orphan detection | Marks unreferenced chunks |
| Active upload protection | GC skips in-progress uploads |
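The GC behavior above corresponds to a classic mark-and-sweep pass: mark every chunk reachable from a committed blob or an in-progress upload, then sweep the rest. This is a sketch under assumptions. Only the `GcStats { deleted, freed_bytes }` field names come from the table; the `gc` signature and the representation of blobs as lists of chunk ids are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Field names follow the table above; the crate's real struct may differ.
#[derive(Debug, Default, PartialEq)]
struct GcStats {
    deleted: usize,
    freed_bytes: u64,
}

/// Mark-and-sweep over every stored chunk, hence O(total_chunks).
/// `chunks` maps chunk id -> size in bytes.
/// `committed` holds the chunk lists of finished blobs;
/// `in_progress` holds the chunk lists of uploads that have not committed yet.
fn gc(
    chunks: &mut HashMap<u64, u64>,
    committed: &[Vec<u64>],
    in_progress: &[Vec<u64>],
) -> GcStats {
    // Mark: anything referenced by a committed blob or an active upload is live.
    let live: HashSet<u64> = committed
        .iter()
        .chain(in_progress.iter())
        .flatten()
        .copied()
        .collect();

    // Sweep: remove unmarked (orphaned) chunks and account for them.
    let mut stats = GcStats::default();
    chunks.retain(|id, size| {
        if live.contains(id) {
            true
        } else {
            stats.deleted += 1;
            stats.freed_bytes += *size;
            false
        }
    });
    stats
}
```

Treating in-progress uploads as GC roots is what provides active upload protection: chunks written before an upload commits are not yet referenced by any blob and would otherwise be swept.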

Streaming Operations

| API | Use Case |
|---|---|
| BlobWriter | Streaming upload with bounded memory |
| BlobReader::next_chunk() | Streaming download, chunk by chunk |
| get_full() | Small blobs (<10 MB); loads the whole blob into memory |
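The bounded-memory property of chunk-by-chunk reading comes from reusing one fixed-size buffer. The `ChunkedReader` type below is a hypothetical illustration built on `std::io::Read`; it is not the signature of `BlobReader::next_chunk()`, which is not documented in this section.

```rust
use std::io::{self, Read};

/// Hypothetical streaming reader: each call yields at most `chunk_size`
/// bytes from a single reused buffer, so memory stays constant.
struct ChunkedReader<R: Read> {
    inner: R,
    buf: Vec<u8>,
}

impl<R: Read> ChunkedReader<R> {
    fn new(inner: R, chunk_size: usize) -> Self {
        Self { inner, buf: vec![0; chunk_size] }
    }

    /// Returns Ok(None) at end of stream. Keeps reading until the buffer is
    /// full or the source is exhausted, so every chunk except the last has
    /// exactly `chunk_size` bytes.
    fn next_chunk(&mut self) -> io::Result<Option<&[u8]>> {
        let mut filled = 0;
        while filled < self.buf.len() {
            match self.inner.read(&mut self.buf[filled..]) {
                Ok(0) => break,
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        if filled == 0 {
            Ok(None)
        } else {
            Ok(Some(&self.buf[..filled]))
        }
    }
}
```

Peak memory is one `chunk_size` buffer regardless of blob size, whereas get_full() must hold the entire blob at once, which is why the table limits it to small blobs.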

Configuration Impact

| Setting | Impact |
|---|---|
| Larger chunk_size | Fewer chunks, less overhead, less dedup |
| Smaller chunk_size | More chunks, more overhead, better dedup |
| Recommended | 1-4 MB chunks for most workloads |
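The chunk-size trade-off is simple arithmetic: a 1 GiB blob is 1,024 chunks at 1 MiB but 256 chunks at 4 MiB, while overwriting a single byte in place forces up to one whole chunk to be stored again as a new, unshared chunk. The 64-byte per-chunk metadata cost below is an assumed figure for illustration, not a measured tensor_blob value, and `chunking_cost` is a hypothetical helper.

```rust
/// ASSUMPTION: per-chunk bookkeeping cost (id, offset, refcount), for illustration only.
const ASSUMED_META_BYTES_PER_CHUNK: u64 = 64;

struct ChunkingCost {
    chunks: u64,
    /// Total per-chunk bookkeeping under the assumed cost above.
    metadata_bytes: u64,
    /// Worst case bytes that must be stored again after overwriting one byte
    /// in place: the full chunk containing it (or the whole blob if smaller).
    worst_case_rewrite_bytes: u64,
}

fn chunking_cost(blob_size: u64, chunk_size: u64) -> ChunkingCost {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let chunks = (blob_size + chunk_size - 1) / chunk_size;
    ChunkingCost {
        chunks,
        metadata_bytes: chunks * ASSUMED_META_BYTES_PER_CHUNK,
        worst_case_rewrite_bytes: chunk_size.min(blob_size),
    }
}
```

Quadrupling the chunk size cuts the chunk count and metadata by 4x but quadruples the bytes a small edit can invalidate, which is the dedup loss the table describes.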

Integration Notes

  • Blob store persists to TensorStore
  • Metadata includes checksum, size, creation time
  • Links enable blob-to-graph entity relationships
  • Tags support blob categorization and search
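The metadata listed above can be pictured as a record like the following. Every field and function name here (`BlobMeta`, `created_unix_secs`, `find_by_tag`) is hypothetical, since the actual TensorStore schema is not described in this section, and the tag search is a plain linear scan.

```rust
use std::collections::BTreeSet;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical metadata record; field names are illustrative, not the crate's.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct BlobMeta {
    id: String,
    checksum: String,      // e.g. hex digest of the blob contents
    size: u64,             // bytes
    created_unix_secs: u64,
    links: Vec<String>,    // ids of graph entities this blob is linked to
    tags: BTreeSet<String>,
}

/// Current time as seconds since the Unix epoch (0 if the clock is before it).
fn now_unix_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

/// Linear scan by tag; a real store would more likely maintain a tag index.
fn find_by_tag<'a>(metas: &'a [BlobMeta], tag: &str) -> Vec<&'a BlobMeta> {
    metas.iter().filter(|m| m.tags.contains(tag)).collect()
}
```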

Benchmarking Blob Operations

```bash
# Run blob-specific benchmarks (if available)
cargo bench --package tensor_blob

# For custom benchmarking, use the streaming API:
# - Measure upload throughput with BlobWriter
# - Measure download throughput with BlobReader
# - Test GC performance with various orphan ratios
```
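For the custom-benchmark route, a minimal throughput timer can wrap any `std::io::Write` sink. Whether `BlobWriter` implements `std::io::Write` is not stated here, so treat that as an assumption; the sketch is exercised with an in-memory `Vec<u8>` sink, and `measure_write_throughput` is a hypothetical helper.

```rust
use std::io::{self, Write};
use std::time::Instant;

/// Times one pass of `data` through `sink` in `chunk_size` pieces
/// (chunk_size must be > 0) and returns throughput in MiB/s.
/// ASSUMPTION: the blob writer under test can be driven as an io::Write.
fn measure_write_throughput<W: Write>(
    sink: &mut W,
    data: &[u8],
    chunk_size: usize,
) -> io::Result<f64> {
    let start = Instant::now();
    for chunk in data.chunks(chunk_size) {
        sink.write_all(chunk)?;
    }
    sink.flush()?;
    // Guard against a zero elapsed time on very fast sinks.
    let secs = start.elapsed().as_secs_f64().max(f64::EPSILON);
    Ok(data.len() as f64 / (1024.0 * 1024.0) / secs)
}
```

A single timed pass is noisy; for reportable numbers, a harness that repeats runs and reports statistics (such as Criterion, commonly driven by `cargo bench`) is preferable.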