DistanceSensitiveHash - Lightning-Fast Euclidean Distance Search
Find similar items in microseconds. This library compresses high-dimensional vectors (embeddings, features, user preferences) into tiny fingerprints that let you estimate Euclidean similarity 9× faster while using 94% less memory than naive Euclidean distance comparison. Perfect for rapidly narrowing a set of options, and edge deployments.
What's the Problem?
You have 100,000 high-dimensional vectors and need to find similar ones fast. Works for anything vector-shaped:
- Product feature vectors
- Image / text embeddings
- User preference vectors
- Audio fingerprints
- Time series data
How It Works (30-Second Version)
- Feed your vector into
encodeE2LSH()
→ get a compact integer signature (16-512 integers) - Compare signatures with
estimateSimilarity()
→ Euclidean similarity in microseconds - Profit
Trade-off: 5-15% error for orders of magnitude speed-up and massive memory savings.
Bundle Size & Performance
Tiny footprint: Core library is only ~8KB (gzipped: ~3KB).
Works everywhere:
- ✅ Node.js (v14+)
- ✅ Browser (ES modules)
- ✅ Web Workers (thread-safe)
Quick Start
npm install distance-sensitive-hash
import { encodeE2LSH, estimateSimilarity } from 'distance-sensitive-hash';
const productA = [0.8, 0.2, 0.9, /*...512 dims...*/ 0.1, 0.7];
const productB = [0.7, 0.3, 0.8, /*...512 dims...*/ 0.2, 0.6];
// 1. Hash once (64 integers)
const fpA = encodeE2LSH(productA, 64);
const fpB = encodeE2LSH(productB, 64);
// 2. Compare in <1 μs
const sim = estimateSimilarity(fpA, fpB);
console.log('Similarity ≈', sim); // 0.0 – 1.0
Two Algorithms, One API
Name | Function | When to Pick |
---|---|---|
CS-E2LSH (default) | encodeCSE2LSH() |
3× faster encoding (recommended) |
HCS-E2LSH | encodeHCSE2LSH() |
Better accuracy for medium distances |
Pick based on your use case - CS for speed, HCS for better distance estimation.
API Cheatsheet
encodeE2LSH(vector, bits, seed = 42, variant = 'cs', w = 4.0, order = 2)
// vector: number[] – your data
// bits: int – signature size (16-512 recommended)
// seed: int – reproducibility
// variant: string – 'cs' or 'hcs'
// w: float – width parameter (4.0 default)
// order: int – tensor order for HCS (2-4)
// returns: Int32Array – compact fingerprint
estimateSimilarity(fpA, fpB, w = 4)
// fpA, fpB: Int32Array – fingerprints
// w: float – width parameter used in encoding
// returns: float 0-1 – Euclidean similarity estimate
Benchmarks
Task | Ops/sec | vs Vanilla | Accuracy |
---|---|---|---|
CS-E2LSH encode (512d→64-int) | 66,701 sigs/sec | — | 85-95% |
CS-E2LSH compare | 8.7M ops/sec | 9.5× faster | ±0.15 @ ≥0.7 |
HCS-E2LSH encode (512d→64-int) | 7,857 sigs/sec | — | 85-95% |
HCS-E2LSH compare | 8.8M ops/sec | 9.6× faster | ±0.13 @ ≥0.7 |
Vanilla Euclidean | 915k ops/sec | baseline | exact |
Performance by Signature Size
Size | Generation | Comparison | Memory | Accuracy |
---|---|---|---|---|
16 | 50.8k/s | 9.3M/s | 64B | ±0.10 |
32 | 81.5k/s | 14.6M/s | 128B | ±0.13 |
64 | 66.7k/s | 8.7M/s | 256B | ±0.15 |
128 | 46.4k/s | 4.0M/s | 512B | ±0.13 |
256 | 15.5k/s | 2.0M/s | 1KB | ±0.09 |
Distance Accuracy Examples
True Distance | CS-E2LSH Error | HCS-E2LSH Error |
---|---|---|
0.377 (close) | ±0.064 (17%) | ±0.171 (45%) |
1.405 (medium) | ±0.543 (39%) | ±0.318 (23%) |
3.443 (far) | ±0.466 (14%) | ±0.231 (7%) |
Memory Savings
Massive reduction compared to full vector storage:
Vector Size | Full Vector | 64-int Signature | Reduction |
---|---|---|---|
512-d float | 2,048 bytes | 256 bytes | 8× smaller |
1024-d float | 4,096 bytes | 256 bytes | 16× smaller |
2048-d float | 8,192 bytes | 256 bytes | 32× smaller |
Real-world impact:
- 1M product vectors (512-d): 2GB → 256MB (87.5% reduction)
- 10M user vectors (1024-d): 40GB → 2.5GB (93.75% reduction)
- 100M image vectors (2048-d): 800GB → 25GB (96.9% reduction)
When to Use / Not Use
✅ Use for:
- Large datasets (≥10k items)
- Real-time recommendations
- Memory-constrained environments
- Approximate nearest neighbor search
- Pre-filtering before exact comparisons
❌ Don't use for:
- Exact similarity requirements
- Very small datasets (<100 items)
- Cosine similarity (use SimilaritySensitiveHash)
- Jaccard similarity (use Grouped-OPH)
Academic Roots
Implements Count-Sketch E2LSH (CS-E2LSH) and Higher-Order Count-Sketch E2LSH (HCS-E2LSH) from:
Verma & Pratap, Improving E2LSH with Count-Sketch and Higher-Order Count-Sketch, arXiv:2503.06737
Built on the classic LSH framework (Gionis, Indyk, Motwani, VLDB 1999) and Euclidean LSH (Datar et al., SoCG 2004).
License
MIT