
distance-sensitive-hash

JDvorak25 · MIT · 1.0.0

Lightning-fast Euclidean distance search using locality-sensitive hashing. Compress high-dimensional vectors into tiny fingerprints for roughly 10× faster similarity search with up to ~97% less memory usage.

lsh, e2lsh, euclidean-distance, count-sketch, hcs, higher-order-count-sketch, hashing, similarity, locality-sensitive-hashing, vector-similarity, embeddings, fingerprinting, fast-search, memory-efficient, recommendations, product-similarity, image-similarity, text-similarity, audio-fingerprinting, high-dimensional, approximate-similarity, microsecond-search, distance-estimation, euclidean-similarity, vector-compression, fast-approximation

readme

DistanceSensitiveHash - Lightning-Fast Euclidean Distance Search


Find similar items in microseconds. This library compresses high-dimensional vectors (embeddings, features, user preferences) into tiny fingerprints that let you estimate Euclidean similarity about 9× faster while using roughly 94% less memory than naive full-vector Euclidean distance comparison. Perfect for rapidly narrowing a candidate set before an exact comparison, and for edge deployments.

What's the Problem?

You have 100,000 high-dimensional vectors and need to find similar ones fast. Works for anything vector-shaped:

  • Product feature vectors
  • Image / text embeddings
  • User preference vectors
  • Audio fingerprints
  • Time series data

How It Works (30-Second Version)

  1. Feed your vector into encodeE2LSH() → get a compact integer signature (16-512 integers)
  2. Compare signatures with estimateSimilarity() → Euclidean similarity in microseconds
  3. Profit

Trade-off: 5-15% error for orders of magnitude speed-up and massive memory savings.

Bundle Size & Performance

Tiny footprint: Core library is only ~8KB (gzipped: ~3KB).

Works everywhere:

  • Node.js (v14+)
  • Browser (ES modules)
  • Web Workers (thread-safe)
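
Because the library is listed as thread-safe in Web Workers, encoding can be moved off the main thread. Below is a minimal sketch, assuming module-worker support in your browser or bundler; the file names, message shape, and myVector are illustrative, and only encodeE2LSH comes from the package (basic usage follows in Quick Start below).

// worker.js (illustrative file name) – encodes vectors off the main thread
import { encodeE2LSH } from 'distance-sensitive-hash';

self.onmessage = (event) => {
  const { id, vector } = event.data;            // plain number[]
  const fingerprint = encodeE2LSH(vector, 64);  // 64-int signature
  // Int32Array buffers are transferable, so hand the result back without a copy
  self.postMessage({ id, fingerprint }, [fingerprint.buffer]);
};

// main.js
const worker = new Worker(new URL('./worker.js', import.meta.url), { type: 'module' });
worker.onmessage = ({ data }) => console.log(data.id, data.fingerprint);
worker.postMessage({ id: 'itemA', vector: myVector }); // myVector: your own number[]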

Quick Start

npm install distance-sensitive-hash

import { encodeE2LSH, estimateSimilarity } from 'distance-sensitive-hash';

const productA = [0.8, 0.2, 0.9, /*...512 dims...*/ 0.1, 0.7];
const productB = [0.7, 0.3, 0.8, /*...512 dims...*/ 0.2, 0.6];

// 1. Hash once (64 integers)
const fpA = encodeE2LSH(productA, 64);
const fpB = encodeE2LSH(productB, 64);

// 2. Compare in <1 μs
const sim = estimateSimilarity(fpA, fpB);
console.log('Similarity ≈', sim); // 0.0 – 1.0
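
The same two calls scale from a single pair to candidate selection: encode every item once, keep only the fingerprints, then scan them and re-check the best few exactly if you need precise scores. The sketch below reuses the Quick Start imports; catalog (an array of { id, vector } objects) and topK are illustrative names, not part of the library.

// Encode the whole catalog once; store fingerprints instead of raw vectors
const index = catalog.map(item => ({
  id: item.id,
  fp: encodeE2LSH(item.vector, 64),
}));

// Query: hash once, compare against every fingerprint, keep the top k
function topK(queryVector, k = 10) {
  const queryFp = encodeE2LSH(queryVector, 64);
  return index
    .map(entry => ({ id: entry.id, score: estimateSimilarity(queryFp, entry.fp) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k); // re-rank these few candidates exactly if needed
}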

Two Algorithms, One API

| Name | Function | When to Pick |
|---|---|---|
| CS-E2LSH (default) | encodeCSE2LSH() | 3× faster encoding (recommended) |
| HCS-E2LSH | encodeHCSE2LSH() | Better accuracy for medium distances |

Pick based on your use case - CS for speed, HCS for better distance estimation.

API Cheatsheet

encodeE2LSH(vector, bits, seed = 42, variant = 'cs', w = 4.0, order = 2)
// vector: number[]           – your data
// bits: int                  – signature size (16-512 recommended)
// seed: int                  – reproducibility
// variant: string            – 'cs' or 'hcs'
// w: float                   – width parameter (4.0 default)
// order: int                 – tensor order for HCS (2-4)
// returns: Int32Array        – compact fingerprint

estimateSimilarity(fpA, fpB, w = 4)
// fpA, fpB: Int32Array       – fingerprints
// w: float                   – width parameter used in encoding
// returns: float 0-1         – Euclidean similarity estimate
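
Tying the cheatsheet together: both fingerprints should be produced with the same bits, seed, variant, and w, and estimateSimilarity expects the same w that was used for encoding (as noted above). A hedged example using only the parameters listed in the cheatsheet; vectorX and vectorY stand in for your own data.

// HCS variant, 128-int signatures, fixed seed for reproducibility
const BITS = 128, SEED = 7, W = 4.0;

const fpX = encodeE2LSH(vectorX, BITS, SEED, 'hcs', W, 2); // order-2 tensor
const fpY = encodeE2LSH(vectorY, BITS, SEED, 'hcs', W, 2);

const similarity = estimateSimilarity(fpX, fpY, W); // same w as encoding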

Benchmarks

| Task | Ops/sec | vs Vanilla | Accuracy |
|---|---|---|---|
| CS-E2LSH encode (512d → 64-int) | 66,701 sigs/sec | - | 85-95% |
| CS-E2LSH compare | 8.7M ops/sec | 9.5× faster | ±0.15 @ ≥0.7 |
| HCS-E2LSH encode (512d → 64-int) | 7,857 sigs/sec | - | 85-95% |
| HCS-E2LSH compare | 8.8M ops/sec | 9.6× faster | ±0.13 @ ≥0.7 |
| Vanilla Euclidean | 915k ops/sec | baseline | exact |

Performance by Signature Size

| Size (ints) | Generation | Comparison | Memory | Accuracy |
|---|---|---|---|---|
| 16 | 50.8k/s | 9.3M/s | 64 B | ±0.10 |
| 32 | 81.5k/s | 14.6M/s | 128 B | ±0.13 |
| 64 | 66.7k/s | 8.7M/s | 256 B | ±0.15 |
| 128 | 46.4k/s | 4.0M/s | 512 B | ±0.13 |
| 256 | 15.5k/s | 2.0M/s | 1 KB | ±0.09 |
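
The Memory column follows from the Int32Array return type documented in the API cheatsheet above: 4 bytes per integer, so a 64-int signature occupies 256 B. A quick check, reusing productA from the Quick Start:

const fp = encodeE2LSH(productA, 64);
console.log(fp.length);      // 64 integers
console.log(fp.byteLength);  // 256 (bytes: 64 × 4-byte Int32 values)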

Distance Accuracy Examples

| True Distance | CS-E2LSH Error | HCS-E2LSH Error |
|---|---|---|
| 0.377 (close) | ±0.064 (17%) | ±0.171 (45%) |
| 1.405 (medium) | ±0.543 (39%) | ±0.318 (23%) |
| 3.443 (far) | ±0.466 (14%) | ±0.231 (7%) |

Memory Savings

Massive reduction compared to full vector storage:

| Vector Size | Full Vector | 64-int Signature | Reduction |
|---|---|---|---|
| 512-d float | 2,048 bytes | 256 bytes | 8× smaller |
| 1024-d float | 4,096 bytes | 256 bytes | 16× smaller |
| 2048-d float | 8,192 bytes | 256 bytes | 32× smaller |

Real-world impact:

  • 1M product vectors (512-d): 2GB → 256MB (87.5% reduction)
  • 10M user vectors (1024-d): 40GB → 2.5GB (93.75% reduction)
  • 100M image vectors (2048-d): 800GB → 25GB (96.9% reduction)
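
These figures are just per-vector cost times collection size, assuming 4-byte floats for the raw vectors. A quick sanity check of the first bullet:

const n = 1_000_000;                    // 1M product vectors
const rawBytes = n * 512 * 4;           // 512-d float32 ≈ 2 GB
const sigBytes = n * 64 * 4;            // 64-int signatures = 256 MB
console.log(1 - sigBytes / rawBytes);   // 0.875 → 87.5% reduction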

When to Use / Not Use

Use for:

  • Large datasets (≥10k items)
  • Real-time recommendations
  • Memory-constrained environments
  • Approximate nearest neighbor search
  • Pre-filtering before exact comparisons

Don't use for:

  • Small collections (well under ~10k items), where an exact scan is already fast
  • Workloads that need exact distances (signature comparisons carry roughly 5-15% error)

Academic Roots

Implements Count-Sketch E2LSH (CS-E2LSH) and Higher-Order Count-Sketch E2LSH (HCS-E2LSH) from:

Verma & Pratap, Improving E2LSH with Count-Sketch and Higher-Order Count-Sketch, arXiv:2503.06737

Built on the classic LSH framework (Gionis, Indyk, Motwani, VLDB 1999) and Euclidean LSH (Datar et al., SoCG 2004).

License

MIT