Important: This documentation covers Yarn 1 (Classic).
For Yarn 2+ docs and migration guide, see yarnpkg.com.

Package detail

@ruvector/ruvllm

ruvnet2.2kMIT OR Apache-2.02.4.1TypeScript support: included

Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, RLM recursive retrieval, FastGRNN routing, and SIMD inference

ruvllm, llm, rlm, retrieval-language-model, self-learning, adaptive-learning, sona, lora, ewc, hnsw, vector-database, fastgrnn, router, simd, inference, federated-learning, continual-learning, machine-learning, ai, deep-learning, napi, rust, ruvector, rag, contrastive-learning

readme

@ruvector/ruvllm

The First Purpose-Built LLM Runtime for Claude Code Agent Orchestration

100% Routing Accuracy | Sub-Millisecond Inference | Self-Learning

npm Downloads License Tests

Quick Start | RLM | Training | Models | API


What is @ruvector/ruvllm?

@ruvector/ruvllm is a TypeScript/JavaScript SDK for intelligent LLM orchestration, specifically designed for Claude Code and multi-agent systems. It provides:

  • RLM (Recursive Language Model) - Break complex queries into sub-queries, synthesize coherent answers
  • 100% Routing Accuracy - Hybrid keyword + embedding strategy for perfect agent selection
  • SONA Self-Learning - Model improves with every successful interaction
  • SIMD Acceleration - AVX2/NEON optimized inference

Why @ruvector/ruvllm?

Challenge Traditional Approach @ruvector/ruvllm Solution
Agent selection Manual or keyword-based Semantic + keyword hybrid = 100%
Complex queries Single-shot RAG Recursive decomposition + synthesis
Response latency 2-5 seconds <1ms cache, 50-200ms full
Learning Static models Self-improving (SONA)
Cost per route $0.01+ (API call) $0 (local inference)

Installation

npm install @ruvector/ruvllm

Quick Start

import { RuvLLM, RlmController } from '@ruvector/ruvllm';

// Simple LLM inference
const llm = new RuvLLM({
  modelPath: '~/.ruvllm/models/ruvltra-claude-code-0.5b-q4_k_m.gguf',
  sonaEnabled: true,
});

const response = await llm.query('Explain quantum computing');
console.log(response.text);

// Recursive Language Model for complex queries
const rlm = new RlmController({ maxDepth: 5 });
const answer = await rlm.query('What are the causes AND solutions for slow API responses?');
// Automatically decomposes into sub-queries, retrieves context, synthesizes answer

Core Features

1. Claude Code Native Routing

Built by Claude Code, for Claude Code. Routes tasks to 60+ agent types:

import { RuvLLM } from '@ruvector/ruvllm';

const llm = new RuvLLM({ model: 'ruv/ruvltra' });

// Intelligent routing
const route = await llm.route('implement OAuth2 authentication');
console.log(route.agent);      // 'security-architect'
console.log(route.confidence); // 0.98
console.log(route.tier);       // 2 (Haiku-level complexity)

// Multi-agent teams for complex tasks
const team = await llm.routeComplex('build full-stack app with auth');
// Returns: [system-architect, backend-dev, coder, security-architect, tester]

2. 3-Tier Intelligent Routing

┌─────────────────────────────────────────────────────────┐
│                    User Request                         │
└─────────────────────┬───────────────────────────────────┘
                      ↓
              [RuvLTRA Routing]
                      ↓
        ┌─────────────┼─────────────┐
        ↓             ↓             ↓
┌───────────┐  ┌───────────┐  ┌───────────┐
│  Tier 1   │  │  Tier 2   │  │  Tier 3   │
│  Booster  │  │   Haiku   │  │   Opus    │
│   <1ms    │  │  ~500ms   │  │   2-5s    │
│    $0     │  │  $0.0002  │  │  $0.015   │
└───────────┘  └───────────┘  └───────────┘

3. Self-Learning (SONA)

Every successful interaction improves the model:

// First routing: Full inference
llm.route('implement OAuth2') → security-architect (97%)

// Later: Pattern hit in <25μs (learned from success)
llm.route('add OAuth2 flow') → security-architect (99%, cached pattern)

RLM (Recursive Language Model)

RLM provides recursive query decomposition - unlike traditional RAG that retrieves once, RLM breaks complex questions into sub-queries and synthesizes coherent answers.

How It Works

Query: "What are the causes AND solutions for slow API responses?"
                              ↓
                    [Decomposition]
                    /            \
    "Causes of slow API?"    "Solutions for slow API?"
           ↓                        ↓
    [Sub-answers]            [Sub-answers]
           \                        /
                    [Synthesis]
                         ↓
            Coherent combined answer with sources

Basic Usage

import { RlmController } from '@ruvector/ruvllm';

const rlm = new RlmController({
  maxDepth: 5,
  retrievalTopK: 10,
  enableCache: true,
});

// Add knowledge to memory
await rlm.addMemory('TypeScript adds static typing to JavaScript.');
await rlm.addMemory('React is a library for building user interfaces.');

// Query with recursive retrieval
const answer = await rlm.query('What are causes and solutions for type errors in React?');
console.log(answer.text);           // Comprehensive synthesized answer
console.log(answer.sources);        // Source attributions
console.log(answer.qualityScore);   // 0.0-1.0
console.log(answer.confidence);     // Routing confidence

Streaming

for await (const event of rlm.queryStream('Explain machine learning')) {
  if (event.type === 'token') {
    process.stdout.write(event.text);
  } else {
    console.log('\n\nQuality:', event.answer.qualityScore);
  }
}

With Self-Reflection

const rlm = new RlmController({
  enableReflection: true,
  maxReflectionIterations: 2,
  minQualityScore: 0.8,
});

// Answers are iteratively refined until quality >= 0.8
const answer = await rlm.query('Complex multi-part technical question...');

RLM Configuration

interface RlmConfig {
  maxDepth?: number;              // Max recursion depth (default: 3)
  maxSubQueries?: number;         // Max sub-queries per level (default: 5)
  tokenBudget?: number;           // Token budget (default: 4096)
  enableCache?: boolean;          // Enable caching (default: true)
  cacheTtl?: number;              // Cache TTL in ms (default: 300000)
  retrievalTopK?: number;         // Memory spans to retrieve (default: 10)
  minQualityScore?: number;       // Min quality threshold (default: 0.7)
  enableReflection?: boolean;     // Enable self-reflection (default: false)
  maxReflectionIterations?: number; // Max reflection loops (default: 2)
}

Unique Capabilities

1. Memory-Augmented Routing

Every successful routing is stored in HNSW-indexed memory for instant recall:

// First time: Full inference (~50ms)
route("implement OAuth2") → security-architect (97% confidence)

// Later: Memory hit (<25μs)
route("add OAuth2 flow") → security-architect (99% confidence, cached)

2. Confidence-Aware Escalation

// Low confidence automatically escalates
Confidence > 0.9  → Use recommended agent
Confidence 0.7-0.9 → Use with human confirmation
Confidence < 0.7  → Escalate to higher tier

3. Batch SIMD Operations

import { simd } from '@ruvector/ruvllm/simd';

// 4x faster vector operations with AVX2/NEON
const similarity = simd.batchCosineSimilarity(query, targets);
const attended = simd.flashAttention(q, k, v, scale);

4. Zero-Copy Caching

Arc-based string interning for 100-1000x faster cache hits on large responses.


Performance

Benchmarks (M4 Pro)

Operation Latency Throughput
Query decomposition 340 ns 2.9M/s
Cache lookup 23.5 ns 42.5M/s
Embedding (384d) 293 ns 3.4M/s
Memory search (10k) 0.4 ms 2.5K/s
End-to-end routing <1 ms 1K+/s
Full RLM query 50-200 ms 5-20/s

Routing Accuracy

Strategy RuvLTRA Qwen Base OpenAI
Embedding Only 45% 40% 52%
Keyword Only 78% 78% N/A
Hybrid 100% 95% N/A

Test Results

145 tests passing
  - RLM Controller: 24 tests
  - Routing Accuracy: 18 tests
  - Contrastive Training: 15 tests
  - SIMD Operations: 22 tests
  - SONA Learning: 19 tests
  - Memory/HNSW: 21 tests
  - Benchmarks: 26 tests

Models

HuggingFace Repository

URL: https://huggingface.co/ruv/ruvltra

Available Models

Model Size Purpose Accuracy
ruvltra-claude-code-0.5b-q4_k_m 398 MB Agent routing 100% (hybrid)
ruvltra-small-0.5b-q4_k_m ~400 MB Embeddings -
ruvltra-medium-1.1b-q4_k_m ~1 GB Full inference -

Download Models

// Programmatic
import { downloadModel } from '@ruvector/ruvllm';
await downloadModel('ruv/ruvltra', { quantization: 'q4_k_m' });

// CLI
ruvllm download ruv/ruvltra

Auto-Download

Models are automatically downloaded on first use:

const llm = new RuvLLM({ model: 'ruv/ruvltra' });
// Downloads to ~/.ruvllm/models/ if not present

Training

Generate Routing Dataset

node scripts/training/routing-dataset.js
# Output: 381 examples, 793 contrastive pairs, 156 hard negatives

Contrastive Fine-tuning

import { ContrastiveTrainer } from '@ruvector/ruvllm';

const trainer = new ContrastiveTrainer({
  modelPath: './models/base.gguf',
  loraRank: 8,
  loraAlpha: 16,
  learningRate: 1e-4,
});

const pairs = [
  { anchor: 'Fix auth bug', positive: 'coder', negative: 'researcher' },
  // ... more pairs
];

await trainer.train(pairs, { epochs: 10 });
await trainer.save('./adapters/routing-lora');

Training Scripts

Script Description
routing-dataset.js Generate 381 routing examples
claude-code-synth.js Synthetic data generation
contrastive-finetune.js LoRA fine-tuning pipeline
rlm-dataset.js RLM training data (500 examples)

API Reference

RuvLLM Class

class RuvLLM {
  constructor(config?: RuvLLMConfig);

  query(prompt: string, params?: GenerateParams): Promise<Response>;
  stream(prompt: string, params?: GenerateParams): AsyncIterable<string>;
  route(task: string): Promise<RoutingResult>;
  routeComplex(task: string): Promise<AgentTeam[]>;

  loadModel(path: string): Promise<void>;
  addMemory(text: string, metadata?: object): number;
  searchMemory(query: string, topK?: number): MemoryResult[];

  sonaStats(): SonaStats | null;
  adapt(input: Float32Array, quality: number): void;
}

RlmController Class

class RlmController {
  constructor(config?: RlmConfig, engine?: RuvLLM);

  query(input: string): Promise<RlmAnswer>;
  queryStream(input: string): AsyncGenerator<StreamToken>;

  addMemory(text: string, metadata?: object): Promise<string>;
  searchMemory(query: string, topK?: number): Promise<MemorySpan[]>;

  clearCache(): void;
  getCacheStats(): { size: number; entries: number };

  updateConfig(config: Partial<RlmConfig>): void;
  getConfig(): Required<RlmConfig>;
}

All Exports

import {
  // Core
  RuvLLM, RuvLLMConfig,

  // RLM
  RlmController, RlmConfig, RlmAnswer, MemorySpan, StreamToken,

  // Training
  RlmTrainer, ContrastiveTrainer, createRlmTrainer,
  DEFAULT_RLM_CONFIG, FAST_RLM_CONFIG, THOROUGH_RLM_CONFIG,

  // SONA Learning
  SonaCoordinator, TrajectoryBuilder,

  // LoRA
  LoraAdapter, LoraManager,

  // Benchmarks
  ModelComparisonBenchmark, RoutingBenchmark, EmbeddingBenchmark,
} from '@ruvector/ruvllm';

CLI

# Route a task
ruvllm route "add unit tests for auth module"
# → Agent: tester | Confidence: 0.96 | Tier: 2

# Query with streaming
ruvllm query --stream "Explain machine learning"

# Download models
ruvllm download ruv/ruvltra

# Run benchmarks
ruvllm bench ./models/model.gguf

# Evaluate (SWE-Bench)
ruvllm eval --model ./models/model.gguf --subset lite

Platform Support

Platform Architecture Status
macOS arm64 (M1-M4) Full support
macOS x64 Supported
Linux x64 Supported
Linux arm64 Supported
Windows x64 Supported

Resource URL
npm npmjs.com/package/@ruvector/ruvllm
HuggingFace huggingface.co/ruv/ruvltra
Crate (Rust) crates.io/crates/ruvllm
Documentation docs.rs/ruvllm
GitHub github.com/ruvnet/ruvector
Claude Flow github.com/ruvnet/claude-flow

License

MIT OR Apache-2.0


Built for Claude Code. Optimized for agents. Designed for speed.

Get Started | View on GitHub