LLM Pool Manager
A production-ready, fault-tolerant Node.js library for managing multiple LLM API providers with intelligent load balancing, automatic failover, and dynamic configuration management.
Features
🚀 Multi-Provider Support
- OpenAI GPT models
- Google Gemini models
- Anthropic Claude models
- Groq models
- Together AI models
- Cohere models
- Easy to extend for new providers
⚖️ Intelligent Load Balancing
- Priority-based provider selection
- Success rate tracking
- Response time optimization
- Circuit breaker pattern
🔄 Automatic Failover
- Seamless switching between providers
- Configurable retry logic with exponential backoff
- Rate limit detection and handling
📊 Advanced Rate Limiting
- Per-minute and per-day request limits
- Dynamic rate limit detection from API responses
- Intelligent request distribution
🛡️ Fault Tolerance
- Circuit breaker for failing providers
- Request timeout handling
- Network error recovery
- Provider health monitoring
⚙️ Dynamic Configuration
- Hot-reload configuration changes
- Local file or remote URL configuration
- Configuration validation and checksums
- Zero-downtime updates
🖼️ Multi-Modal Support
- Text and image message handling
- Base64 image support
- Provider-specific format conversion
📈 Comprehensive Monitoring
- Real-time provider statistics
- Cost tracking with token pricing
- Performance metrics
- Health checks and alerts
Installation
npm install llmpool
Quick Start
1. Create Configuration
Create a config.json file:
{
  "providers": [
    {
      "name": "groq-primary",
      "type": "groq",
      "api_key": "your-groq-api-key",
      "base_url": "https://api.groq.com/openai/v1",
      "model": "mixtral-8x7b-32768",
      "priority": 1,
      "requests_per_minute": 30,
      "requests_per_day": 1000
    },
    {
      "name": "openai-fallback",
      "type": "openai",
      "api_key": "your-openai-api-key",
      "base_url": "https://api.openai.com/v1",
      "model": "gpt-4",
      "priority": 2,
      "requests_per_minute": 100,
      "requests_per_day": 5000
    }
  ]
}
2. Basic Usage
const { LLMPool, createTextMessage } = require('llmpool');

async function main() {
  // Initialize pool
  const pool = new LLMPool({
    configPath: './config.json'
  });
  await pool.initialize();

  // Send chat request
  const response = await pool.chat({
    messages: [
      createTextMessage('system', 'You are a helpful assistant.'),
      createTextMessage('user', 'What is the capital of France?')
    ],
    temperature: 0.7,
    max_tokens: 1000
  });

  console.log('Response:', response.content);
  console.log('Provider:', response.provider);
  console.log('Tokens used:', response.usage.total_tokens);

  await pool.shutdown();
}

main().catch(console.error);
Advanced Usage
Image Support
const { createImageMessage } = require('llmpool');

const response = await pool.chat({
  messages: [
    createImageMessage(
      'user',
      'What do you see in this image?',
      'data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD...'
    )
  ]
});
Remote Configuration
const pool = new LLMPool({
  configUrl: 'https://your-domain.com/llm-config.json',
  checkInterval: 300000 // Check for updates every 5 minutes
});

pool.on('configChanged', (config) => {
  console.log('Configuration updated automatically');
});
Event Monitoring
pool.on('requestSuccess', (event) => {
  console.log(`✅ ${event.provider} succeeded on attempt ${event.attempt}`);
});

pool.on('requestError', (event) => {
  console.log(`❌ ${event.provider} failed: ${event.error}`);
});

pool.on('providersUpdated', (providers) => {
  console.log(`Updated ${providers.length} providers`);
});
Health Monitoring
// Get overall pool health
const health = pool.getPoolHealth();
console.log(`Available: ${health.availableProviders}/${health.totalProviders}`);

// Get detailed provider statistics
const stats = pool.getProviderStats();
Object.entries(stats).forEach(([name, stat]) => {
  console.log(`${name}:`);
  console.log(` Success Rate: ${stat.performance.successRate.toFixed(2)}%`);
  console.log(` Avg Response Time: ${stat.performance.averageResponseTime}ms`);
  console.log(` Total Cost: $${stat.usage.totalCost.toFixed(4)}`);
});
Configuration Reference
Pool Configuration
const pool = new LLMPool({
  // Configuration source (choose one)
  configPath: './config.json',                  // Local file path
  configUrl: 'https://example.com/config.json', // Remote URL

  // Behavior settings
  timeout: 30000,        // Request timeout (ms)
  maxRetries: 3,         // Maximum retry attempts
  retryDelay: 1000,      // Initial retry delay (ms)
  checkInterval: 300000, // Config check interval (ms)
  useTokenCounting: true // Enable token estimation
});
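With the defaults above, and assuming a doubling schedule for the exponential backoff (the exact schedule is an implementation detail not specified here), the three retry attempts would be spaced roughly 1000 ms, 2000 ms, and 4000 ms apart.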
Provider Configuration
{
  "name": "provider-name",          // Unique identifier
  "type": "openai",                 // Provider type
  "api_key": "your-api-key",        // API authentication
  "base_url": "https://api.openai.com/v1",
  "model": "gpt-4",                 // Model to use
  "priority": 1,                    // Selection priority (lower = higher priority)

  // Rate limiting
  "requests_per_minute": 100,       // RPM limit
  "requests_per_day": 5000,         // Daily limit

  // Circuit breaker
  "circuit_breaker_threshold": 5,   // Failure threshold
  "circuit_breaker_timeout": 60000, // Recovery timeout (ms)

  // Request defaults
  "max_tokens": 4096,               // Default max tokens
  "temperature": 0.7,               // Default temperature
  "timeout": 30000,                 // Request timeout (ms)

  // Cost tracking (optional)
  "input_token_price": 0.03,        // Cost per 1K input tokens
  "output_token_price": 0.06        // Cost per 1K output tokens
}
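With the optional prices above, a request that consumes 500 input tokens and 200 output tokens would be tracked as (500 / 1000) × 0.03 + (200 / 1000) × 0.06 = $0.027.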
Supported Provider Types
Provider | Type | Base URL |
---|---|---|
OpenAI | openai | https://api.openai.com/v1 |
Gemini | gemini | https://generativelanguage.googleapis.com/v1beta/openai |
Anthropic | anthropic | https://api.anthropic.com/v1 |
Groq | groq | https://api.groq.com/openai/v1 |
Together AI | together | https://api.together.xyz/v1 |
Cohere | cohere | https://api.cohere.ai/v1 |
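As an example of mapping a table row onto a provider entry, here is a hypothetical Together AI configuration (the name, limits, and model are illustrative only):

{
  "name": "together-backup",
  "type": "together",
  "api_key": "your-together-api-key",
  "base_url": "https://api.together.xyz/v1",
  "model": "meta-llama/Llama-3-70b-chat-hf",
  "priority": 3,
  "requests_per_minute": 60,
  "requests_per_day": 2000
}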
Error Handling
The library provides specific error types for different scenarios:
const {
  ProviderError,
  RateLimitError,
  ConfigurationError
} = require('llmpool');

try {
  const response = await pool.chat({ messages });
} catch (error) {
  if (error instanceof RateLimitError) {
    console.log(`Rate limited by ${error.provider}, retry in ${error.resetTime}s`);
  } else if (error instanceof ProviderError) {
    console.log(`Provider ${error.provider} failed: ${error.message}`);
    if (error.retryable) {
      // Can retry with different provider
    }
  } else if (error instanceof ConfigurationError) {
    console.log(`Configuration issue: ${error.message}`);
  }
}
Testing
Run the test suite:
npm test
Run specific test categories:
# Unit tests only
npm test -- --testNamePattern="LLMPool|Provider|ConfigManager"
# Integration tests
npm test -- --testNamePattern="Integration"
# Performance tests
npm test -- --testNamePattern="Performance"
Performance Considerations
Concurrent Requests
The pool handles concurrent requests efficiently:
// Process multiple requests simultaneously
const promises = requests.map(request =>
  pool.chat({ messages: request.messages })
);
const results = await Promise.allSettled(promises);
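Promise.allSettled never rejects, so inspect each result individually:

const succeeded = results
  .filter((r) => r.status === 'fulfilled')
  .map((r) => r.value);
const failed = results
  .filter((r) => r.status === 'rejected')
  .map((r) => r.reason);

console.log(`${succeeded.length} succeeded, ${failed.length} failed`);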
Memory Usage
- Provider statistics are kept in memory with configurable history limits
- Token counting uses efficient algorithms when enabled
- Configuration changes don't cause memory leaks
Optimization Tips
- Set appropriate priorities - Put faster/cheaper providers first (see the config sketch after this list)
- Configure realistic rate limits - Match provider specifications
- Use circuit breakers - Prevent cascading failures
- Monitor health regularly - Detect issues early
- Cache configurations - Reduce remote config fetches
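A minimal sketch combining these tips in a two-provider config (values are illustrative; match them to your providers' published limits):

{
  "providers": [
    {
      "name": "groq-fast",            // cheaper/faster, tried first
      "type": "groq",
      "api_key": "your-groq-api-key",
      "base_url": "https://api.groq.com/openai/v1",
      "model": "mixtral-8x7b-32768",
      "priority": 1,                  // lower number = higher priority
      "requests_per_minute": 30,      // match the provider's published limits
      "circuit_breaker_threshold": 5  // trip early to avoid cascading failures
    },
    {
      "name": "openai-fallback",      // slower/pricier, used on failover
      "type": "openai",
      "api_key": "your-openai-api-key",
      "base_url": "https://api.openai.com/v1",
      "model": "gpt-4",
      "priority": 2,
      "requests_per_minute": 100
    }
  ]
}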
Security Best Practices
API Key Management
// Use environment variables
const config = {
  providers: [{
    name: 'openai',
    type: 'openai',
    api_key: process.env.OPENAI_API_KEY,
    // ... other config
  }]
};
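For local development, this pattern is often paired with the dotenv package (an assumption of this example; it is not bundled with llmpool) so keys live in a git-ignored .env file:

// npm install dotenv
require('dotenv').config(); // populates process.env from a local .env file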
Request Validation
All requests are validated before sending (a complementary user-side pre-check is sketched after this list):
- Message format validation
- Content length checks
- Parameter sanitization
- Provider capability verification
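The checks above happen inside the library, but you can also fail fast before a request ever reaches the pool. A minimal user-side sketch (this helper is illustrative, not part of llmpool; it assumes messages carry a role field, as produced by createTextMessage):

// Hypothetical helper: shape-check messages before calling pool.chat()
function assertValidMessages(messages) {
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new Error('messages must be a non-empty array');
  }
  for (const msg of messages) {
    // Roles assumed from the createTextMessage examples in this README
    if (!['system', 'user', 'assistant'].includes(msg.role)) {
      throw new Error(`unsupported role: ${msg.role}`);
    }
  }
}

assertValidMessages(messages);
const response = await pool.chat({ messages });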
Network Security
- HTTPS-only connections
- Request timeout protection
- Retry limit enforcement
- Error message sanitization
Monitoring and Observability
Metrics Collection
// Set up periodic monitoring
setInterval(() => {
  const health = pool.getPoolHealth();
  const stats = pool.getProviderStats();

  // Log metrics to your monitoring system
  console.log('Pool Health:', health);

  // Alert on issues
  if (!health.healthy) {
    console.warn('🚨 Pool unhealthy - no available providers');
  }

  Object.entries(stats).forEach(([name, stat]) => {
    if (stat.performance.successRate < 90) {
      console.warn(`⚠️ ${name} has low success rate: ${stat.performance.successRate}%`);
    }
  });
}, 30000);
Integration with Monitoring Tools
The library emits structured events that can be integrated with monitoring tools:
// Prometheus metrics example (using the prom-client package, installed separately)
const { Counter } = require('prom-client');

const requestsTotal = new Counter({
  name: 'llmpool_requests_total',
  help: 'LLM requests by provider and status',
  labelNames: ['provider', 'status']
});

pool.on('requestSuccess', (event) => {
  requestsTotal.inc({ provider: event.provider, status: 'success' });
});

pool.on('requestError', (event) => {
  requestsTotal.inc({ provider: event.provider, status: 'error' });
});
Troubleshooting
Common Issues
No available providers
- Check provider configurations
- Verify API keys are valid
- Check rate limits haven't been exceeded
- Ensure network connectivity
High failure rates
- Review circuit breaker thresholds
- Check provider status pages
- Verify request formats
- Monitor network timeouts
Configuration not updating
- Verify remote URL accessibility
- Check file permissions for local configs
- Review checkInterval setting
- Monitor configChanged events
Debug Mode
Enable verbose logging:
const pool = new LLMPool({
  configPath: './config.json',
  debug: true
});

pool.on('debug', (message) => {
  console.log('DEBUG:', message);
});
Health Checks
Implement regular health checks:
async function healthCheck() {
  const health = pool.getPoolHealth();

  if (!health.healthy) {
    throw new Error('LLM Pool is unhealthy');
  }

  return {
    status: 'healthy',
    providers: health.availableProviders,
    total: health.totalProviders
  };
}
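This function drops naturally into an HTTP health endpoint; here is a sketch using Express (Express is an assumption of this example, not a llmpool dependency):

const express = require('express');
const app = express();

app.get('/health', async (req, res) => {
  try {
    res.json(await healthCheck());
  } catch (err) {
    // 503 tells load balancers to route traffic away
    res.status(503).json({ status: 'unhealthy', error: err.message });
  }
});

app.listen(3000);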
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Development Setup
git clone https://github.com/KTBsomen/llmpool.git
cd llmpool
npm install
npm test
License
MIT License - see LICENSE file for details.
Changelog
v1.0.0
- Initial release
- Multi-provider support
- Dynamic configuration
- Circuit breaker implementation
- Comprehensive test suite
- Production-ready error handling
For more examples and advanced usage patterns, see the examples directory.