large-models-interface
Maintained by chenxingqiang
Introduction
Large Models Interface is a comprehensive npm module designed to streamline interactions with various AI model providers in your Node.js applications. Our mission is to provide a unified interface for all types of large models, making it simple to switch between providers and leverage the best models for your specific needs.
🎯 Our Vision: Universal access to all kinds of large AI models through a single, consistent interface.
🇨🇳 Special Focus on Chinese AI Ecosystem: We prioritize comprehensive support for leading Chinese AI providers including Baidu, Alibaba, ByteDance, Tencent, iFLYTEK, and emerging players, making this the most China-friendly international AI interface.
🚀 Multi-Modal AI Support
We are building the most comprehensive interface for modern AI models:
- 🗣️ Natural Language Models - Chat completion, text generation, and language understanding
- 🖼️ Vision Models - Image analysis, generation, and vision-language tasks
- 🎵 Audio Models - Speech recognition, synthesis, and audio processing
- 🎬 Video Models - Video analysis, generation, and multimodal video understanding
- 🧠 Specialized Models - Code generation, embeddings, and domain-specific AI
The Large Models Interface package currently offers comprehensive support for 51 language model providers and hundreds of models, with active development to expand into all AI modalities. This extensive and growing coverage ensures maximum flexibility in choosing the best models for your applications.
🌟 Current Support: 51 Providers & Hundreds of Models
🗣️ Natural Language Models (Current)
🌍 Global Leading Providers
International: OpenAI, Anthropic, Google Gemini, Mistral AI, Groq, DeepSeek, Hugging Face, NVIDIA AI, xAI, Coze, and 30+ more providers.
Supported Global Providers: AI21 Studio, AiLAYER, AIMLAPI, Anyscale, Anthropic, Cloudflare AI, Cohere, Corcel, Coze, DeepInfra, DeepSeek, Fireworks AI, Forefront AI, FriendliAI, Google Gemini, GooseAI, Groq, Hugging Face Inference, HyperBee AI, Lamini, LLaMA.CPP, Mistral AI, Monster API, Neets.ai, Novita AI, NVIDIA AI, OctoAI, Ollama, OpenAI, Perplexity AI, Reka AI, Replicate, Shuttle AI, SiliconFlow, TheB.ai, Together AI, Voyage AI, Watsonx AI, Writer, xAI, and Zhipu AI.
🇨🇳 Chinese AI Ecosystem
Leading Chinese Providers: 百度文心一言 (Baidu ERNIE), 阿里通义千问 (Alibaba Qwen), 字节跳动豆包 (ByteDance Doubao), 讯飞星火 (iFLYTEK Spark), 智谱 ChatGLM, 腾讯混元 (Tencent Hunyuan), and more.
Chinese Providers (已支持/Currently Supported):
- 百度文心一言系列模型 - Baidu ERNIE Series ✅
- 阿里通义千问系列模型 - Alibaba Qwen Series ✅
- 字节跳动豆包大模型 - ByteDance Doubao (Volcano Engine) ✅
- 讯飞星火认知大模型 - iFLYTEK Spark Cognitive Model ✅
- 智谱 ChatGLM 系列模型 - Zhipu ChatGLM Series ✅
- 腾讯混元大模型 - Tencent Hunyuan ✅
- Moonshot AI - 月之暗面 ✅
- 百川大模型 - Baichuan AI ✅
- MINIMAX - MiniMax Models ✅
- 零一万物 - 01.AI (Yi Series) ✅
- 阶跃星辰 - StepFun ✅
- 硅基流动 SiliconCloud - SiliconFlow ✅
🚧 Coming Soon: Multi-Modal Expansion
- 🖼️ Vision Models - Image understanding, OCR, visual question answering
- 🎵 Audio Models - Speech-to-text, text-to-speech, audio generation
- 🎬 Video Models - Video analysis, captioning, generation
- 🧠 Specialized Models - Code completion, scientific computing, domain-specific AI
Our roadmap includes expanding across all AI modalities, with dynamic model discovery to automatically support the latest releases.
✨ Core Features
🎯 Universal AI Interface
- Unified API: LLMInterface.sendMessage provides a single, consistent interface to interact with 51 AI model providers
- Multi-Modal Ready: Designed to support text, vision, audio, and video models through the same interface
- Dynamic Model Discovery: Automatically detects and supports newly released models without code updates
- 🇨🇳 China-First Design: Comprehensive support for Chinese AI ecosystem with native language examples and documentation
🚀 Advanced Capabilities
- Chat Completion & Streaming: Full support for chat completion, streaming, and embeddings with intelligent failover
- Smart Model Selection: Automatically choose the best model based on task type and requirements
- Response Caching: Intelligent caching system to reduce costs and improve performance
- Graceful Error Handling: Robust retry mechanisms with exponential backoff (see the sketch after this list)
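A minimal retry sketch, assuming the optional interfaceOptions argument follows the upstream llm-interface conventions (retryAttempts and retryMultiplier are assumed option names, not guaranteed by this package):
LLMInterface.setApiKey({ groq: process.env.GROQ_API_KEY });

try {
  const response = await LLMInterface.sendMessage(
    'groq',
    'Summarize the benefits of automatic request retries.',
    { max_tokens: 100 },
    // Assumed retry options: retry up to 3 times, with an exponential
    // backoff delay derived from the multiplier.
    { retryAttempts: 3, retryMultiplier: 0.3 },
  );
} catch (error) {
  // Thrown only after all retry attempts are exhausted.
  console.error(error);
}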
🔧 Developer Experience
- Dynamic Module Loading: Lazy loading of provider interfaces to minimize resource usage
- JSON Output & Repair: Native JSON output support with automatic repair for malformed responses (see the example after this list)
- Extensible Architecture: Easy integration of new providers and model types
- Type Safety: Full TypeScript support for better development experience
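A sketch of the JSON workflow, assuming the response_format option and the attemptJsonRepair interface option behave as they do in the upstream llm-interface project (treat both names as assumptions and check the usage docs):
try {
  const response = await LLMInterface.sendMessage(
    'openai',
    'List three uses of text embeddings as a JSON array of strings.',
    // Assumed option name for requesting native JSON output.
    { max_tokens: 150, response_format: 'json_object' },
    // Assumed option name; runs jsonrepair on malformed JSON responses.
    { attemptJsonRepair: true },
  );
} catch (error) {
  console.error(error);
}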
🌐 Future-Ready Architecture
- Modality Expansion: Built to seamlessly integrate vision, audio, and video models
- Provider Agnostic: Switch between providers without changing your application code (illustrated after this list)
- Auto-Discovery: Continuously updated model registry for the latest AI capabilities
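For example, switching providers is just a matter of changing the provider name in an otherwise identical call (this assumes the relevant API keys were already registered with LLMInterface.setApiKey()):
const prompt = 'Explain the importance of low latency LLMs.';

// Same call shape, different providers.
const fromOpenAI = await LLMInterface.sendMessage('openai', prompt);
const fromGroq = await LLMInterface.sendMessage('groq', prompt);
const fromZhipu = await LLMInterface.sendMessage('zhipuai', prompt);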
Dependencies
The project relies on several npm packages and APIs. Here are the primary dependencies:
- axios: For making HTTP requests (used for the various HTTP AI APIs).
- @google/generative-ai: SDK for interacting with the Google Gemini API.
- dotenv: For managing environment variables. Used by the test cases.
- jsonrepair: Used to repair invalid JSON responses.
- loglevel: A minimal, lightweight logging library with level-based logging and filtering.
The following optional packages can be added to extend LLMInterface's caching capabilities (see the caching sketch below):
- flat-cache: A simple JSON-based cache.
- cache-manager: An extensible cache module that supports various backends including Redis, MongoDB, the file system, Memcached, SQLite, and more.
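A hedged caching sketch, assuming the interfaceOptions argument accepts a cacheTimeoutSeconds value as in the upstream llm-interface project (the option name is an assumption):
try {
  const response = await LLMInterface.sendMessage(
    'openai',
    'Explain the importance of low latency LLMs.',
    { max_tokens: 150 },
    // Assumed option name: identical prompts within the next 24 hours
    // are served from the cache instead of a new (billable) API call.
    { cacheTimeoutSeconds: 86400 },
  );
} catch (error) {
  console.error(error);
}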
Installation
To install the Large Models Interface npm module, use npm:
npm install large-models-interface
Quick Start
- Looking for API Keys? This document provides helpful links.
- Detailed usage documentation is available here.
- Various examples are also available to help you get started.
- A breakdown of model aliases is available here.
- A breakdown of embeddings model aliases is available here.
- If you still want more examples, you may wish to review the test cases for further examples.
Usage
First, import LLMInterface. You can do this using either the CommonJS require syntax:
const { LLMInterface } = require('large-models-interface');
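or the ES module import syntax (assuming your runtime or bundler supports ES modules, mirroring the upstream llm-interface package):
import { LLMInterface } from 'large-models-interface';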
🌍 Global Providers Example
LLMInterface.setApiKey({ openai: process.env.OPENAI_API_KEY });

try {
  const response = await LLMInterface.sendMessage(
    'openai',
    'Explain the importance of low latency LLMs.',
  );
} catch (error) {
  console.error(error);
}
🇨🇳 Chinese Providers Example
// 智谱 ChatGLM (Zhipu ChatGLM)
LLMInterface.setApiKey({ zhipuai: process.env.ZHIPUAI_API_KEY });
const zhipuResponse = await LLMInterface.sendMessage(
  'zhipuai',
  '请解释大语言模型在中文自然语言处理中的重要性', // "Explain the importance of large language models for Chinese NLP."
  { model: 'glm-4' },
);

// 百度文心一言 (Baidu ERNIE)
LLMInterface.setApiKey({ baidu: process.env.BAIDU_API_KEY });
const baiduResponse = await LLMInterface.sendMessage(
  'baidu',
  '请帮我写一段关于人工智能发展的文章', // "Write a short article about the development of AI."
  { model: 'ernie-4.0-8k' },
);

// 阿里通义千问 (Alibaba Qwen)
LLMInterface.setApiKey({ alibaba: process.env.ALIBABA_API_KEY });
const alibabaResponse = await LLMInterface.sendMessage(
  'alibaba',
  '请介绍一下人工智能的发展历程', // "Give an overview of the history of AI development."
  { model: 'qwen-turbo' },
);
If you prefer, you can use a one-liner to pass the provider and API key, essentially skipping the LLMInterface.setApiKey() step.
const response = await LLMInterface.sendMessage(
  ['openai', process.env.OPENAI_API_KEY],
  'Explain the importance of low latency LLMs.',
);
Passing a more complex message object is just as simple. The same rules apply:
const message = {
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain the importance of low latency LLMs.' },
  ],
};

try {
  const response = await LLMInterface.sendMessage('openai', message, {
    max_tokens: 150,
  });
} catch (error) {
  console.error(error);
}
The standalone LLMInterfaceSendMessage and LLMInterfaceStreamMessage functions are still available and will remain supported until version 3.
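A minimal streaming sketch, assuming LLMInterfaceStreamMessage keeps the upstream llm-interface signature of (provider, apiKey, message, options) and resolves to an object whose data property is a readable stream of raw server-sent-event chunks (both are assumptions; check the usage docs for the exact shape):
const { LLMInterfaceStreamMessage } = require('large-models-interface');

try {
  const stream = await LLMInterfaceStreamMessage(
    'openai',
    process.env.OPENAI_API_KEY,
    'Explain the importance of low latency LLMs.',
    { max_tokens: 100 },
  );

  // Assumed shape: stream.data is a Node.js readable stream; print each
  // chunk as it arrives and note when the stream closes.
  stream.data.on('data', (chunk) => process.stdout.write(chunk.toString()));
  stream.data.on('end', () => console.log('\n[stream finished]'));
} catch (error) {
  console.error(error);
}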
Running Tests
The project includes tests for each LLM handler. To run the tests, use the following command:
npm test
The comprehensive test suite covers all 51 providers with proper API key validation and graceful skipping when credentials are not available.
🗓️ Roadmap
✅ Phase 1: Enhanced Language Models (Completed)
- [x] Dynamic Model Discovery - Auto-detect latest models from all providers
- [x] Chinese AI Providers Integration:
  - [x] 百度文心一言 (Baidu ERNIE) - ERNIE-4.0, ERNIE-3.5 series
  - [x] 阿里通义千问 (Alibaba Qwen) - Qwen2.5, Qwen-Turbo, Qwen-Plus
  - [x] 字节跳动豆包 (ByteDance Doubao) - Doubao-pro, Doubao-lite series
  - [x] 讯飞星火 (iFLYTEK Spark) - Spark-4.0, Spark-3.5 models
  - [x] 腾讯混元 (Tencent Hunyuan) - Hunyuan-large, Hunyuan-pro
  - [x] 月之暗面 (Moonshot AI) - Moonshot-v1 series
  - [x] 百川大模型 (Baichuan AI) - Baichuan2 series
  - [x] 零一万物 (01.AI) - Yi-34B, Yi-6B series
  - [x] 阶跃星辰 (StepFun) - Step-1V, Step-2 models
- [x] New Global Providers - xAI Grok, SiliconFlow, Coze
- [x] Enhanced Embeddings - Voyage AI, improved embedding support
🖼️ Phase 2: Vision Models (Next)
- [ ] Image Understanding - GPT-4V, Claude Vision, Gemini Vision
- [ ] Image Generation - DALL-E, Midjourney, Stable Diffusion
- [ ] OCR & Document AI - Advanced document processing capabilities
- [ ] Visual Question Answering - Multi-modal reasoning
🎵 Phase 3: Audio Models (Future)
- [ ] Speech Recognition - Whisper, Azure Speech, Google Speech-to-Text
- [ ] Text-to-Speech - ElevenLabs, Azure TTS, OpenAI TTS
- [ ] Audio Generation - Music generation, sound effects
- [ ] Real-time Audio - Streaming audio processing
🎬 Phase 4: Video & Advanced AI (Future)
- [ ] Video Understanding - Video analysis, captioning, content moderation
- [ ] Video Generation - AI video creation and editing
- [ ] Multi-modal Reasoning - Cross-modal understanding and generation
- [ ] Specialized AI - Scientific computing, code generation, domain-specific models
🚀 Submit your feature requests and suggestions!
Contribute
Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes or improvements.
Acknowledgments
This project is based on and extends the excellent llm-interface project. We thank the original authors for their foundational work.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author: chenxingqiang
GitHub: chenxingqiang