zh-chardet
Chinese character encoding detection for GB18030, Big5, UTF-8, and UTF-16.
Features
- Detects GB18030, Big5, UTF-8, UTF-16LE, and UTF-16BE encodings
- Works in both Node.js and browsers
- TypeScript support included
- Confidence scoring for detection results
- Multiple detection result options
Installation
npm install zh-chardet
Usage
Basic Usage
import { detect, detectBest, detectEncoding } from 'zh-chardet';
// Get all possible encodings with confidence scores
const results = detect('你好世界');
console.log(results);
// [{ encoding: 'UTF-8', confidence: 0.90 }, ...]
// Get the best match only
const best = detectBest('你好世界');
console.log(best);
// { encoding: 'UTF-8', confidence: 0.90 }
// Get just the encoding name
const encoding = detectEncoding('你好世界');
console.log(encoding);
// 'UTF-8'
Working with Binary Data
import { detect } from 'zh-chardet';
// From Uint8Array (these bytes are "你好" encoded as GB18030)
const bytes = new Uint8Array([0xC4, 0xE3, 0xBA, 0xC3]);
const results = detect(bytes);
// From Buffer (Node.js)
const buffer = Buffer.from([0xC4, 0xE3, 0xBA, 0xC3]);
const results2 = detect(buffer);
Options
import { detect } from 'zh-chardet';
// Any string, Buffer, or Uint8Array can be passed as input
const text = '你好世界';
// Filter results by minimum confidence
const results = detect(text, { minimumConfidence: 0.5 });
API
detect(input, options?): DetectionResult[]
Returns all possible encoding matches with confidence scores.
- input: string | Buffer | Uint8Array - Text or binary data to analyze
- options.minimumConfidence: number - Filter out results below this confidence (default: 0.1)
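To make the effect of minimumConfidence concrete, here is a minimal sketch of the filtering step applied to an already-computed result list. The DetectionResult shape is inferred from the example output above, and filterByConfidence is a hypothetical helper, not a function exported by zh-chardet; the scores are made up for illustration.

```typescript
// Result shape inferred from the README's example output (an assumption).
interface DetectionResult {
  encoding: string;
  confidence: number;
}

// Hypothetical helper: drops every result whose confidence is below the threshold.
// The default of 0.1 mirrors the documented default of options.minimumConfidence.
function filterByConfidence(
  results: DetectionResult[],
  minimumConfidence: number = 0.1,
): DetectionResult[] {
  return results.filter((r) => r.confidence >= minimumConfidence);
}

// Illustrative scores only; real values depend on the input.
const sample: DetectionResult[] = [
  { encoding: 'GB18030', confidence: 0.8 },
  { encoding: 'Big5', confidence: 0.3 },
  { encoding: 'UTF-16LE', confidence: 0.05 },
];

const strict = filterByConfidence(sample, 0.5); // keeps only GB18030
const relaxed = filterByConfidence(sample);     // default threshold drops UTF-16LE
```

A higher threshold trades recall for certainty: with 0.5, an ambiguous input may yield an empty array, so callers should be prepared for that case.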
detectBest(input, options?): DetectionResult | null
Returns the highest confidence encoding match.
detectEncoding(input, options?): Encoding | null
Returns just the encoding name of the best match.
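Once an encoding name is known, the bytes still have to be decoded. The sketch below shows one way to do that with the standard TextDecoder API, assuming the runtime's TextDecoder supports these labels (browsers, and Node.js builds with full ICU). The Encoding type here is an assumption based on the names listed in this README, and decodeBytes is a hypothetical helper that is not part of zh-chardet.

```typescript
// Encoding names as listed in this README; the exact exported type is an assumption.
type Encoding = 'UTF-8' | 'UTF-16LE' | 'UTF-16BE' | 'GB18030' | 'Big5';

// Map the README's encoding names to WHATWG TextDecoder labels.
const DECODER_LABELS: Record<Encoding, string> = {
  'UTF-8': 'utf-8',
  'UTF-16LE': 'utf-16le',
  'UTF-16BE': 'utf-16be',
  'GB18030': 'gb18030',
  'Big5': 'big5',
};

// Hypothetical helper: decodes bytes with the given encoding.
// Returns null if no encoding was detected or the bytes are invalid for it.
function decodeBytes(bytes: Uint8Array, encoding: Encoding | null): string | null {
  if (encoding === null) return null;
  // fatal: true makes decode() throw on malformed input instead of emitting U+FFFD.
  const decoder = new TextDecoder(DECODER_LABELS[encoding], { fatal: true });
  try {
    return decoder.decode(bytes);
  } catch {
    return null;
  }
}

// In real use the second argument would come from detectEncoding(bytes).
const decoded = decodeBytes(new Uint8Array([0xC4, 0xE3, 0xBA, 0xC3]), 'GB18030');
```

Accepting `Encoding | null` lets the return value of detectEncoding be passed straight through without a separate null check.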
Supported Encodings
- UTF-8: Unicode encoding
- UTF-16LE: UTF-16 Little Endian (with/without BOM)
- UTF-16BE: UTF-16 Big Endian (with/without BOM)
- GB18030: Chinese national standard encoding
- Big5: Traditional Chinese encoding
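For the "with BOM" cases, the byte order mark alone identifies the encoding. The following is a minimal sketch of BOM sniffing for illustration; sniffBom is a hypothetical function and does not describe how zh-chardet works internally. Input without a BOM needs statistical or structural analysis, which this sketch does not attempt.

```typescript
type BomEncoding = 'UTF-8' | 'UTF-16LE' | 'UTF-16BE';

// Hypothetical helper: returns the encoding implied by a leading BOM, or null if none.
// Note: FF FE is also the start of the UTF-32LE BOM (FF FE 00 00); UTF-32 is ignored here.
function sniffBom(bytes: Uint8Array): BomEncoding | null {
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return 'UTF-8';
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return 'UTF-16LE';
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return 'UTF-16BE';
  }
  return null;
}

// "你" (U+4F60) in UTF-16LE, preceded by the little-endian BOM.
const withBom = sniffBom(new Uint8Array([0xFF, 0xFE, 0x60, 0x4F]));
// The same character without a BOM cannot be identified this way.
const withoutBom = sniffBom(new Uint8Array([0x60, 0x4F]));
```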
License
MIT