zh-chardet

n-pn20 · License: MIT · Version: 1.0.0 · TypeScript support: included

Chinese character encoding detection for GB18030, Big5, UTF-8, and UTF-16.

Keywords: encoding, detection, chinese, gb18030, big5, utf8, utf16, chardet

Features

  • Detects GB18030, Big5, UTF-8, UTF-16LE, and UTF-16BE encodings
  • Works in both Node.js and browsers
  • TypeScript support included
  • Confidence scoring for detection results
  • Three result forms: all candidates, the best match only, or just the encoding name

Installation

npm install zh-chardet

Usage

Basic Usage

import { detect, detectBest, detectEncoding } from 'zh-chardet';

// Get all possible encodings with confidence scores
const results = detect('你好世界');
console.log(results);
// [{ encoding: 'UTF-8', confidence: 0.90 }, ...]

// Get the best match only
const best = detectBest('你好世界');
console.log(best);
// { encoding: 'UTF-8', confidence: 0.90 }

// Get just the encoding name
const encoding = detectEncoding('你好世界');
console.log(encoding);
// 'UTF-8'

Working with Binary Data

import { detect } from 'zh-chardet';

// From Uint8Array ('你好' encoded in GB18030)
const bytes = new Uint8Array([0xC4, 0xE3, 0xBA, 0xC3]);
const results = detect(bytes);

// From Buffer (Node.js)
const buffer = Buffer.from([0xC4, 0xE3, 0xBA, 0xC3]);
const results2 = detect(buffer);
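The four bytes above are 你好 encoded in GB18030 (the same bytes as in GBK/GB2312). In UTF-8 the same two characters take six bytes. The sketch below uses only the standard TextEncoder/TextDecoder APIs, not zh-chardet, to show why such bytes cannot simply be read as UTF-8: a strict decoder rejects them. isStrictUtf8 is a hypothetical helper and is not part of the package.

```typescript
// Hypothetical helper (not part of zh-chardet): true if the bytes are
// well-formed UTF-8, checked with the WHATWG TextDecoder in fatal mode.
function isStrictUtf8(bytes: Uint8Array): boolean {
  try {
    // fatal: true makes decode() throw a TypeError on malformed input
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// '你好' in GB18030, as in the example above
const gbBytes = new Uint8Array([0xc4, 0xe3, 0xba, 0xc3]);
// '你好' in UTF-8 (E4 BD A0 E5 A5 BD)
const utf8Bytes = new TextEncoder().encode('你好');

console.log(isStrictUtf8(utf8Bytes)); // true
console.log(isStrictUtf8(gbBytes)); // false
```

A lead byte of 0xC4 must be followed by a continuation byte (0x80 to 0xBF), and 0xE3 is not one, so strict UTF-8 decoding fails. This is the kind of structural evidence an encoding detector can use.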

Options

import { detect } from 'zh-chardet';

// Keep only results with a confidence of at least 0.5
const text = '你好世界';
const results = detect(text, { minimumConfidence: 0.5 });
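Assuming each result has the { encoding, confidence } shape shown in the Basic Usage output, a caller can also apply a cut-off after the fact, for example to compare several thresholds without re-running detection. This is a minimal sketch in plain TypeScript. The locally declared DetectionResult interface mirrors that example output and is an assumption, not the package's exported type. filterByConfidence is a hypothetical helper. It keeps results at or above the threshold and sorts them with the highest confidence first.

```typescript
// Assumed result shape, mirroring the documented example output; the
// package's own DetectionResult type may carry additional fields.
interface DetectionResult {
  encoding: string;
  confidence: number;
}

// Hypothetical helper: drop results below `min`, highest confidence first.
// filter() returns a new array, so sorting does not mutate the input.
function filterByConfidence(
  results: readonly DetectionResult[],
  min: number,
): DetectionResult[] {
  return results
    .filter((r) => r.confidence >= min)
    .sort((a, b) => b.confidence - a.confidence);
}

// Illustrative values only, not real detector output
const sample: DetectionResult[] = [
  { encoding: 'GB18030', confidence: 0.3 },
  { encoding: 'UTF-8', confidence: 0.9 },
  { encoding: 'Big5', confidence: 0.6 },
];

// Keeps UTF-8 (0.9), then Big5 (0.6)
console.log(filterByConfidence(sample, 0.5));
```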

API

detect(input, options?): DetectionResult[]

Returns all possible encoding matches with confidence scores.

  • input: string | Buffer | Uint8Array - Text or binary data to analyze
  • options.minimumConfidence: number - Results with a confidence below this value are omitted (default: 0.1)

detectBest(input, options?): DetectionResult | null

Returns the match with the highest confidence, or null if there is none. Accepts the same input and options as detect().
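If you already hold the full list from detect(), you can pick the top entry yourself instead of running detection a second time. This is a minimal sketch. It assumes the { encoding, confidence } result shape from the Basic Usage output. pickBest is a hypothetical helper that returns null for an empty list, matching the | null in the signature above. It is not a claim about how detectBest() is implemented.

```typescript
// Assumed result shape, mirroring the documented example output.
interface DetectionResult {
  encoding: string;
  confidence: number;
}

// Hypothetical helper: the highest-confidence entry, or null if the list is empty.
function pickBest(results: readonly DetectionResult[]): DetectionResult | null {
  let best: DetectionResult | null = null;
  for (const r of results) {
    // Strict comparison keeps the earlier entry when confidences tie
    if (best === null || r.confidence > best.confidence) {
      best = r;
    }
  }
  return best;
}

console.log(pickBest([])); // null
console.log(
  pickBest([
    { encoding: 'Big5', confidence: 0.4 },
    { encoding: 'GB18030', confidence: 0.7 },
  ]),
); // { encoding: 'GB18030', confidence: 0.7 }
```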

detectEncoding(input, options?): Encoding | null

Returns only the encoding name of the best match, or null if there is none.
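A common next step is to decode the bytes into a string. The five names listed under Supported Encodings are also valid WHATWG encoding labels, which TextDecoder matches case-insensitively, so they can be passed directly to the standard TextDecoder. This sketch assumes detectEncoding() returns exactly those names. The local Encoding type and the decodeAs helper are hypothetical, as is the choice to fall back to UTF-8 when detection returns null. In Node.js, decoders other than UTF-8 and UTF-16LE require a build with full ICU, which is the default in official builds.

```typescript
// Assumed to match the five names listed under "Supported Encodings".
type Encoding = 'UTF-8' | 'UTF-16LE' | 'UTF-16BE' | 'GB18030' | 'Big5';

// Hypothetical helper: decode bytes with a detected encoding name.
// Falls back to UTF-8 when detection returned null; invalid sequences
// become U+FFFD because the decoder is not in fatal mode.
function decodeAs(bytes: Uint8Array, encoding: Encoding | null): string {
  // A leading BOM for the chosen encoding is stripped by default.
  return new TextDecoder(encoding ?? 'utf-8').decode(bytes);
}

// UTF-16LE bytes for '你好' (U+4F60, U+597D), low byte first
const le = new Uint8Array([0x60, 0x4f, 0x7d, 0x59]);
console.log(decodeAs(le, 'UTF-16LE')); // 你好
console.log(decodeAs(new TextEncoder().encode('世界'), null)); // 世界

// With zh-chardet (not run here):
// const text = decodeAs(bytes, detectEncoding(bytes));
```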

Supported Encodings

  • UTF-8: Unicode encoding
  • UTF-16LE: UTF-16 Little Endian (with/without BOM)
  • UTF-16BE: UTF-16 Big Endian (with/without BOM)
  • GB18030: Chinese national standard encoding
  • Big5: Traditional Chinese encoding
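Input that starts with a byte order mark (BOM) can be identified before any statistical analysis. The sketch below checks the standard BOM byte sequences: EF BB BF for UTF-8, FF FE for UTF-16LE, and FE FF for UTF-16BE. sniffBom is a hypothetical helper, not zh-chardet's implementation. Data without a BOM, which is the usual case for GB18030 and Big5, still needs content-based detection.

```typescript
type BomEncoding = 'UTF-8' | 'UTF-16LE' | 'UTF-16BE';

// Hypothetical helper: identify an encoding from a leading byte order mark.
// Returns null when no recognized BOM is present.
function sniffBom(bytes: Uint8Array): BomEncoding | null {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return 'UTF-8';
  }
  if (bytes.length >= 2) {
    if (bytes[0] === 0xff && bytes[1] === 0xfe) return 'UTF-16LE';
    if (bytes[0] === 0xfe && bytes[1] === 0xff) return 'UTF-16BE';
  }
  return null;
}

console.log(sniffBom(new Uint8Array([0xff, 0xfe, 0x60, 0x4f]))); // UTF-16LE
console.log(sniffBom(new Uint8Array([0xfe, 0xff, 0x4f, 0x60]))); // UTF-16BE
console.log(sniffBom(new Uint8Array([0xef, 0xbb, 0xbf, 0xe4]))); // UTF-8
console.log(sniffBom(new Uint8Array([0xc4, 0xe3, 0xba, 0xc3]))); // null
```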

License

MIT