multibyte

kensnyder | 134 | ISC | 1.0.4 | TypeScript support: included

Unicode-aware string functions

multibyte, unicode, utf8, utf-8, utf-16, utf16, multibyte safe, multibyte-safe, unicode aware, unicode-aware, emoji, emoji-safe

multibyte provides common string functions that respect multibyte Unicode characters.

npm install multibyte

The problem and the solution

JavaScript strings are encoded as UTF-16, and most built-in string operations (indexing, length, charAt(), slice(), split('')) work on 16-bit code units rather than Unicode code points. Characters outside the Basic Multilingual Plane, such as many emoji, need two code units (a surrogate pair), so these operations can split one character in half.
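To make the code unit versus code point distinction concrete, here is a small sketch that uses only built-in JavaScript (no library calls). The variable names are illustrative.

```typescript
// Built-in JavaScript only; no library calls.
const rocket = '🚀'; // U+1F680, outside the Basic Multilingual Plane

// .length counts UTF-16 code units, so a single rocket reports 2.
const unitCount: number = rocket.length;

// The two code units are the high and low surrogates of the pair.
const high: string = rocket.charCodeAt(0).toString(16); // "d83d"
const low: string = rocket.charCodeAt(1).toString(16); // "de80"

// Array.from uses the string iterator, which yields whole code points.
const codePoints: string[] = Array.from(rocket);

// codePointAt(0) reads the full pair and returns the real code point.
const codePointHex: string = rocket.codePointAt(0)!.toString(16); // "1f680"
```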

If you display Unicode text, for example text from a UTF-8 source, these functions help. They take advantage of the fact that Array.from(string) iterates by code point, so a surrogate pair stays together as one element.
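As a rough illustration of the Array.from(string) idea, the sketch below builds code-point-aware helpers on top of it. The names toCodePoints, cpLength, cpCharAt, and cpSlice are hypothetical and chosen for this example; this is not the library's source, and its actual implementation may differ.

```typescript
// Hypothetical helpers for illustration; not the library's source.
function toCodePoints(text: string): string[] {
  // Array.from uses the string iterator, which yields whole code points.
  return Array.from(text);
}

function cpLength(text: string): number {
  return toCodePoints(text).length;
}

function cpCharAt(text: string, index: number): string {
  // Out-of-range indexes yield "", mirroring String.prototype.charAt.
  return toCodePoints(text)[index] || '';
}

function cpSlice(text: string, start?: number, end?: number): string {
  // Slice the array of code points, then join back into a string.
  return toCodePoints(text).slice(start, end).join('');
}
```

Note that this sketch works per code point, not per user-perceived character: sequences made of several code points, such as flag emoji, would still be split.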

import {
  charAt,
  codePointAt,
  length,
  slice,
  split,
  truncateBytes,
} from 'multibyte';

// JavaScript String.prototype.charAt() can return a UTF-16 surrogate
'a🚀c'.charAt(1); //  ❌ "\ud83d" (half a rocket)
charAt('a🚀c', 1); // ✅ "🚀"

// JavaScript String.prototype.codePointAt() can return a lone UTF-16 surrogate
'🚀abc'.codePointAt(1); //  ❌ 56960 (low surrogate of the rocket emoji)
codePointAt('🚀abc', 1); // ✅ 97 (the letter a)

// JavaScript returns length in UTF-16, not Unicode characters
'a🚀c'.length; //  ❌ 4
length('a🚀c'); // ✅ 3

// JavaScript slices along UTF-16 boundaries, not Unicode characters
'a🚀cdef'.slice(2, 3); //  ❌ "\ude80" (half a rocket)
slice('a🚀cdef', 2, 3); // ✅ "c"

// JavaScript splits along UTF-16 boundaries, not Unicode characters
'a🚀c'.split(''); //  ❌ ["a", "\ud83d", "\ude80", "c"]
split('a🚀c', ''); // ✅ ["a", "🚀", "c"]

// Truncating with slice() can cut a surrogate pair in half; truncateBytes() keeps characters whole
'a🚀cdef'.slice(0, 2); //       ❌ "a\ud83d" (half a rocket)
truncateBytes('a🚀cdef', 2); // ✅ "a" (including the rocket would exceed the 2-byte limit)
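To show how byte-limited truncation can avoid splitting a character, here is a minimal sketch. It assumes "bytes" means UTF-8 bytes, which the readme does not confirm, and the names utf8ByteLength and truncateToUtf8Bytes are hypothetical. It is not the library's truncateBytes implementation.

```typescript
// Hypothetical sketch; ASSUMES "bytes" means UTF-8 bytes.
function utf8ByteLength(codePoint: number): number {
  // Standard UTF-8 size ranges for a single code point.
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

function truncateToUtf8Bytes(text: string, maxBytes: number): string {
  let used = 0;
  let result = '';
  // Walk whole code points so a surrogate pair is never cut in half.
  for (const char of Array.from(text)) {
    const size = utf8ByteLength(char.codePointAt(0)!);
    if (used + size > maxBytes) break; // the next character would not fit
    used += size;
    result += char;
  }
  return result;
}
```

Under this assumption, 'a' takes 1 byte and the rocket takes 4, so a limit of 2 keeps only "a" and a limit of 5 keeps "a🚀".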

BOM (Byte order mark) - U+FEFF

Under the hood, all these functions strip a leading BOM if present.
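For illustration, stripping a leading BOM can be sketched as follows. The helper name stripLeadingBom is hypothetical; the library's internal function and logic may differ.

```typescript
// Hypothetical helper; the library's internal name and logic may differ.
function stripLeadingBom(text: string): string {
  // U+FEFF is a single UTF-16 code unit, so charCodeAt(0) is enough.
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Without stripping, the BOM counts as one extra code point.
const withBom = '\uFEFFabc';
const rawCount: number = Array.from(withBom).length;
const strippedCount: number = Array.from(stripLeadingBom(withBom)).length;
```

Only a BOM at the very start of the string is removed; a U+FEFF elsewhere in the text is left in place.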

Changelog

1.0.1 - Jul 13, 2024

  • Fix readme example

1.0.0 - Jul 13, 2024

  • Update dependencies (no breaking changes)
  • Initial release

1.0.0-beta - Feb 15, 2023

  • Initial beta release