parse-entities
Parse HTML character references.
Contents
- What is this?
- When should I use this?
- Install
- Use
- API
- Types
- Compatibility
- Security
- Related
- Contribute
- License
What is this?
This is a small and powerful decoder of HTML character references (often called entities).
When should I use this?
You can use this for spec-compliant decoding of character references. It’s small and fast enough to do that well. You can also use this when making a linter, because it emits warnings with a human-readable reason, a machine-readable code, and positional info on where each problem happened.
Install
This package is ESM only. In Node.js (version 14.14+, 16.0+), install with npm:
```sh
npm install parse-entities
```

In Deno with esm.sh:

```js
import {parseEntities} from 'https://esm.sh/parse-entities@3'
```

In browsers with esm.sh:

```html
<script type="module">
  import {parseEntities} from 'https://esm.sh/parse-entities@3?bundle'
</script>
```

Use
```js
import {parseEntities} from 'parse-entities'

console.log(parseEntities('alpha &amp bravo'))
// => alpha & bravo

console.log(parseEntities('charlie &copycat; delta'))
// => charlie ©cat; delta

console.log(parseEntities('echo &copy; foxtrot &#8800; golf &#x1D306; hotel'))
// => echo © foxtrot ≠ golf 𝌆 hotel
```

API
This package exports the identifier parseEntities.
There is no default export.
parseEntities(value[, options])
Parse HTML character references.
options
Configuration (optional).
options.additional
Additional character to accept (string?, default: '').
When this character directly follows an ampersand, the ampersand is kept as plain text and no warning is emitted.
options.attribute
Whether to parse value as an attribute value (boolean?, default: false).
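This results in slightly different behavior: as the HTML spec requires, a named reference without a terminating semicolon that is directly followed by an equals sign or an alphanumeric character is left as-is.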
options.nonTerminated
Whether to allow nonterminated references (boolean, default: true).
For example, &copycat for ©cat.
This behavior is compliant with the spec but can lead to unexpected results.
options.position
Starting position of value (Position or Point, optional).
Useful when dealing with values nested in some sort of syntax tree.
The default is:

```js
{line: 1, column: 1, offset: 0}
```

options.warning
Error handler (Function?).
options.text
Text handler (Function?).
options.reference
Reference handler (Function?).
options.warningContext
Context used when calling warning ('*', optional).
options.textContext
Context used when calling text ('*', optional).
options.referenceContext
Context used when calling reference ('*', optional).
Returns
string — decoded value.
function warning(reason, point, code)
Error handler.
Parameters
- this (*) — refers to warningContext when given to parseEntities
- reason (string) — human-readable reason for emitting a parse error
- point (Point) — place where the error occurred
- code (number) — machine-readable code of the error
The following codes are used:
| Code | Example | Note |
|---|---|---|
| 1 | `foo &amp bar` | Missing semicolon (named) |
| 2 | `foo &#123 bar` | Missing semicolon (numeric) |
| 3 | `Foo &bar baz` | Empty (named) |
| 4 | `Foo &#` | Empty (numeric) |
| 5 | `Foo &bar; baz` | Unknown (named) |
| 6 | `Foo &#128; baz` | Disallowed reference |
| 7 | `Foo &#xD800; baz` | Prohibited: outside permissible Unicode range |
function text(value, position)
Text handler.
Parameters
- this (*) — refers to textContext when given to parseEntities
- value (string) — string of content
- position (Position) — place where value starts and ends
function reference(value, position, source)
Character reference handler.
Parameters
- this (*) — refers to referenceContext when given to parseEntities
- value (string) — decoded character reference
- position (Position) — place where source starts and ends
- source (string) — raw source of character reference
Types
This package is fully typed with TypeScript.
It exports the additional types Options, WarningHandler, ReferenceHandler, and TextHandler.
Compatibility
This package is at least compatible with all maintained versions of Node.js. As of now, that is Node.js 14.14+ and 16.0+. It also works in Deno and modern browsers.
Security
This package is safe: it matches the HTML spec to parse character references.
Related
- wooorm/stringify-entities — encode HTML character references
- wooorm/character-entities — info on character references
- wooorm/character-entities-html4 — info on HTML4 character references
- wooorm/character-entities-legacy — info on legacy character references
- wooorm/character-reference-invalid — info on invalid numeric character references
Contribute
Yes please! See How to Contribute to Open Source.
wooorm