html-dom-parser

remarkablemark | 4.7m | MIT | 5.0.13 | TypeScript support: included

HTML to DOM parser.

html-dom-parser, html, dom, parser, htmlparser2, pojo

HTML to DOM parser that works on both the server (Node.js) and the client (browser):

HTMLDOMParser(string[, options])

The parser converts an HTML string to a JavaScript object that describes the DOM tree.

Example

import parse from 'html-dom-parser';

parse('<p>Hello, World!</p>');

Output:

[
  Element {
    type: 'tag',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    children: [
      Text {
        type: 'text',
        parent: [Circular],
        prev: null,
        next: null,
        startIndex: null,
        endIndex: null,
        data: 'Hello, World!'
      }
    ],
    name: 'p',
    attribs: {}
  }
]
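
Because the result is a plain array of node objects, it can be walked with ordinary recursion. The following is a minimal sketch, not part of html-dom-parser: `collectText` and `sampleNodes` are hypothetical names, and the sample array is hand-written to mirror the output above (with the `parent`, `prev`, and `next` links left out) so the snippet runs without the package installed.

```javascript
// Hand-written sample shaped like the output above.
// In real use this array would come from parse('<p>Hello, World!</p>').
const sampleNodes = [
  {
    type: 'tag',
    name: 'p',
    attribs: {},
    children: [{ type: 'text', data: 'Hello, World!' }],
  },
];

// Hypothetical helper: concatenate the data of every text node, depth-first.
function collectText(nodes) {
  let text = '';
  for (const node of nodes) {
    if (node.type === 'text') {
      text += node.data;
    } else if (Array.isArray(node.children)) {
      text += collectText(node.children);
    }
  }
  return text;
}

console.log(collectText(sampleNodes)); // Hello, World!
```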

Install

NPM:

npm install html-dom-parser --save

Yarn:

yarn add html-dom-parser

CDN:

<script src="https://unpkg.com/html-dom-parser@latest/dist/html-dom-parser.min.js"></script>
<script>
  window.HTMLDOMParser(/* string */);
</script>

Usage

Import with ES Modules:

import parse from 'html-dom-parser';

Require with CommonJS:

const parse = require('html-dom-parser').default;

Parse empty string:

parse('');

Output:

[]

Parse string:

parse('Hello, World!');

Output:

[
  Text {
    type: 'text',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    data: 'Hello, World!'
  }
]

Parse element with attributes:

parse('<p class="foo" style="color: #bada55">Hello, <em>world</em>!</p>');

Output:

[
  Element {
    type: 'tag',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    children: [ [Text], [Element], [Text] ],
    name: 'p',
    attribs: { class: 'foo', style: 'color: #bada55' }
  }
]
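
Attributes are exposed as a plain `attribs` object on each element, so looking up elements and reading their attributes needs no extra API. The sketch below is only an illustration: `findByName` and `tree` are hypothetical names, and `tree` is a hand-written stand-in for the parsed result above with its nested children written out.

```javascript
// Hand-written stand-in for the result of parsing
// '<p class="foo" style="color: #bada55">Hello, <em>world</em>!</p>'.
const tree = [
  {
    type: 'tag',
    name: 'p',
    attribs: { class: 'foo', style: 'color: #bada55' },
    children: [
      { type: 'text', data: 'Hello, ' },
      {
        type: 'tag',
        name: 'em',
        attribs: {},
        children: [{ type: 'text', data: 'world' }],
      },
      { type: 'text', data: '!' },
    ],
  },
];

// Hypothetical helper: collect every element with the given tag name.
// It only checks nodes whose type is 'tag', so nodes reported with another
// type are skipped by this sketch.
function findByName(nodes, name) {
  const matches = [];
  for (const node of nodes) {
    if (node.type === 'tag' && node.name === name) {
      matches.push(node);
    }
    if (Array.isArray(node.children)) {
      matches.push(...findByName(node.children, name));
    }
  }
  return matches;
}

const [paragraph] = findByName(tree, 'p');
console.log(paragraph.attribs.class); // foo
console.log(findByName(tree, 'em').length); // 1
```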

The server parser is a wrapper around htmlparser2's parseDOM, but with the root parent node excluded. The next section shows the options available to the server parser.

The client parser mimics the server parser by using the DOM API to parse the HTML string.
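
To give a rough idea of what mimicking the server output involves, here is a simplified sketch that converts DOM-like nodes into the same plain-object shape. It is not the library's implementation: `toPlainNodes` is a hypothetical name, the input objects are fakes that expose only the standard `nodeType`, `nodeName`, `nodeValue`, `attributes`, and `childNodes` properties, and every tag name is lowercased, so case-sensitive SVG tags are not handled.

```javascript
// Standard DOM nodeType values: 1 = element, 3 = text, 8 = comment.
function toPlainNodes(childNodes, parent = null) {
  const result = [];
  for (const domNode of Array.from(childNodes)) {
    let node;
    if (domNode.nodeType === 1) {
      const attribs = {};
      for (const attr of Array.from(domNode.attributes)) {
        attribs[attr.name] = attr.value;
      }
      node = {
        type: 'tag',
        name: domNode.nodeName.toLowerCase(),
        attribs,
        children: [],
      };
      node.children = toPlainNodes(domNode.childNodes, node);
    } else if (domNode.nodeType === 3) {
      node = { type: 'text', data: domNode.nodeValue };
    } else if (domNode.nodeType === 8) {
      node = { type: 'comment', data: domNode.nodeValue };
    } else {
      continue; // skip any other node type
    }
    // Link siblings and parent the way the documented output does.
    node.parent = parent;
    node.prev = result.length > 0 ? result[result.length - 1] : null;
    node.next = null;
    if (node.prev) {
      node.prev.next = node;
    }
    result.push(node);
  }
  return result;
}

// Fake DOM node so the sketch runs outside a browser.
const fakeParagraph = {
  nodeType: 1,
  nodeName: 'P',
  attributes: [{ name: 'class', value: 'foo' }],
  childNodes: [{ nodeType: 3, nodeValue: 'Hello' }],
};

const [plain] = toPlainNodes([fakeParagraph]);
console.log(plain.name, plain.attribs.class, plain.children[0].data); // p foo Hello
```

In a browser, the same function could be fed the `childNodes` of a real element instead of the fake object.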

Options (server only)

Because the server parser wraps htmlparser2, which builds its DOM with domhandler, you can change how the server parser parses your code with the following options:

/**
 * These are the default options being used if you omit the optional options object.
 * htmlparser2 will use the same options object for its domhandler so the options
 * should be combined into a single object like so:
 */
const options = {
  /**
   * Options for the domhandler class.
   * https://github.com/fb55/domhandler/blob/master/src/index.ts#L16
   */
  withStartIndices: false,
  withEndIndices: false,
  xmlMode: false,
  /**
   * Options for the htmlparser2 class.
   * https://github.com/fb55/htmlparser2/blob/master/src/Parser.ts#L104
   */
  xmlMode: false, // Will overwrite what is used for the domhandler, otherwise inherited.
  decodeEntities: true,
  lowerCaseTags: true, // !xmlMode by default
  lowerCaseAttributeNames: true, // !xmlMode by default
  recognizeCDATA: false, // xmlMode by default
  recognizeSelfClosing: false, // xmlMode by default
  Tokenizer: Tokenizer, // htmlparser2's built-in Tokenizer class unless overridden
};

If you're parsing SVG, you can set lowerCaseTags to false without having to enable xmlMode. Tag names then keep their original casing (for example, camelCase SVG tags) instead of being lowercased per the HTML standard.
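
The comments in the options object note that several defaults depend on xmlMode. The sketch below spells out that relationship as a small function. It is an illustration of the documented defaults, not code from htmlparser2 or html-dom-parser, and `resolveOptions` is a hypothetical name.

```javascript
// Hypothetical helper: fill in the documented defaults, deriving the
// xmlMode-dependent ones, then let explicit user options win.
function resolveOptions(userOptions = {}) {
  const xmlMode = userOptions.xmlMode ?? false;
  return {
    withStartIndices: false,
    withEndIndices: false,
    decodeEntities: true,
    lowerCaseTags: !xmlMode, // !xmlMode by default
    lowerCaseAttributeNames: !xmlMode, // !xmlMode by default
    recognizeCDATA: xmlMode, // xmlMode by default
    recognizeSelfClosing: xmlMode, // xmlMode by default
    ...userOptions,
    xmlMode,
  };
}

// Enabling xmlMode flips the derived defaults.
console.log(resolveOptions({ xmlMode: true }).lowerCaseTags); // false
console.log(resolveOptions({ xmlMode: true }).recognizeSelfClosing); // true

// Overriding a single option leaves xmlMode untouched.
console.log(resolveOptions({ lowerCaseTags: false }).xmlMode); // false
```

An options object like this is what gets passed as the second argument to the server parser.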

Note: If you're parsing code client-side (in-browser), you cannot control the parsing options. Client-side parsing automatically handles returning some HTML tags in camelCase, such as specific SVG elements, but returns all other tags lowercased according to the HTML standard.

Migration

v5

Migrated to TypeScript. CommonJS imports require the .default key:

const parse = require('html-dom-parser').default;
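
The .default key is needed because TypeScript's CommonJS output places a default export on `exports.default`. If the same code has to run against both module shapes, a small interop helper is one option. This is a sketch only: `interopDefault` is a hypothetical name, and the two fake modules below stand in for a module that exposes the parser on `default` and one that exports the function directly.

```javascript
// Hypothetical helper: return mod.default when the module is an object that
// has a `default` key, otherwise return the module itself.
function interopDefault(mod) {
  if (mod !== null && typeof mod === 'object' && 'default' in mod) {
    return mod.default;
  }
  return mod;
}

// Fake modules so the sketch runs without the package installed.
const fakeParse = (html) => [];
const moduleWithDefault = { default: fakeParse }; // parser exposed on `default`
const moduleAsFunction = fakeParse; // function exported directly

console.log(interopDefault(moduleWithDefault) === fakeParse); // true
console.log(interopDefault(moduleAsFunction) === fakeParse); // true

// With the real package this would read:
// const parse = interopDefault(require('html-dom-parser'));
```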

v4

Upgraded htmlparser2 to v9.

v3

Upgraded domhandler to v5. Parser options like normalizeWhitespace have been removed.

v2

Removed Internet Explorer (IE11) support.

v1

Upgraded domhandler to v4 and htmlparser2 to v6.

Release

Release and publish are automated by Release Please.

Special Thanks

License

MIT

Changelog

All notable changes to this project will be documented in this file. See standard-version for commit guidelines.

5.0.13 (2024-12-25)

Build System

  • deps: bump htmlparser2 from 9.1.0 to 10.0.0 (#929) (2d15abe)

5.0.12 (2024-12-16)

Bug Fixes

  • client: don't break LaTeX when replacing carriage returns (d69bc66), closes #917

5.0.11 (2024-12-04)

Bug Fixes

  • enable client parser to retain carriage return characters (#902) (fe2e993), closes #420

5.0.10 (2024-08-28)

Continuous Integration

  • github: publish package to npm registry with provenance (e023fe8)

5.0.9 (2024-07-18)

Bug Fixes

  • exports field includes package.json (c373a92)

5.0.8 (2024-02-12)

Bug Fixes

  • esm: fix exported types (b6918ae)

5.0.7 (2024-01-13)

Build System

  • deps: bump htmlparser2 from 9.0.0 to 9.1.0 (#631) (6816800)

5.0.6 (2023-12-19)

Bug Fixes

  • re-export types correctly for verbatimModuleSyntax (#612) (782b675)

5.0.5 (2023-12-16)

Bug Fixes

  • esm: fix ESM types by adding .mts declaration files (96a1cfc)

5.0.4 (2023-10-31)

Bug Fixes

  • esm: support vite bundler (c9e510f)

5.0.3 (2023-10-22)

Miscellaneous Chores

  • export types from index.ts (8ed55e2)

5.0.2 (2023-10-19)

Bug Fixes

  • package: add "/src" to files to fix source map warning (7082c50)

5.0.1 (2023-10-17)

Bug Fixes

  • package: add types to exports in package.json (df08df3)

5.0.0 (2023-10-16)

⚠ BREAKING CHANGES

  • CommonJS imports require the .default key.

Code Refactoring

4.0.1 (2023-10-15)

Miscellaneous Chores

  • index: set TypeScript Version to 5.2 in index.d.ts (#525) (8219338)

4.0.0 (2023-05-31)

⚠ BREAKING CHANGES

  • deps: bump htmlparser2 from 8.0.2 to 9.0.0

Build System

  • deps: bump htmlparser2 from 8.0.2 to 9.0.0 (467bbaa), closes #459

3.1.7 (2023-03-25)

Build System

  • deps: bump htmlparser2 from 8.0.1 to 8.0.2 (4fbe117), closes #433

3.1.6 (2023-03-22)

Bug Fixes

  • client: correct spelling of feGaussianBlur (9e28250), closes #429

3.1.5 (2023-03-06)

Bug Fixes

  • client: check for "template" in utilities formatDOM (748cf27), closes #417

3.1.4 (2023-03-04)

Bug Fixes

  • client: get template content childNodes in utilities formatDOM (c2c0bed), closes #414

3.1.3 (2023-01-17)

Bug Fixes

  • package: specify types in package.json and exports field (21fb028)

3.1.2 (2022-08-23)

Bug Fixes

  • client: fix import in html-to-dom.mjs (78a7607), closes #337

3.1.1 (2022-08-20)

Bug Fixes

  • client: correct ECMAScript export in client html-to-dom.mjs (7de506c), closes #334

3.1.0 (2022-08-16)

Features

3.0.1 (2022-07-10)

Bug Fixes

  • client: ensure head and body with newline are parsed correctly (b26b645), closes #317

3.0.0 (2022-07-05)

⚠ BREAKING CHANGES

  • htmlparser2 7.2.0 → 8.0.1

Build System

  • upgrade domhandler to 5.0.3 and htmlparser2 to 8.0.1 (e80a69c)

2.0.0 (2022-06-18)

⚠ BREAKING CHANGES

  • client: remove Internet Explorer (IE11) support

Features

  • client: remove Internet Explorer (IE11) support (b34cbe1), closes #225

1.2.0 (2022-04-14)

Features

  • add compatibility for react-native (4a4a974)

1.1.1 (2022-03-20)

Build System

  • package: upgrade domhandler from 4.3.0 to 4.3.1 (c2e8a82)

1.1.0 (2022-02-05)

Features

1.0.4 (2021-12-06)

Build System

  • deps: bump domhandler from 4.2.2 to 4.3.0 (cb49258)

1.0.3 (2021-11-27)

Performance Improvements

  • upgrade dependency htmlparser2 to v7.2.0 (7819211)

1.0.2 (2021-09-06)

Build System

  • deps: bump domhandler from 4.2.0 to 4.2.2 (ab46792)

1.0.1 (2021-06-13)

1.0.0 (2020-12-25)

Build System

  • package: upgrade domhandler to v4 and htmlparser2 to v6 (ec5673e)

Performance Improvements

  • client: deprecate Internet Explorer 9 (IE9) (d42ea4e)
  • utilities: continue if nodeType is not element, text, comment (793ff0c)

BREAKING CHANGES

  • package: upgrade domhandler to v4 and htmlparser2 to v6

domhandler 3.3.0 → 4.0.0
htmlparser2 4.1.0 → 6.0.0

domhandler:

htmlparser2:

decodeEntities option now defaults to true. <title> is parsed correctly. Remove root parent node to keep parser backwards compatible.

0.5.0 (2020-12-13)

Features

  • upgrade domhandler to 3.3.0 and htmlparser2 to 4.1.0 (2a748b8)

0.4.0 (2020-12-13)

Features

  • upgrade domhandler to 3.0.0 and htmlparser2 to 4.0.0 (44dba5e)

0.3.1 (2020-12-13)

0.3.0 (2020-06-02)

Features

  • lib: throw error if browser does not support parsing methods (de327af)

Performance Improvements

  • lib: return [] if empty string is passed to server parser (9850d05)

0.2.3 (2019-11-04)

Bug Fixes

  • lib: improve head and body regex in domparser.js (457bb58), closes #18

Build System

  • package: save commitlint, husky, and lint-staged to devDeps (3b0ce91)
  • package: update eslint and install prettier and plugin (b7a6b81)
  • package: update webpack and save webpack-cli (908e56d)
  • package: update dependencies and devDependencies (a9016be)

Tests

  • server: remove skipped test (a4c1057)
  • refactor tests to ES6 (d5255a5)
  • cases: add empty string test case to html.js (25d7e8a)
  • cases: add more special test cases to html.js (6fdf2ea)
  • cases: refactor test cases and move html data to its own file (e4fcb09)
  • cases: remove unnecessary try/catch wrapper to fix lint error (ca8175e)
  • cases: skip html test cases that PhantomJS does not support (d095d29)
  • cases: update complex.html (1418775)
  • client: add tests for client parser that will be run by karma (a0c58aa)
  • helpers: create index.js which exports helpers (a9255d5)
  • helpers: move helper that tests for errors to separate file (f2e6312)
  • helpers: refactor and move runTests to its own file (8e30784)
  • server: add tests that spy and mock htmlparser2 and domhandler (61075a1)
  • server: move html-to-dom-server.js to server directory (3684dac)

0.2.2 (2019-06-07)

Bug Fixes

  • utilities: do not lowercase case-sensitive SVG tags (4083004)

Performance Improvements

  • utilities: optimize case-sensitive tag replace with hash map (6aa06ee)

0.2.1 (2019-04-03)

0.2.0 (2019-04-01)

Features

  • types: add TypeScript declarations (b52d52f)

0.1.3 - 2018-02-20

Fixed

  • Fix regular expression vulnerability (#8)
    • Regex has potential for catastrophic backtracking
    • Credit goes to @davisjam for discovering it

Changed

  • Refactored and updated tests (#8)

0.1.2 - 2017-09-30

Added

  • Create helper isIE() in utilities (#7)

Fixed

  • Fix client parser in IE/IE9 (#6, #7)

Changed

0.1.1 - 2017-06-26

Added

  • CHANGELOG with previous releases backfilled

Fixed

  • Fix client parser on IE by specifying required parameter for createHTMLDocument (#4)

0.1.0 - 2017-06-17

Changed

  • Improve, refactor, and optimize client parser
    • Use template, DOMImplementation, and/or DOMParser

0.0.2 - 2016-10-10

Added

  • Create npm scripts for prepublish

Changed

  • Change webpack to build to UMD target
  • Update README installation and usage instructions

0.0.1 - 2016-10-10

Added

  • Server parser
    • Wrapper for htmlparser2.parseDOM
  • Client parser
    • Uses DOM API to mimic server parser output
    • Build client library with webpack
  • Add README, tests, and other necessary files