Important: This documentation covers Yarn 1 (Classic).
For Yarn 2+ docs and migration guide, see yarnpkg.com.

Package detail

htmljs-parser

marko-js17.8kMIT5.5.3TypeScript support: included

An HTML parser recognizes content and string placeholders and allows JavaScript expressions as attribute values

HTML, JavaScript, browser, compiler, expressions, nodejs, parser, server, template

readme


htmljs-parser
Styled with prettier Build status Code Coverage NPM version Downloads

An HTML parser with super powers used by Marko.

Installation

npm install htmljs-parser

Creating A Parser

First we must create a parser instance and pass it some handlers for the various parse events shown below.

Each parse event is called a Range and is an object with start and end properties which are zero-based offsets from the beginning of th parsed code.

Additional meta data and nested ranges are exposed on some events shown below.

You can get the raw string from any range using parser.read(range).

import { createParser, ErrorCode, TagType } from "htmljs-parser";

const parser = createParser({
  /**
   * Called when the parser encounters an error.
   *
   * @example
   * 1╭─ <a><b
   *  ╰─     ╰─ error(code: 19, message: "EOF reached while parsing open tag")
   */
  onError(range) {
    range.code; // An error code id. You can see the list of error codes in ErrorCode imported above.
    range.message; // A human readable (hopefully) error message.
  },

  /**
   * Called when some static text is parsed within some body content.
   *
   * @example
   * 1╭─ <div>Hi</div>
   *  ╰─      ╰─ text "Hi"
   */
  onText(range) {},

  /**
   * Called after parsing a placeholder within body content.
   *
   * @example
   * 1╭─ <div>${hello} $!{world}</div>
   *  │       │ │      │  ╰─ placeholder.value "world"
   *  │       │ │      ╰─ placeholder "$!{world}"
   *  │       │ ╰─ placeholder:escape.value "hello"
   *  ╰─      ╰─ placeholder:escape "${hello}"
   */
  onPlaceholder(range) {
    range.escape; // true for ${} placeholders and false for $!{} placeholders.
    range.value; // Another range that includes only the placeholder value itself without the wrapping braces.
  },

  /**
   * Called when we find a comment at the root of the document or within a tags contents.
   * It will not be fired for comments within expressions, such as attribute values.
   *
   * @example
   * 1╭─ <!-- hi -->
   *  │  │   ╰─ comment.value " hi "
   *  ╰─ ╰─ comment "<!-- hi -->"
   * 2╭─ // hi
   *  │  │ ╰─ comment.value " hi"
   *  ╰─ ╰─ comment "// hi"
   */
  onComment(range) {
    range.value; // Another range that only includes the contents of the comment.
  },

  /**
   * Called after parsing a CDATA section.
   * // https://developer.mozilla.org/en-US/docs/Web/API/CDATASection
   *
   * @example
   * 1╭─ <![CDATA[hi]]>
   *  │  │        ╰─ cdata.value "hi"
   *  ╰─ ╰─ cdata "<![CDATA[hi]]>"
   */
  onCDATA(range) {
    range.value; // Another range that only includes the contents of the CDATA.
  },

  /**
   * Called after parsing a DocType comment.
   * https://developer.mozilla.org/en-US/docs/Web/API/DocumentType
   *
   * @example
   * 1╭─ <!DOCTYPE html>
   *  │  │ ╰─ doctype.value "DOCTYPE html"
   *  ╰─ ╰─ doctype "<!DOCTYPE html>"
   */
  onDoctype(range) {
    range.value; // Another range that only includes the contents of the DocType.
  },

  /**
   * Called after parsing an XML declaration.
   * https://developer.mozilla.org/en-US/docs/Web/XML/XML_introduction#xml_declaration
   *
   * @example
   * 1╭─ <?xml version="1.0" encoding="UTF-8"?>
   *  │  │ ╰─ declaration.value "xml version=\"1.0\" encoding=\"UTF-8\""
   *  ╰─ ╰─ declaration "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
   */
  onDeclaration(range) {
    range.value; // Another range that only includes the contents of the declaration.
  },

  /**
   * Called after parsing a scriptlet (new line followed by a $).
   *
   * @example
   * 1╭─ $ foo();
   *  │   │╰─ scriptlet.value "foo();"
   *  ╰─  ╰─ scriptlet " foo();"
   * 2╭─ $ { bar(); }
   *  │   │ ╰─ scriptlet:block.value " bar(); "
   *  ╰─  ╰─ scriptlet:block " { bar(); }"
   */
  onScriptlet(range) {
    range.block; // true if the scriptlet was contained within braces.
    range.value; // Another range that includes only the value itself without the leading $ or surrounding braces (if applicable).
  },

  /**
   * Called when we're about to begin an HTML open tag (before the tag name).
   * Note: This is only called for HTML mode tags and can be used to track if you are in concise mode.
   *
   * @example
   * 1╭─ <div>Hi</div>
   *  ╰─ ╰─ openTagStart
   */
  onOpenTagStart(range) {},

  /**
   * Called when a tag name, which can include placeholders, has been parsed.
   *
   * @example
   * 1╭─ <div/>
   *  ╰─  ╰─ openTagName "div"
   * 2╭─ <hello-${test}-again/>
   *  │   │     │      ╰─ openTagName.quasis[1] "-again"
   *  │   │     ╰─ openTagName.expressions[0] "${test}"
   *  │   ├─ openTagName.quasis[0] "hello-"
   *  ╰─  ╰─ openTagName "hello-${test}-again"
   */
  onOpenTagName(range) {
    range.quasis; // An array of ranges that indicate the string literal parts of the tag name.
    range.expressions; // A list of placeholder ranges (similar to whats emitted via onPlaceholder).

    // Return a different tag type enum value to enter a different parse mode.
    // Below is approximately what Marko uses:
    switch (parser.read(range)) {
      case "area":
      case "base":
      case "br":
      case "col":
      case "embed":
      case "hr":
      case "img":
      case "input":
      case "link":
      case "meta":
      case "param":
      case "source":
      case "track":
      case "wbr":
        // TagType.void makes this a void element (cannot have children).
        return TagType.void;
      case "html-comment":
      case "script":
      case "style":
      case "textarea":
        // TagType.text makes the child content text only (with placeholders).
        return TagType.text;
      case "class":
      case "export":
      case "import":
      case "static":
        // TagType.statement makes this a statement tag where the content following the tag name will be parsed as script code until we reach a new line, eg for `import x from "y"`).
        return TagType.statement;
    }

    // TagType.html is the default which allows child content as html with placeholders.
    return TagType.html;
  },

  /**
   * Called when a shorthand id, which can include placeholders, has been parsed.
   *
   * @example
   * 1╭─ <div#hello-${test}-again/>
   *  │      ││     │       ╰─ tagShorthandId.quasis[1] "-again"
   *  │      ││     ╰─ tagShorthandId.expressions[0] "${test}"
   *  │      │╰─ tagShorthandId.quasis[0] "hello-"
   *  ╰─     ╰─ tagShorthandId "#hello-${test}-again"
   */
  onTagShorthandId(range) {
    range.quasis; // An array of ranges that indicate the string literal parts of the shorthand id name.
    range.expressions; // A list of placeholder ranges (similar to whats emitted via onPlaceholder).
  },

  /**
   * Called when a shorthand class name, which can include placeholders, has been parsed.
   * Note there can be multiple of these.
   *
   * @example
   * 1╭─ <div.hello-${test}-again/>
   *  │      ││     │       ╰─ tagShorthandClassName.quasis[1] "-again"
   *  │      ││     ╰─ tagShorthandClassName.expressions[0] "${test}"
   *  │      │╰─ tagShorthandClassName.quasis[0] "hello-"
   *  ╰─     ╰─ tagShorthandClassName "#hello-${test}-again"
   */
  onTagShorthandClass(range) {
    range.quasis; // An array of ranges that indicate the string literal parts of the shorthand id name.
    range.expressions; // A list of placeholder ranges (similar to whats emitted via onPlaceholder).
  },

  /**
   * Called after the type arguments for a tag have been parsed.
   *
   * @example
   * 1╭─ <foo<string>>
   *  │      │╰─ tagTypeArgs.value "string"
   *  ╰─     ╰─ tagTypeArgs "<string>"
   */
  onTagTypeArgs(range) {
    range.value; // Another range that includes only the type arguments themselves and not the angle brackets.
  },

  /**
   * Called after a tag variable has been parsed.
   *
   * @example
   * 1╭─ <div/el/>
   *  │      │╰─ tagVar.value "el"
   *  ╰─     ╰─ tagVar "/el"
   */
  onTagVar(range) {
    range.value; // Another range that includes only the tag var itself and not the leading slash.
  },

  /**
   * Called after tag arguments have been parsed.
   *
   * @example
   * 1╭─ <if(x)>
   *  │     │╰─ tagArgs.value "x"
   *  ╰─    ╰─ tagArgs "(x)"
   */
  onTagArgs(range) {
    range.value; // Another range that includes only the args themselves and not the outer parenthesis.
  },

  /**
   * Called after type parameters for the tag parameters have been parsed.
   *
   * @example
   * 1╭─ <tag<T>|input: { name: T }|>
   *  │      │╰─ tagTypeParams.value
   *  ╰─     ╰─ tagTypeParams "<T>"
   */
  onTagTypeParams(range) {
    range.value; // Another range that includes only the type params themselves and not the angle brackets.
  },

  /**
   * Called after tag parameters have been parsed.
   *
   * @example
   * 1╭─ <for|item| of=list>
   *  │      │╰─ tagParams.value "item"
   *  ╰─     ╰─ tagParams "|item|"
   */
  onTagParams(range) {
    range.value; // Another range that includes only the params themselves and not the outer pipes.
  },

  /**
   * Called after an attribute name as been parsed.
   * Note this may be followed by the related AttrArgs, AttrValue or AttrMethod. It can also be directly followed by another AttrName, AttrSpread or the OpenTagEnd if this is a boolean attribute.
   *
   * @example
   * 1╭─ <div class="hi">
   *  ╰─      ╰─ attrName "class"
   */
  onAttrName(range) {},

  /**
   * Called after attr arguments have been parsed.
   *
   * @example
   * 1╭─ <div if(x)>
   *  │         │╰─ attrArgs.value "x"
   *  ╰─        ╰─ attrArgs "(x)"
   */
  onAttrArgs(range) {
    range.value; // Another range that includes only the args themselves and not the outer parenthesis.
  },

  /**
   * Called after an attr value has been parsed.
   *
   * @example
   * 1╭─ <input name="hi" value:=x>
   *  │             ││         │ ╰─ attrValue:bound.value
   *  │             ││         ╰─ attrValue:bound ":=x"
   *  │             │╰─ attrValue.value "\"hi\""
   *  ╰─            ╰─ attrValue "=\"hi\""
   */
  onAttrValue(range) {
    range.bound; // true if the attribute value was preceded by :=.
    range.value; // Another range that includes only the value itself without the leading = or :=.
  },

  /**
   * Called after an attribute method shorthand has been parsed.
   *
   * @example
   * 1╭─ <div onClick(ev) { foo(); }>
   *  │              ││   │╰─ attrMethod.body.value " foo(); "
   *  │              ││   ╰─ attrMethod.body "{ foo(); }"
   *  │              │╰─ attrMethod.params.value "ev"
   *  │              ├─ attrMethod.params "(ev)"
   *  ╰─             ╰─ attrMethod "(ev) { foo(); }"
   */
  onAttrMethod(range) {
    range.typeParams; // Another range which includes the type params for the method.
    range.typeParams.value; // Another range which includes the type params without outer angle brackets.

    range.params; // Another range which includes the params for the method.
    range.params.value; // Another range which includes the method params without outer parenthesis.

    range.body; // Another range which includes the entire body block.
    range.body.value; // Another range which includes the body block without outer braces.
  },

  /**
   * Called after we've parsed a spread attribute.
   *
   * @example
   * 1╭─ <div ...attrs>
   *  │       │  ╰─ attrSpread.value "attrs"
   *  ╰─      ╰─ attrSpread "...attrs"
   */
  onAttrSpread(range) {
    range.value; // Another range that includes only the value itself without the leading ...
  },

  /**
   * Called once we've completed parsing the open tag.
   *
   * @example
   * 1╭─ <div><span/></div>
   *  │      │     ╰─ openTagEnd:selfClosed "/>"
   *  ╰─     ╰─ openTagEnd ">"
   */
  onOpenTagEnd(range) {
    range.selfClosed; // true if this tag was self closed (the onCloseTag* handlers will not be called if so).
  },

  /**
   * Called when we start parsing and html closing tag.
   * Note this is not emitted for concise, selfClosed, void or statement tags.
   *
   * @example
   * 1╭─ <div><span/></div>
   *  ╰─             ╰─ closeTagStart "</"
   */
  onCloseTagStart(range) {},

  /**
   * Called after the content within the brackets of an html closing tag has been parsed.
   * Note this is not emitted for concise, selfClosed, void or statement tags.
   *
   * @example
   * 1╭─ <div><span/></div>
   *  ╰─               ╰─ closeTagName "div"
   */
  onCloseTagName(range) {},

  /**
   * Called once the closing tag has finished parsing, or in concise mode we hit an outdent or eof.
   * Note this is not called for selfClosed, void or statement tags.
   *
   * @example
   * 1╭─ <div><span/></div>
   *  ╰─                  ╰─ closeTagEnd ">"
   */
  onCloseTagEnd(range) {},
});

Finally after setting up the parser with it's handlers, it's time to pass in some source code to parse.

parser.parse("<div></div>");

Parser Helpers

The parser instance provides a few helpers to make it easier to work with the parsed content.

// Pass any range object into this method to get the raw string from the source for the range.
parser.read(range);

// Given an zero based offset within the source code, returns a position object that contains line and column properties.
parser.positionAt(offset);

// Given a range object returns a location object with start and end properties which are each position objects as returned from the "positionAt" api.
parser.locationAt(range);

Code of Conduct

This project adheres to the eBay Code of Conduct. By participating in this project you agree to abide by its terms.

changelog

htmljs-parser

5.5.3

Patch Changes

5.5.2

Patch Changes

5.5.1

Patch Changes

  • #168 3a696d0 Thanks @DylanPiercey! - When the preceding character of an expression is a quote, prefer division over regexp state. This improves parsing for inline css grid properties.

5.5.0

Minor Changes

  • #164 13a33a3 Thanks @DylanPiercey! - Allow indented javascript style comments that are not under a parent tag in concise mode.

5.4.3

Patch Changes

  • #161 be73442 Thanks @DylanPiercey! - Fixes a regression where the parsed text state (used by eg script, style) was not properly entering back into text for the closing quote on the string.

  • #162 085451c Thanks @DylanPiercey! - Always consume next character of expression if terminator was preceded by an operator.

5.4.2

Patch Changes

  • #158 fe98530 Thanks @DylanPiercey! - Fixes an regression where string literals inside of parsed text nodes (eg <script>) were not properly changing the parser state. This caused issues when comment like syntax was embedded within these string literals"

5.4.1

Patch Changes

  • #156 72b3379 Thanks @DylanPiercey! - Fix regression where the parser would continue unary keyword expressions even if the keyword was inside a word boundary. Eg <div class=thing_new x> would cause the parser to see the expression as thing_ and new x.

5.4.0

Minor Changes

Patch Changes

  • #154 61e6966 Thanks @DylanPiercey! - Avoid continuing expressions after a period if after the whitespace is something that could not be an identifier.

5.3.0

Minor Changes

  • #152 ea65c9f Thanks @DylanPiercey! - Improve handling ambiguity with tag type args vs type params. Type args must now always be directly adjacent the tag name, otherwise it will become type params.

5.2.4

Patch Changes

5.2.3

Patch Changes

5.2.2

Patch Changes

  • #146 bcfd809 Thanks @DylanPiercey! - Fixes an issue where attribute names that started with a keyword (eg: as-thing or instanceof-thing) were incorrectly treated as an expression continuation.

5.2.1

Patch Changes

  • #143 635b97c Thanks @DylanPiercey! - Fix issue where and extra character was being consumed if an escaped placeholder was at the end of a tag.

5.2.0

Minor Changes

  • #141 81cff30 Thanks @DylanPiercey! - Add support for type parameter/argument parsing. This adds a new onTagTypeParams, onTagTypeArgs events and a .typeParams property on the AttrMethod range.

5.1.5

Patch Changes

5.1.4

Patch Changes

  • #136 b5fa4d0 Thanks @DylanPiercey! - Optimize parser constructor to avoid initializing unecessary properties.

  • #134 cdbc6b2 Thanks @DylanPiercey! - Improve missing attribute error when the tag is immediately closed without the attribute value.

5.1.3

Patch Changes

  • #132 59a10d5 Thanks @DylanPiercey! - Fix regression which caused script tags with a trailing comment as the same line as the closing tag to not always parse properly.

  • #132 59a10d5 Thanks @DylanPiercey! - Remove unecessary check for cdata inside parsed text state.

5.1.2

Patch Changes

  • #130 ebc850f Thanks @DylanPiercey! - Switch from regexp based parsing for the expression continuations. This slightly improves performance and more importantly fixes usage of the parser in safari.

5.1.1

Patch Changes

  • #127 222b145 Thanks @DylanPiercey! - Fix regression around JS style comments in the body by requiring that they are preceded by a whitespace.

5.1.0

Minor Changes

5.0.4

Patch Changes

5.0.3

Patch Changes

5.0.2

Patch Changes

  • #119 28fde07 Thanks @DylanPiercey! - Support JS line comments inside the open tag (previously just block comments could be used).

  • #119 28fde07 Thanks @DylanPiercey! - Support JS style comments in HTML bodies (previously allowed in parsed text and concise mode).

5.0.1

Patch Changes

  • #117 8bd3c40 Thanks @DylanPiercey! - Fix issue with onCloseTagStart not called for text mode tags (eg style, script, textarea & html-comment).

5.0.0

Major Changes

  • #114 14f3499 Thanks @DylanPiercey! - Rename onTagName to onOpenTagName. Add a new onOpenTagStart event (before onOpenTagName). Split the onCloseTag event into three new events: onClosetTagStart, onCloseTagName & onCloseTagEnd).

4.0.0

Major Changes

  • #112 2ad4628 Thanks @DylanPiercey! - Switch character position offsets for newlines to be to similar to vscode. Previously the newline was counted as the first character of the line, now it is the last character of the previous line.

3.3.6

Patch Changes

3.3.5

Patch Changes

  • #108 8a988f4 Thanks @DylanPiercey! - Fix issue where parser would sometimes not consume enough characters and cause a bracket mismatch

3.3.4

Patch Changes

  • #103 a4e3635 Thanks @DylanPiercey! - Fixes issue where expressions could consume an extra character when windows line endings used.

  • #103 1f2c9b0 Thanks @DylanPiercey! - When parsing unenclosed expressions we look backwards for unary operators preceded by a word break. This caused a false positive when a member expression was found with the operator name, eg input.new. Now we ensure that these operators are not in a member expression like this.

  • #103 469b4bc Thanks @DylanPiercey! - Improves consistency with v2 of the parser by allowing expressions to span multiple lines if the line is ended with the continuation. This change also allows html attributes and grouped concise attributes to span multiple lines with a new line before or after the continuation.

3.3.3

Patch Changes

3.3.2

Patch Changes

  • #99 b1a3008 Thanks @DylanPiercey! - Fix expression continuations containing equals not consuming enough characters

3.3.1

Patch Changes