
codsen-tokenizer

codsen | 669 | MIT | 7.0.25 | TypeScript support: included

HTML and CSS lexer aimed at code with fatal errors, accepts mixed coding languages

Keywords: ast, codsen, css, html, lexer, lint, parse, parsing, processing, token, tokeniser, tokenizer

readme


Install

This package is pure ESM. If you're not ready to move off CommonJS yet, install an older, CommonJS version of this program, 5.6.0 (npm i codsen-tokenizer@5.6.0).

npm i codsen-tokenizer

Quick Take

import { strict as assert } from "assert";

import { tokenizer } from "codsen-tokenizer";

const gathered = [];

// it operates from a callback, like Array.prototype.forEach()
tokenizer("<td nowrap>", {
  tagCb: (obj) => {
    gathered.push(obj);
  },
});

assert.deepEqual(gathered, [
  {
    type: "tag",
    start: 0,
    end: 11,
    value: "<td nowrap>",
    tagNameStartsAt: 1,
    tagNameEndsAt: 3,
    tagName: "td",
    recognised: true,
    closing: false,
    void: false,
    pureHTML: true,
    kind: null,
    attribs: [
      {
        attribName: "nowrap",
        attribNameRecognised: true,
        attribNameStartsAt: 4,
        attribNameEndsAt: 10,
        attribOpeningQuoteAt: null,
        attribClosingQuoteAt: null,
        attribValueRaw: null,
        attribValue: [],
        attribValueStartsAt: null,
        attribValueEndsAt: null,
        attribStarts: 4,
        attribEnds: 10,
        attribLeft: 2,
      },
    ],
  },
]);
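
The numeric keys in the Quick Take output appear to be string offsets with an exclusive end, which is the same convention String.prototype.slice() uses. The sketch below tests that assumption against the values shown above. It uses a hand-copied subset of the token and does not call the library. The helper name valueAt is hypothetical, and the meaning of attribLeft given here is an assumption.

```javascript
// Hand-copied subset of the token from the Quick Take example above.
// Assumption: start/end-style pairs are [inclusive, exclusive) offsets into the input.
const input = "<td nowrap>";
const token = {
  start: 0,
  end: 11,
  value: "<td nowrap>",
  tagNameStartsAt: 1,
  tagNameEndsAt: 3,
  tagName: "td",
  attribs: [
    { attribName: "nowrap", attribNameStartsAt: 4, attribNameEndsAt: 10, attribLeft: 2 },
  ],
};

// Hypothetical helper: cut a range out of the source string.
function valueAt(str, from, to) {
  return str.slice(from, to);
}

const attrib = token.attribs[0];
const checks = {
  wholeTag: valueAt(input, token.start, token.end) === token.value,
  tagName: valueAt(input, token.tagNameStartsAt, token.tagNameEndsAt) === token.tagName,
  attribName:
    valueAt(input, attrib.attribNameStartsAt, attrib.attribNameEndsAt) === attrib.attribName,
  // Assumption: attribLeft points at the nearest non-whitespace character
  // to the left of the attribute, here the "d" of "td".
  attribLeft: input[attrib.attribLeft] === "d",
};

console.log(checks); // all four checks are true
```

Because the offsets are plain indices, a consumer can highlight or patch the source by range without tokenizing it again.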

Documentation

Please visit codsen.com for a full description of the API.

Contributing

To report bugs or request features or assistance, raise an issue on GitHub.

Licence

MIT License.

Copyright © 2010-2024 Roy Revelt and other contributors.


changelog

Change Log

All notable changes to this project will be documented in this file. See Conventional Commits for commit guidelines.

7.0.24 (2024-03-30)

Bug Fixes

  • remove an unused import (f97b2d8)

7.0.0 (2022-12-01)

BREAKING CHANGES

  • Minimum supported Node version is v14.18; we're dropping v12 support

6.1.0 (2022-08-12)

Features

6.0.17 (2022-04-18)

Fixed

6.0.0 (2021-09-09)

Features

BREAKING CHANGES

  • programs are now ES Modules and won't work with CommonJS require()

5.6.0 (2021-05-24)

Fixed

  • algorithm fixes (b20147c)
  • algorithm improvements (f313aed)
  • algorithm improvements around inline CSS styles and ESP tokens (774e925)
  • backwards pattern in inline CSS style rule values, text - ESP token (f2b6fa9)
  • chain pattern inside the inline CSS rule value (e83d482)
  • ESP tokens as inline style rule values - 3 tests pending (3f4aa0c)
  • fix pattern ESP token + text token within inline CSS style rule values (2396378)
  • fix to prevent double ESP token recorded (1f62f98)
  • patch text tokens only when they are really text tokens, not ESP tokens (6a56dd4)
  • some insurance for the future, to tackle broken ESP tokens (7f26a75)
  • space-!important (84e624f)
  • support for !important (9212ffe)
  • tackle whitespace in front of !important (a3f77e4)

Features

  • config file based major bump blacklisting (e15f9bb)
  • logic improvements around inline CSS and ESP tokens (4334454)

5.5.6 (2021-04-11)

Fixed

  • correctly end the inline CSS style property without a value (59b699d)

Reverts

  • Revert "chore: setup refresh" (23cf206)

5.5.0 (2021-03-23)

Fixed

  • recognise Nunjucks double curly variables within CSS rules (5963e7b)
  • recognise quote groups within HTML inline style attributes (1a79454)

Features

  • better recognition of closing tags with slash but missing opening bracket (fbce088)

5.4.0 (2021-03-14)

Fixed

  • further improvements to JS code recognition (77273f0)
  • tackle a case where attribute's opening quotes are followed by slash + bracket (876812e)

Features

  • improved JS code recognition (623f13e)

5.3.0 (2021-03-07)

Fixed

  • detect mis-typed !important better (a2a2631)
  • improve the tag end patching up when it abruptly ends (ff571fc)

Features

  • add detection for pattern: standalone space-semi in head CSS and HTML inline CSS properties (dc14191)
  • algorithm improvements for broken code recognition in CSS rules (3e0db8c)
  • improvements to broken CSS properties recognition (c8ef8e3)
  • recognise truncated CSS better (2d82a42)
  • repeated semi recognition in the inline/head CSS styles (4e98dbd)
  • rogue standalone semicolon recognition in the inline HTML styles (8317b28)

5.2.0 (2021-02-27)

Fixed

  • algorithm improvements in broken !important recognition (dcfd755)
  • algorithm improvements in broken !important recognition (0254ca8)
  • further improvements to broken code recognition (ba41245)
  • improvements to malformed !important recognition (7c9d70b)
  • stray !important to be put under important key, not under property (3cd6291)
  • tweak to address broken or partial code cases, around CSS rules (aea4d9b)

Features

  • algorithm improvements (b52667d)
  • correctly set whitespace after abruptly ended css rule as a text token (4fdd70d)
  • improve recognition of rogue characters in CSS rules (b57335a)
  • improvements to the CSS rule recognition, especially around !important (d89a7e0)
  • patch CSS rules when the closing curly has not been met yet but a new rule starts (4f108a6)
  • tokenize !important in CSS (a6e0925)

5.1.0 (2021-02-07)

Features

  • improvements to ERB template recognition (5fd2ba1)

5.0.1 (2021-01-28)

Fixed

  • add testStats to npmignore (f3c84e9)

5.0.0 (2021-01-23)

Features

  • rewrite in TS, start using named exports (b41c644)

BREAKING CHANGES

  • previously you'd import a default export: import tokenizer from ...; now, import the named export instead: import { tokenizer } from ...

4.5.0 (2020-12-13)

Fixed

  • add checks and prevent throwing in certain unfinished code cases (aa63861)
  • fix a case of unfinished css style blocks (755ce98), closes #2

Features

  • improvements to abruptly ended chunk recognition (1728753)

4.4.0 (2020-12-11)

Features

  • improve the recognition of equal chars in URIs (8d041b7)
  • recognise attributes with certain curly quotes (4e43cc7)

4.3.0 (2020-12-06)

Features

  • add another lump blacklist pattern (6b3b87d)

4.2.0 (2020-12-04)

Features

  • recognise JSP (Java Server Pages) (68fa3c2)

4.1.0 (2020-12-03)

Features

  • add concept of lefty and righty characters to improve recognition (dd4b8cb)
  • improve the templating tag detection, exclude double parentheses better (724e827)

4.0.0 (2020-11-28)

Accidental version bump during migration to SourceHut. Sorry about that.

3.2.0 (2020-11-02)

Features

  • improve head CSS parent rule selector recognition (d9373d8)

3.1.0 (2020-10-19)

Features

  • improve inline css and head css recognition (e7a288c)

3.0.0 (2020-10-12)

Features

  • better inline css recognition (b1b67b3)
  • better recognition of rogue characters around inline css rules (3243b5c)
  • better recognition of a sequence of inline styles (de9b327)
  • css comments in head styles (a217543)
  • DRY the property pushing, add more tests (e6ed023)
  • head CSS style properties (9128dcb)
  • html inline css style comments (b2379e4)
  • improvements to missing semicol between two properties (cdb3423)
  • new html tag kind - inline (a5f8b94)
  • recognise a missing semicol between two properties in css (a8ef72a)
  • recognise double-wrapped attribute values (08e8dc6)
  • recognise erroneous inline css comments with slash-slash (061e84a)
  • recognise rogue characters within inline css styles (252eb07)
  • recognise rogue extra closing curlies in CSS rules (9bb1e7a)

BREAKING CHANGES

  • the API surface across CSS styles, both inline and in head <style>, has been improved

2.17.0 (2020-05-24)

Fixed

  • fix "rule" type node "left" key when it is preceded by at-rule (96e3a65)

Features

  • add attribs[].attribLeft to HTML tokens (44046c1)
  • add token.left key to rule-type tokens (93564de)
  • proper "at" and "rule" token nesting (bd5db56)

2.16.0 (2020-05-17)

Features

  • broken ESP tag recognition improvements (f3741e8)
  • improve ESP tag recognition, add more tests and make existing ones more precise (31e923c)
  • improvements to esp tag recognition (13740c1)
  • recognise unclosed/terminated ESP tags within tag attributes (28015ba)

2.15.0 (2020-05-11)

Features

  • improvements to esp tag recognition + some rebasing around esp tag extraction (2117466)
  • responsys RPL-like ESP tag recognition (543214a)

2.14.0 (2020-04-20)

Features

  • improvements to the ESP tag recognition (a1f5fe1)

2.13.0 (2020-04-19)

Fixed

  • more fixes for attribs[].attribValue[] (51f842b)
  • set tag key pureHTML correctly (90cbb4b)

Features

  • esp tags can come as attributes or be among attribute value tokens (28cfd40)

2.12.0 (2020-04-13)

Features

  • detect when HTML attribute's equal is missing and there's whitespace instead (cd74106)
  • extract new packages is-char-suitable-for-html-attr-name and is-html-attribute-closing (deafd48)
  • recognise HTML attributes with mismatching quotes and missing equal (7dedba1)

2.11.0 (2020-04-04)

Features

  • opts.tagCbLookahead and opts.charCbLookahead (4b88c33)
  • complete the correction for missing closing of an attribute (34aa959)
  • move lookahead contents from baked into node to separate input argument (be27d8e)
  • new comment kind, simplet (<!-->) (0734054)
  • recognise broken pattern "attribute name - equal - attribute name - equal" (9966e6c)
  • recognise conditional comments (both kinds) without brackets as long as mso exists (1afe369)
  • recognise expanded notation outlook conditional kind="not" comments (ffa4a0d)
  • recognise quoteless attribute tag endings (1a38f1d)
  • support ESP tokens inside HTML tags - nest them among attributes (114193c)
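
Two of the 2.11.0 entries concern lookahead: opts.tagCbLookahead and opts.charCbLookahead (4b88c33), and moving the lookahead contents out of the token and into a separate input argument (be27d8e). The sketch below shows the idea on its own terms. It buffers tokens and releases each one to the callback together with up to n of the tokens that follow it. This is not the library's implementation. The names createLookaheadEmitter, push and flush are hypothetical, and the exact tagCb signature should be checked in the API docs on codsen.com.

```javascript
// Hypothetical sketch of lookahead delivery, not codsen-tokenizer's internals.
// Each token reaches the callback only once `n` later tokens are known,
// or when the input ends and the buffer is flushed.
function createLookaheadEmitter(cb, n) {
  const buffer = [];
  return {
    push(token) {
      buffer.push(token);
      // Enough tokens are queued to give the oldest one its full lookahead.
      if (buffer.length > n) {
        const current = buffer.shift();
        cb(current, buffer.slice(0, n));
      }
    },
    flush() {
      // End of input: remaining tokens get whatever lookahead is left.
      while (buffer.length) {
        const current = buffer.shift();
        cb(current, buffer.slice(0, n));
      }
    },
  };
}

const calls = [];
const emitter = createLookaheadEmitter((token, next) => {
  calls.push([token.value, next.map((t) => t.value)]);
}, 2);

["<td>", "text", "</td>", "<br>"].forEach((value) => emitter.push({ value }));
emitter.flush();

console.log(calls);
// [["<td>",["text","</td>"]], ["text",["</td>","<br>"]], ["</td>",["<br>"]], ["<br>",[]]]
```

Passing the upcoming tokens as a second argument keeps each token object unchanged. The cost is that the callback fires a little later, once enough of the input has been read.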

2.10.0 (2020-03-24)

Fixed

  • recognise "only"-kind closing tails with simple comment tails preceding (f0d3624)
  • second part of newly-added layer quotes - removing them (9378cb9)

Features

  • recognise conditional comment tags with wrong brackets (c399b92)
  • recognise mismatching quotes around HTML attributes (7e2818c)
  • recognise missing closing quotes on HTML attributes (610e400)
  • recognise missing opening quotes on HTML attributes (41b85f0)
  • recognise repeated opening quotes on HTML attributes (55707d5)

2.9.0 (2020-03-16)

Fixed

  • add missing value, token.recognised on the broken tags (95fd011)
  • correct the incomplete simple opening HTML tag token (08620a6)
  • fix correct opening simple HTML comment ranges (04b88bc)
  • recognise repeated opening brackets in comment tags (6d22f81)

Features

  • algorithm improvements and some housekeeping (c479af9)
  • detect opening "if" kind comments without opening square bracket (607fc23)
  • improve the recognition to "not" kind conditional opening without closing bracket (23d6771)
  • recognise missing opening brackets (a182bb1)
  • recognise tags abruptly ended after tag name (6398167)

2.8.0 (2020-02-24)

Fixed

  • a donothing skip was missing (b79dafc)
  • make all tests pass (2c48aa1)
  • recognise rule chunks without curly braces (very likely broken) (f4245a3)
  • set the extracted selector's value to be trimmed in the end too (1efc1bd)
  • set XML closing tag kind (b1ab96f)

Features

  • add token.value (3a85934)
  • allow spaces between comment tokens (902e5cc)
  • improvements to startsComment detection function from util (9c0e0c0)
  • include more erroneous comment tag cases to be recognised (ea41247)
  • recognise conditional "everywhere except" comments (type="comment", kind="not") (e310798)
  • recognise more broken tags and broken comment tags (72e1e32)
  • recognise outlook/ie conditional comments (kind "only", type "comment") (5ef68fa)
  • recognise rogue closing quotes in css (3be0ec5)
  • simple comment tokens recognised both opening and closing (bd53904)

2.7.0 (2020-02-09)

Fixed

  • don't ping last undefined character to charCb (284b50c)
  • turn off styleStarts (def78e0)

Features

  • "rule" token type (e95c9ab)
  • extend loop range until length + 1 (8095bb9)
  • improvements to cdata tag recognition (b84491b)
  • single-layer at rules with nested whitespace (text) tokens (3bc51b5)
  • tighten up opts input check types (3b80e1d)
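
Two related changes appear in 2.7.0. The main loop was extended until length + 1 (8095bb9), and charCb no longer receives the last, undefined character (284b50c). A common reason for the extra iteration is to give a token that is still open at the end of the input a chance to be closed and reported. That motivation is an assumption here. The toy word splitter below shows the pattern. scan, onChar and onToken are hypothetical names, not part of codsen-tokenizer.

```javascript
// Toy scanner illustrating a loop that runs one step past the last character.
// At i === str.length, str[i] is undefined: it is used to close a pending token,
// but it is not passed to the per-character callback.
function scan(str, onChar, onToken) {
  let tokenStart = null;
  for (let i = 0; i <= str.length; i++) {
    const char = str[i];
    if (char !== undefined) {
      onChar(char, i);
    }
    const isBoundary = char === undefined || char === " ";
    if (isBoundary && tokenStart !== null) {
      // Close the token that was open before this position.
      onToken({ start: tokenStart, end: i, value: str.slice(tokenStart, i) });
      tokenStart = null;
    } else if (!isBoundary && tokenStart === null) {
      tokenStart = i;
    }
  }
}

const chars = [];
const tokens = [];
scan("ab cd", (c) => chars.push(c), (t) => tokens.push(t));
console.log(chars.length, tokens);
// 5 [{ start: 0, end: 2, value: "ab" }, { start: 3, end: 5, value: "cd" }]
```

With a plain i < str.length bound, the trailing "cd" would never be reported. The undefined guard plays the same role as not pinging the last character to charCb.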

2.6.0 (2020-02-01)

Features

  • improvements to unclosed tag recognition (d861dd4)
  • missing HTML closing bracket (c858340)

2.5.0 (2020-01-01)

Fixed

  • whole attribute's value can't be an opening or closing ESP lump (051f2b6)

Features

  • further improvements to attribute value recognition (313f091)

2.4.0 (2019-12-27)

Features

  • add recognised attribute flag, "attribNameRecognised" (71cbe64)
  • improvements to the broken attribute recognition algorithm (408a3c6)
  • recognise attribute values not wrapped in quotes (1b3abcd)
  • recognise missing closing quotes of attribute values (c39dfde)
  • report tagName as lowercased, for consistency, ranges are still available (e69efc6)

2.3.1 (2019-12-21)

Fixed

  • false positive - repeated percentage within attribute's value pretending to be an ESP tag (ec36476)
  • html empty attributes logic fix (a3e507d)
  • recognise rgb() with empty brackets as value of an attr (69746ec)

2.3.0 (2019-12-14)

Features

  • algorithm improvements and more tests (4d6cf46)

2.2.0 (2019-12-09)

Fixed

  • add all h* tags to the recognised list, stop the digit from being skipped (43cae4f)

Features

  • algorithm improvements, especially around esp literals (3183d4e)
  • html tag attribute recognition (5892120)

2.1.0 (2019-11-27)

Fixed

  • fix score calculation (3601ce2)
  • report doctype as recognised (6967044)

Features

  • ESP tag recognition improvements (5b1c0af)
  • improved broken cdata and doctype recognition (98880dc)
  • improvements to ESP tag recognition algorithm (f135f16)
  • report wrong case tag names as recognised (so that we can catch them later in emlint) (bbd56ec)

2.0.0 (2019-11-20)

Features

BREAKING CHANGES

  • the options argument has moved one position further along the argument list

1.3.0 (2019-11-18)

Fixed

  • improve void tag detection by moving calculation to where tag name is calculated (5ea548f)

Features

  • don't end esp token as easily, ensure it's closed using a character from estimated tails (56d65be)
  • void tags are determined evaluating tag's name, not presence of slash (57d8b4c)

1.2.0 (2019-11-11)

Features

  • self-closing html tags (afaff20)
  • split test groups into files and report tag name for html tokens (014e792)

1.1.0 (2019-11-02)

Features

  • css token type (d617fb1)
  • doctype and xml recognition (3f92f64)
  • heuristic esp tag recognition (8ee7df7)
  • opts.reportProgressFunc (5cc4838)
  • recognise content within quotes (c6cbc97)
  • support esp code nested in other types and uneven count of quotes there (399f48b)
  • tap is-html-tag-opening to make algorithm more resilient (1c19b48)
  • init (61e53c3)

1.0.0 (2019-11-01)

  • First public release.