html-tag-validator

codeschool | 14.5k | MIT | 1.6.0

PEG.js implementation of HTML tag validator

readme

html-tag-validator

This library takes some HTML source code, provided as a string, and generates an AST. An error will be generated describing what is malformed in the source document if the AST cannot be generated.

Note: this project is a work in progress and is not fully spec-compliant.

See todo.md for plans for current and future releases.

The parser implements the basic components of the HTML 5 spec, such as:

  • doctype definition
  • HTML 5 elements
  • HTML 5 attributes
  • Enhanced validation for script, style, link and meta elements
  • Basic support for iframe elements
  • HTML comments

     <!--This is a comment. Comments are not displayed in the browser-->
  • Conditional comments

     <!--[if gte mso 12]>
       <style>
         td {
           mso-line-height-rule: exactly;
         }
       </style>
     <![endif]-->
  • Allowed <input> element type values and the attributes supported by each type

  • Hierarchical rules, such as: a properly formed HTML 5 document should have a title element with contents within the head tag
  • Void elements

    <img src="cat.gif">
  • Void attributes

    <script async></script>
  • Normal attributes

    <p class="foo"></p>

Install

npm install html-tag-validator

Usage

The library exposes a single function. In its simplest form it accepts two arguments: a string containing HTML, and a callback function. An options object can also be passed between them (see "Passing in options"). The callback should be in the form:

function (err, ast) {
  if (err) {
    // View the error generated by the parser
    console.log(err.message);
  } else {
    // View the AST generated by the parser
    console.log(ast);
  }
}

Default syntax

var htmlTagValidator = require('html-tag-validator'),
  sampleHtml = "<html>" +
               "<head><title>hello world</title></head>" +
               "<body><p style='color: pink;'>my cool page</p></body>" +
               "</html>";

// Turn an HTML string into an AST
htmlTagValidator(sampleHtml, function (err, ast) {
  if (err) {
    throw err;
  } else {
    console.log(ast);
  }
});

Produces the following AST:

doctype:  null
document:
  -
    type:       element
    void:       false
    name:       html
    attributes: {}

    children:
      -
        type:       element
        void:       false
        name:       head
        attributes: {}

        children:
          -
            type:       title
            attributes: {}

            contents:   hello world
      -
        type:       element
        void:       false
        name:       body
        attributes: {}

        children:
          -
            type:       element
            void:       false
            name:       p
            attributes:
              style: color: pink;
            children:
              -
                type:     text
                contents: my cool page
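
As a minimal sketch of how this output can be consumed, the AST can be walked recursively. The example below hand-copies the AST shown above into a JavaScript object literal and collects the element names in document order. collectElementNames is a hypothetical helper written for illustration; it is not part of the library.

```javascript
// AST copied by hand from the output above, written as a JS object literal.
var ast = {
  doctype: null,
  document: [
    { type: 'element', void: false, name: 'html', attributes: {}, children: [
      { type: 'element', void: false, name: 'head', attributes: {}, children: [
        { type: 'title', attributes: {}, contents: 'hello world' }
      ] },
      { type: 'element', void: false, name: 'body', attributes: {}, children: [
        { type: 'element', void: false, name: 'p',
          attributes: { style: 'color: pink;' },
          children: [ { type: 'text', contents: 'my cool page' } ] }
      ] }
    ] }
  ]
};

// Hypothetical helper: depth-first walk that records the name of every
// node whose type is 'element'. Nodes without children are simply skipped.
function collectElementNames(nodes, names) {
  names = names || [];
  (nodes || []).forEach(function (node) {
    if (node.type === 'element') {
      names.push(node.name);
    }
    collectElementNames(node.children, names);
  });
  return names;
}

var names = collectElementNames(ast.document);
console.log(names); // [ 'html', 'head', 'body', 'p' ]
```

Note that the title node in this output has the type title rather than element, so it does not appear in the result.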

Passing in options

Currently, you can provide custom attribute names to merge with the default values, custom validation rules, and global settings such as the output format for the validation messages.

var htmlTagValidator = require('html-tag-validator'),
  sampleHtml = "<html>" +
               "<head><title>hello world</title></head>" +
               "<body>" +
                 "<p *ngFor=\"let item of items\" (click)=\"func(item)\">" +
                  "my cool page" +
                 "</p>" +
               "</body>" +
               "</html>";

/*
* Allow Angular 2 style attributes on all elements. The key '_' means match
* on ANY tag, but you could also specify particular tag names (e.g.:
* 'my-custom-tag'). Custom attributes for existing HTML 5 tags will be merged
* with the official list of allowed attributes. The key 'mixed' means normal or void
* attributes for the given tag name. You can also specify to target all 'normal'
* and/or 'void' attributes.
*
* This options object says the following:
* for all existing HTML 5 tag names '_' ...
* allow the following types of attribute names
*   1) *ngSomething
*   2) (something)  
*   3) [something]  
*   4) [(something)]
* for void (e.g.: async) attributes OR
* normal attributes (e.g.: checked="checked") ...
* in addition to the standard HTML 5 attributes for the element.
* Also, this adds a new normal (not self-closing) tag named
* template to support Angular 2 <template></template> tags.
*/
htmlTagValidator(sampleHtml, {
  settings: {
    // Set output format for validation error messages
    format: 'plain', // 'plain', 'html', or 'markdown'
    /* Setting verbose to true will generate an AST with additional
     * details such as whether tag attributes are unquoted */
    verbose: false, // default: false
    /* Set preserveCase to true to preserve the original case of tag and
     * attribute names so that you can support case-sensitive Angular 2
     * attribute names such as *ngFor and [ngModel] */
    preserveCase: true // default: false
  },
  tags: {
    normal: [ 'template' ]
  },
  attributes: {
    '_': {
      mixed: /^((\*ng)|(^\[[\S]+\]$)|(^\([\S]+\)$))|(^\[\([\S]+\)\]$)/
    }
  }
}, function (err, ast) {
  if (err) {
    throw err;
  } else {
    console.log(ast);
  }
});

var htmlTagValidator = require('html-tag-validator'),
  sampleHtml = "<html>" +
               "<head><title>hello world</title></head>" +
               "<body><p (click)='myCoolFunc()'>my cool page</p></body>" +
               "</html>";

/*
* Allow old-style HTML table attributes on specific elements.
*
* This options object adds some old HTML attributes for tables, to
* the 'table' and 'td' elements, in addition to the standard HTML 5
* attributes. Because the key is 'normal', these attributes are
* validated as normal attributes that should have a defined value.
* <td height="20px" width="30px">One</td>
* <td bgcolor="#000000">Two</td>
*/
htmlTagValidator(sampleHtml, {
  'settings': {
    'format': 'plain'
  },
  'attributes': {
    'table': {
      'normal': [
        'align', 'bgcolor', 'border', 'cellpadding', 'cellspacing',
        'frame', 'rules', 'summary', 'width'
      ]
    },
    'td': {
      'normal': [
        'height', 'width', 'bgcolor'
      ]
    }
  }
}, function (err, ast) {
  if (err) {
    throw err;
  } else {
    console.log(ast);
  }
});
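
The mixed pattern from the Angular 2 options example can be exercised on its own to see which attribute names it accepts. This sketch copies the regular expression verbatim and runs it against a few sample names chosen here for illustration. It assumes the library applies the pattern to the attribute name; how the library actually evaluates it is not shown in this README.

```javascript
// Pattern copied verbatim from the Angular 2 options example.
var angularAttr = /^((\*ng)|(^\[[\S]+\]$)|(^\([\S]+\)$))|(^\[\([\S]+\)\]$)/;

// Sample attribute names, chosen for illustration only.
var samples = ['*ngFor', '(click)', '[ngModel]', '[(ngModel)]', 'class', 'ngFor'];

samples.forEach(function (name) {
  // The pattern has no 'g' flag, so test() is stateless between calls.
  console.log(name + ' -> ' + angularAttr.test(name));
});
// *ngFor -> true
// (click) -> true
// [ngModel] -> true
// [(ngModel)] -> true
// class -> false
// ngFor -> false
```

The pattern only describes the additional attribute names; standard attributes such as class are still allowed through the default HTML 5 lists that the custom values are merged with.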

Contributing

Once the dependencies are installed, start development with one of the following commands:

grunt test - Automatically compile the parser and run the tests in test/index-spec.js.

grunt debug - Run tests with --inspect flag and extended output

grunt watch debug - Get extended output and start a file watcher.

Publishing to npm

Publishing master as normal works for pure HTML implementations, but sometimes a variation is needed, for example a PHP flavor that supports inline PHP tags.

Any variation should live on its own, appropriately named branch and be published separately using npm tags. First, change the version number in package.json to include the language suffix; for PHP that would be something like 1.5.0-php. Then publish with npm publish --tag php. This lets you reference the variation in your package.json like: "html-tag-validator": "1.5.0-php"

Note on validator variations

Anything that pertains to vanilla HTML should be implemented on master and merged into variation branches.

Writing tests

Tests refer to an HTML test file in test/html/, and the test name is a reference to the filename of the test file. For example, the test name super test 2 points to the file test/html/superTest2.html.
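
The naming convention can be sketched as a camel-case conversion. testNameToFile below is a hypothetical function written to illustrate the mapping; the test helpers' actual implementation may differ.

```javascript
// Hypothetical illustration of the test-name-to-file convention:
// keep the first word as written, capitalize the first character of each
// following word, join them, and place the result under test/html/.
function testNameToFile(testName) {
  var words = testName.trim().split(/\s+/);
  var camel = words.map(function (word, index) {
    return index === 0 ? word : word.charAt(0).toUpperCase() + word.slice(1);
  }).join('');
  return 'test/html/' + camel + '.html';
}

console.log(testNameToFile('super test 2'));       // test/html/superTest2.html
console.log(testNameToFile('basic self closing')); // test/html/basicSelfClosing.html
```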

There are three options for the test helpers exposed by tree:

  • tree.ok(this, done) to assert that the test file successfully generates an AST
  • tree.equals(ast, this, done) to assert that the test file generates an AST that exactly matches ast
  • tree.error() to assert that a test throws an error
    • tree.error("This is the error message", this, done) assert an error message
    • tree.error({'line': 2}, this, done) assert an object of properties that each exist in the error
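
The two forms of tree.error() can be thought of as a message comparison and a property subset check. The sketch below shows one way such matching could work; errorMatches and sampleError are hypothetical names, and the real helper may compare values differently.

```javascript
// Hypothetical matcher: a string is compared with err.message, while an
// object must have every one of its properties present on the error with
// an equal value. Extra properties on the error are ignored.
function errorMatches(err, expected) {
  if (typeof expected === 'string') {
    return err.message === expected;
  }
  return Object.keys(expected).every(function (key) {
    return err[key] === expected[key];
  });
}

// Stand-in for a parser error carrying a line number.
var sampleError = new Error('li is not a valid self closing tag');
sampleError.line = 5;

console.log(errorMatches(sampleError, 'li is not a valid self closing tag')); // true
console.log(errorMatches(sampleError, { line: 5 }));                          // true
console.log(errorMatches(sampleError, { line: 2 }));                          // false
```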

You can pass in an options object as the 2nd-to-last argument in each method:

  var options = {
    'settings': {
      'format': 'html'
    }
  };
  tree.ok(this, options, done);

// test/html/basicSelfClosing.html
it('basic self closing', function(done) {
  tree.ok(this, done);
});

// test/html/basicListItems.html
it('basic list items', function(done) {
  tree.error({
    'message': 'li is not a valid self closing tag',
    'line': 5
  }, this, done);
});

changelog

Change Log

All notable changes to this project will be documented in this file.

Unreleased

v1.6.0 - 2017-07-06

Added

  • Boolean attributes in all of the allowed forms are now parsed and included in the AST
    • <input checked>
    • <input checked="true">
    • <input checked="">

Fixed

  • Allow unambiguous ampersands (&) in double-quoted attribute values. Ampersands are allowed EXCEPT when they come in the form of a named reference (e.g., &something;) where something is not a valid named reference from this list.
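
The ampersand rule can be illustrated with a small check. This is a sketch only: KNOWN_REFERENCES is a tiny hand-picked subset of named references used for illustration, and findInvalidReferences is a hypothetical function, not the parser's grammar rule.

```javascript
// Tiny illustrative subset of valid named references (NOT the full list).
var KNOWN_REFERENCES = ['amp', 'lt', 'gt', 'quot', 'nbsp', 'copy'];

// Returns every "&name;" sequence in an attribute value whose name is not
// in the known list. A bare "&" that is not followed by "name;" is ignored.
function findInvalidReferences(value) {
  var invalid = [];
  var pattern = /&([A-Za-z0-9]+);/g;
  var match;
  while ((match = pattern.exec(value)) !== null) {
    if (KNOWN_REFERENCES.indexOf(match[1]) === -1) {
      invalid.push(match[0]);
    }
  }
  return invalid;
}

console.log(findInvalidReferences('page.html?a=1&b=2'));     // []
console.log(findInvalidReferences('Tom &amp; Jerry'));       // []
console.log(findInvalidReferences('Tom &something; Jerry')); // [ '&something;' ]
```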

v1.5.0 - 2016-06-14

Added

  • Added preserveCase option to allow validation of Angular 2 templates. Here is a combination of settings and configuration that appears to work well for Angular 2.

    {
      settings: {
        preserveCase: true
      },
      tags: {
        normal: [ 'template' ]
      },
      attributes: {
        '_': {
          mixed: /^((\*ng)|(^\[[\S]+\]$)|(^\([\S]+\)$))|(^\[\([\S]+\)\]$)/
        }
      }
    }

v1.4.0 - 2016-03-08

Changed

  • Consolidated rules for text nodes, doctype tags, and whitespace to increase the parsing performance.

Fixed

  • Fixed an issue where parse time grew with the number of self closing tags in a document, which caused the process to run out of memory on large documents with many self closing tags.

v1.2.0 - 2016-02-19

Added

  • Added verbose setting to create a verbose AST instead of the default AST. As of right now, this mode will tell you whether an attribute was quoted or unquoted but will be extended with additional information in the future.

v1.1.0 - 2015-08-07

Fixed

  • Definition for synchronous usage was incorrect in htmlTagValidator()

Added

  • Tests to verify that htmlTagValidator() can be called synchronously or asynchronously

v1.0.8 - 2015-05-14

Fixed

  • Encoding error in codex file that contained odd tab character

v1.0.7 - 2015-05-14

Changed

  • Updated markdown, html and plain output for validation error output

v1.0.6 - 2015-05-14

Fixed

  • Gruntfile now set to monitor all project files in grunt debug watcher

Changed

  • Main exported function for this library can now work synchronously or asynchronously

var validator = require('html-tag-validator');

// sync
var ast = validator("<p></p>", { 'settings': { 'format': 'html' } });

// async
validator("<p></p>", { 'settings': { 'format': 'html' } }, function (err, ast) {
  if (err) {
    throw err;
  } else {
    console.log(JSON.stringify(ast));
  }
});
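
A function that supports both call styles usually inspects its arguments to decide how to deliver the result. The sketch below shows one common way to structure this. It is not the library's source: parseStandIn is a trivial stand-in for the real parser, validate is a hypothetical name, and throwing on failure in synchronous mode is an assumption.

```javascript
// Stand-in for the real parser: returns an empty AST, or throws when the
// input is not a string. It does NOT parse HTML.
function parseStandIn(html) {
  if (typeof html !== 'string') {
    throw new TypeError('html must be a string');
  }
  return { doctype: null, document: [] };
}

// Hypothetical dual-mode entry point: validate(html[, options][, callback]).
// The options argument is accepted but ignored by this stand-in.
function validate(html, options, callback) {
  if (typeof options === 'function') {
    callback = options; // called as validate(html, callback)
    options = {};
  }
  if (typeof callback !== 'function') {
    // Sync mode (assumption): return the AST and let errors throw.
    return parseStandIn(html);
  }
  var ast;
  try {
    ast = parseStandIn(html);
  } catch (err) {
    callback(err);
    return;
  }
  // Called outside the try block so that errors thrown by the callback
  // itself are not reported back to it a second time.
  callback(null, ast);
}

// sync
console.log(validate('<p></p>', { settings: { format: 'html' } }));

// async style (this sketch invokes the callback synchronously)
validate('<p></p>', function (err, ast) {
  console.log(err ? err.message : ast);
});
```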

Added

  • Global settings for error output format as 'plain', 'html', or 'markdown'
  • Escaping functions for identifiers and other values (for validation message generation)
  • Added conditional and conditions sections to attributes definitions in the options object, so that conditional rules on allowed attributes can be written more easily (e.g.: if the type attribute is radio then allow attributes checked and required on an input element, in addition to the global and event attributes)
  • Testing: added ability to do deep equals against two HTML trees using tree.equals()

v1.0.5 - 2015-05-07

Fixed

  • Quoted attribute values did not follow HTML 5 spec
  • Unquoted attribute values did not follow HTML 5 spec
  • HTML parser utilities missing function findWhere() used by has() function
  • Malformed starting tags gave wrong error message

    <div>
      <p class
      </p>
    </div>
  • Having HTML or XML elements inside of a conditional comment caused parse errors

    <!--[if ie]>
      <style>
        .breaking {
          content: "whoops!";
        }
      </style>
    <![endif]-->
  • Grunt test and debug should only failOnError when running pegjs compiler

v1.0.4 - 2015-05-07

Added

  • Grunt commands test for running tests and debug for getting detailed test output and starting the file watcher

Changed

  • Got rid of the dependency on grunt-mocha-test to run the tests

v1.0.3 - 2015-05-07

Fixed

  • Internal utility methods such as textNode() and find() no longer modify built-in JavaScript objects such as Array and String

v1.0.2 - 2015-05-07

Changed

  • HTML encode all error messages so they can be displayed on a webpage

v1.0.0 - 2015-05-07

Added

  • Breaking changes from 0.0.x. Check README for changes to core API.