micromark-extension-gfm-autolink-literal
micromark extensions to support GFM literal autolinks.
Contents
- What is this?
- When to use this
- Install
- Use
- API
- Bugs
- Authoring
- HTML
- CSS
- Syntax
- Types
- Compatibility
- Security
- Related
- Contribute
- License
What is this?
This package contains extensions that add support for the extra autolink syntax
enabled by GFM to micromark
.
GitHub employs different algorithms to autolink: one at parse time and one at
transform time (similar to how @mentions are done at transform time).
This difference can be observed because character references and escapes are
handled differently.
But also because issues/PRs/comments omit (perhaps by accident?) the second
algorithm for www.
, http://
, and https://
links (but not for email links).
As this is a syntax extension, it focuses on the first algorithm.
The second algorithm is performed by
mdast-util-gfm-autolink-literal
.
The html
part of this micromark extension does not operate on an AST and hence
can’t perform the second algorithm.
The implementation of autolink literal on github.com is currently buggy.
The bugs have been reported on cmark-gfm
.
This micromark extension matches github.com except for its bugs.
When to use this
This project is useful when you want to support autolink literals in markdown.
You can use these extensions when you are working with micromark
.
To support all GFM features, use
micromark-extension-gfm
instead.
When you need a syntax tree, combine this package with
mdast-util-gfm-autolink-literal
.
All these packages are used in remark-gfm
, which focusses on
making it easier to transform content by abstracting these internals away.
Install
This package is ESM only. In Node.js (version 16+), install with npm:
npm install micromark-extension-gfm-autolink-literal
In Deno with esm.sh
:
import {gfmAutolinkLiteral, gfmAutolinkLiteralHtml} from 'https://esm.sh/micromark-extension-gfm-autolink-literal@2'
In browsers with esm.sh
:
<script type="module">
import {gfmAutolinkLiteral, gfmAutolinkLiteralHtml} from 'https://esm.sh/micromark-extension-gfm-autolink-literal@2?bundle'
</script>
Use
import {micromark} from 'micromark'
import {
gfmAutolinkLiteral,
gfmAutolinkLiteralHtml
} from 'micromark-extension-gfm-autolink-literal'
const output = micromark('Just a URL: www.example.com.', {
extensions: [gfmAutolinkLiteral()],
htmlExtensions: [gfmAutolinkLiteralHtml()]
})
console.log(output)
Yields:
<p>Just a URL: <a href="http://www.example.com">www.example.com</a>.</p>
API
This package exports the identifiers
gfmAutolinkLiteral
and
gfmAutolinkLiteralHtml
.
There is no default export.
The export map supports the development
condition.
Run node --conditions development module.js
to get instrumented dev code.
Without this condition, production code is loaded.
gfmAutolinkLiteral()
Create an extension for micromark
to support GitHub autolink literal
syntax.
Parameters
Extension for micromark
that can be passed in extensions
to enable GFM
autolink literal syntax (Extension
).
gfmAutolinkLiteralHtml()
Create an HTML extension for micromark
to support GitHub autolink literal
when serializing to HTML.
Parameters
Extension for micromark
that can be passed in htmlExtensions
to support
GitHub autolink literal when serializing to HTML
(HtmlExtension
).
Bugs
GitHub’s own algorithm to parse autolink literals contains three bugs. A smaller bug is left unfixed in this project for consistency. Two main bugs are not present in this project. The issues relating to autolink literals are:
- GFM autolink extension (
www.
,https?://
parts): links don’t work when after bracket\ fixed here ✅ - GFM autolink extension (
www.
part): uppercase does not match on issues/PRs/comments\ fixed here ✅ - GFM autolink extension (
www.
part): the wordwww
matches\ present here for consistency
Authoring
It is recommended to use labels, either with a resource or a definition, instead of autolink literals, as those allow relative URLs and descriptive text to explain the URL in prose.
HTML
GFM autolink literals relate to the <a>
element in HTML.
See § 4.5.1 The a
element in the HTML spec for more info.
When an email autolink is used, the string mailto:
is prepended when
generating the href
attribute of the hyperlink.
When a www autolink is used, the string http://
is prepended.
CSS
As hyperlinks are the fundamental thing that makes the web, you will most
definitely have CSS for a
elements already.
The same CSS can be used for autolink literals, too.
GitHub itself does not apply interesting CSS to autolink literals. For any link, it currently (June 2022) uses:
a {
background-color: transparent;
color: #58a6ff;
text-decoration: none;
}
a:active,
a:hover {
outline-width: 0;
}
a:hover {
text-decoration: underline;
}
a:not([href]) {
color: inherit;
text-decoration: none;
}
Syntax
Autolink literals form with, roughly, the following BNF:
gfm_autolink_literal ::= gfm_protocol_autolink | gfm_www_autolink | gfm_email_autolink
; Restriction: the code before must be `www_autolink_before`.
; Restriction: the code after `.` must not be eof.
www_autolink ::= 3('w' | 'W') '.' [domain [path]]
www_autolink_before ::= eof | eol | space_or_tab | '(' | '*' | '_' | '[' | ']' | '~'
; Restriction: the code before must be `http_autolink_before`.
; Restriction: the code after the protocol must be `http_autolink_protocol_after`.
http_autolink ::= ('h' | 'H') 2('t' | 'T') ('p' | 'P') ['s' | 'S'] ':' 2'/' domain [path]
http_autolink_before ::= byte - ascii_alpha
http_autolink_protocol_after ::= byte - eof - eol - ascii_control - unicode_whitespace - ode_punctuation
; Restriction: the code before must be `email_autolink_before`.
; Restriction: `ascii_digit` may not occur in the last label part of the label.
email_autolink ::= 1*('+' | '-' | '.' | '_' | ascii_alphanumeric) '@' 1*(1*label_segment l_dot_cont) 1*label_segment
email_autolink_before ::= byte - ascii_alpha - '/'
; Restriction: `_` may not occur in the last two domain parts.
domain ::= 1*(url_ampt_cont | domain_punct_cont | '-' | byte - eof - ascii_control - ode_whitespace - unicode_punctuation)
; Restriction: must not be followed by `punct`.
domain_punct_cont ::= '.' | '_'
; Restriction: must not be followed by `char-ref`.
url_ampt_cont ::= '&'
; Restriction: a counter `balance = 0` is increased for every `(`, and decreased for every `)`.
; Restriction: `)` must not be `paren_at_end`.
path ::= 1*(url_ampt_cont | path_punctuation_cont | '(' | ')' | byte - eof - eol - space_or_tab)
; Restriction: must not be followed by `punct`.
path_punctuation_cont ::= trailing_punctuation - '<'
; Restriction: must be followed by `punct` and `balance` must be less than `0`.
paren_at_end ::= ')'
label_segment ::= label_dash_underscore_cont | ascii_alpha | ascii_digit
; Restriction: if followed by `punct`, the whole email autolink is invalid.
label_dash_underscore_cont ::= '-' | '_'
; Restriction: must not be followed by `punct`.
label_dot_cont ::= '.'
punct ::= *trailing_punctuation ( byte - eof - eol - space_or_tab - '<' )
char_ref ::= *ascii_alpha ';' path_end
trailing_punctuation ::= '!' | '"' | '\'' | ')' | '*' | ',' | '.' | ':' | ';' | '<' | '?' | '_' | '~'
The grammar for GFM autolink literal is very relaxed: basically anything except for whitespace is allowed after a prefix. To use whitespace characters and otherwise impossible characters, in URLs, you can use percent encoding:
https://example.com/alpha%20bravo
Yields:
<p><a href="https://example.com/alpha%20bravo">https://example.com/alpha%20bravo</a></p>
There are several cases where incorrect encoding of URLs would, in other
languages, result in a parse error.
In markdown, there are no errors, and URLs are normalized.
In addition, many characters are percent encoded
(sanitizeUri
).
For example:
www.a👍b%
Yields:
<p><a href="http://www.a%F0%9F%91%8Db%25">www.a👍b%</a></p>
There is a big difference between how www and protocol literals work compared to how email literals work. The first two are done when parsing, and work like anything else in markdown. But email literals are handled afterwards: when everything is parsed, we look back at the events to figure out if there were email addresses. This particularly affects how they interleave with character escapes and character references.
Types
This package is fully typed with TypeScript. It exports no additional types.
Compatibility
Projects maintained by the unified collective are compatible with maintained versions of Node.js.
When we cut a new major release, we drop support for unmaintained versions of
Node.
This means we try to keep the current release line,
micromark-extension-gfm-autolink-literal@^2
, compatible with Node.js 16.
This package works with micromark
version 3
and later.
Security
This package is safe. Unlike other links in CommonMark, which allow arbitrary protocols, this construct always produces safe links.
Related
micromark-extension-gfm
— support all of GFMmdast-util-gfm-autolink-literal
— support all of GFM in mdastmdast-util-gfm
— support all of GFM in mdastremark-gfm
— support all of GFM in remark
Contribute
See contributing.md
in micromark/.github
for ways to get
started.
See support.md
for ways to get help.
This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.