vscode-textmate-languageservice

vsce-toolroom · 35 · MIT · 4.0.0 · TypeScript support: included

Textmate token-based language service for Visual Studio Code.

vscode, vscode-extension, textmate, grammar, language-features, language-service, language, lsp, parse, syntax, tokenization, tokenizer

🎉 This package has been adopted by the vsce-toolroom GitHub collective.

This package is in LTS mode & the Textmate technology is superseded by the tree-sitter symbolic-expression parser technology, as used in vscode-anycode.

Language service providers & APIs driven entirely by your Textmate grammar and one configuration file.

To use the API methods and tokenization / outline services, you only need a Textmate grammar. This can be from your extension or one of VS Code's built-in languages.

In order to properly generate full-blown language providers from this module, the Textmate grammar must also include the following features:

  • meta declaration scopes for block level declarations
  • variable assignment scopes differentiated between multiple and single
  • granular keyword control tokens with begin and end scopes

Installation

npm install vscode-textmate-languageservice

Browser support:

  • This package supports Webpack and ESBuild.
  • If you use a bundler, you need to set crypto as an external (commonjs crypto in Webpack). This allows the library to avoid polyfilling the node:crypto module.
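
As a rough sketch of the bundler note above, the externals entry for Webpack might look like the object below. Only the crypto entry comes from this README; the vscode entry is the usual convention for VS Code extensions and is included here as an assumption about your build.

```typescript
// Hypothetical excerpt of a Webpack configuration object for an extension
// that bundles vscode-textmate-languageservice.
const webpackConfig = {
    externals: {
        // Assumption: the usual external for VS Code extensions, provided by the host.
        vscode: 'commonjs vscode',
        // Keep `crypto` as a runtime require so Webpack does not try to polyfill it.
        crypto: 'commonjs crypto',
    },
};
```

In a real project this object would be merged into the configuration exported from your Webpack config file.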

Advisory:

This package has been stable with browser compatibility since 1.1.0, but I recommend you watch for native tree-sitter integration in vscode (issue). Maintainable and with faster retokenization, it is a Holy Grail, whereas this package depends on a well-written Textmate grammar and is a band-aid of sorts.

If there is native vscode support for the language, find a Tree-sitter syntax online then suggest it in an Anycode issue. Otherwise, please open an issue on the community-maintained Treesitter syntax highlighter extension and someone might deal with it.

Setup

  • Language contribution and grammar contribution defined via contributes in the extension manifest (or textmate-languageservice-contributes).
  • Your grammar is bundled in the extension source code and is consumable by vscode-textmate (which can load PList XML, JSON or YAML grammars).
  • A configuration file is available in the extension, defaulting to ./textmate-configuration.json. You can also use the textmate-languageservices property of package.json to map a language ID to a relative path.
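
To illustrate the lookup described in the last bullet, here is a minimal sketch of how a language ID could be mapped to a configuration path. The helper name and the fallback for unmapped languages are assumptions for illustration, not the package's internal code.

```typescript
// Minimal shape of the manifest fields this sketch reads.
interface ManifestLike {
    'textmate-languageservices'?: Record<string, string>;
}

const DEFAULT_CONFIG_PATH = './textmate-configuration.json';

// Hypothetical helper: returns the relative configuration path for a language ID.
// Assumption: languages missing from the mapping fall back to the default path.
function resolveConfigurationPath(manifest: ManifestLike, languageId: string): string {
    const mapping = manifest['textmate-languageservices'];
    return mapping?.[languageId] ?? DEFAULT_CONFIG_PATH;
}

const mappedManifest: ManifestLike = {
    'textmate-languageservices': { lua: './syntaxes/lua-textmate-configuration.json' },
};
const mappedPath = resolveConfigurationPath(mappedManifest, 'lua');
const defaultPath = resolveConfigurationPath({}, 'lua');
```

Here mappedPath holds the path from the mapping, while defaultPath falls back to ./textmate-configuration.json.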

Example language extension manifest - ./package.json:

{
    "name": "lua",
    "displayName": "Textmate language service for Lua",
    "description": "Lua enhanced support for Visual Studio Code",
    "version": "0.0.1",
    "publisher": "",
    "license": "",
    "engines": {
        "vscode": "^1.51.1"
    },
    "categories": [
        "Programming Languages"
    ],
    "contributes": {
        "languages": [{
            "id": "lua",
            "aliases": ["Lua"],
            "extensions": [".lua", ".moon", ".luau"],
            "configuration": "./language-configuration.json"
        }],
        "grammars": [{
            "language": "lua",
            "scopeName": "source.lua",
            "path": "./syntaxes/Lua.tmLanguage.json"
        }]
    }
}

Configuration

Create a JSON file named textmate-configuration.json in the extension directory. The file accepts comments and trailing commas.

If you only want to use the document and/or tokenization services, you can skip creating the file!

Textmate configuration fields:

  • assignment - optional (object)
    Collection of Textmate scope selectors for variable assignment scopes when including variable symbols:
    Properties:
    • separator: Token to separate multiple assignments (string)
    • single: Token to match single variable assignment. (string)
    • multiple: Token to match multiple variable assignment. (string)
  • declarations - optional (array)
    List of Textmate scope selectors for declaration token scopes.
  • dedentation - optional (array)
    List of Textmate tokens for dedented code block declarations (e.g. ELSE, ELSEIF).
    Tokens still need to be listed in indentation with the decrementing value -1.
  • exclude (string) VS Code glob pattern for files to exclude from workspace symbol search.
  • indentation - optional (object)
    Indentation level offset for Textmate token types (used to implement folding).
  • punctuation - optional (object)
    Collection of punctuation tokens with a significant effect on syntax providers. Properties:
    • continuation: Token scope selector for line continuation (to use in region matching). (string)
  • markers - optional (object)
    Stringified regular expression patterns for folding region comments. Properties:
    • start: Escaped regular expression for start region marker. (string)
    • end: Escaped regular expression for end region marker. (string)
  • symbols - optional (object)
    Map of document symbol tokens to their symbol kind (vscode.SymbolKind value).
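
Because the symbols map takes numeric vscode.SymbolKind values, it can help to see which name each number stands for. The sketch below lists the enum members in order (the enum is zero-based); the describeSymbols helper and the ".custom" scope names are hypothetical and only serve as illustration.

```typescript
// Names of vscode.SymbolKind members in enum order (File is 0).
const SYMBOL_KIND_NAMES = [
    'File', 'Module', 'Namespace', 'Package', 'Class', 'Method', 'Property',
    'Field', 'Constructor', 'Enum', 'Interface', 'Function', 'Variable',
    'Constant', 'String', 'Number', 'Boolean', 'Array', 'Object', 'Key',
    'Null', 'EnumMember', 'Struct', 'Event', 'Operator', 'TypeParameter',
];

// Hypothetical helper: describe each entry of a `symbols` map by kind name.
function describeSymbols(symbols: Record<string, number>): string[] {
    return Object.entries(symbols).map(
        ([scope, kind]) => `${scope} -> ${SYMBOL_KIND_NAMES[kind] ?? 'unknown'}`
    );
}

const describedSymbols = describeSymbols({
    'keyword.control.custom': 2,
    'entity.name.function.custom': 11,
});
```

With these inputs, 2 resolves to Namespace and 11 to Function.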

Configuration examples

Template for textmate-configuration.json file:

{
  "assignment": {
    "single": "",
    "multiple": "",
    "separator": ""
  },
  "declarations": [],
  "dedentation": [
    "keyword.control.elseif.custom",
    "keyword.control.else.custom"
  ],
  "exclude": "{.modules,.includes}/**",
  "indentation": {
    "punctuation.definition.comment.begin.custom": 1,
    "punctuation.definition.comment.end.custom": -1,
    "keyword.control.begin.custom": 1,
    "keyword.control.end.custom": -1
  },
  "punctuation": {
    "continuation": "punctuation.separator.continuation.line.custom"
  },
  "markers": {
    "start": "^\\s*#?region\\b",
    "end": "^\\s*#?end\\s?region\\b"
  },
  "symbols": {
    "keyword.control.custom": 2,
    "entity.name.function.custom": 11
  }
}
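
To show how indentation offsets like those in the template can drive folding, here is a simplified sketch that pairs +1 and -1 tokens with a stack. It is an illustration under assumed types (LineToken, FoldRange) and exact scope-name lookup, not the package's actual folding algorithm, and it ignores dedentation tokens.

```typescript
// Assumed input shape: one entry per significant token, with its line number.
interface LineToken { line: number; scope: string }
interface FoldRange { start: number; end: number }

// Pair each -1 token with the most recent unmatched +1 token.
function computeFolds(tokens: LineToken[], indentation: Record<string, number>): FoldRange[] {
    const openLines: number[] = [];
    const folds: FoldRange[] = [];
    for (const token of tokens) {
        const offset = indentation[token.scope] ?? 0;
        if (offset > 0) {
            openLines.push(token.line);
        } else if (offset < 0) {
            const start = openLines.pop();
            // Only multi-line blocks are worth folding.
            if (start !== undefined && token.line > start) {
                folds.push({ start, end: token.line });
            }
        }
    }
    return folds;
}

const sampleFolds = computeFolds(
    [
        { line: 0, scope: 'keyword.control.begin.custom' },
        { line: 1, scope: 'keyword.control.begin.custom' },
        { line: 3, scope: 'keyword.control.end.custom' },
        { line: 5, scope: 'keyword.control.end.custom' },
    ],
    { 'keyword.control.begin.custom': 1, 'keyword.control.end.custom': -1 }
);
```

The inner block (lines 1 to 3) closes first, so it appears before the outer block (lines 0 to 5) in the result.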

An example configuration file that targets Lua:

{
  "assignment": {
    "single": "meta.assignment.variable.single.lua",
    "multiple": "meta.assignment.variable.group.lua",
    "separator": "punctuation.separator.comma.lua"
  },
  "declarations": [
    "meta.declaration.lua entity.name",
    "meta.assignment.definition.lua entity.name"
  ],
  "dedentation": [
    "keyword.control.elseif.lua",
    "keyword.control.else.lua"
  ],
  "exclude": "{.luarocks,lua_modules}/**",
  "indentation": {
    "punctuation.definition.comment.begin.lua": 1,
    "punctuation.definition.comment.end.lua": -1,
    "keyword.control.begin.lua": 1,
    "keyword.control.end.lua": -1
  },
  "markers": {
    "start": "^\\s*#?region\\b",
    "end": "^\\s*#?end\\s?region\\b"
  },
  "symbols": {
    "keyword.control.lua": 2,
    "entity.name.function.lua": 11
  }
}
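
The declarations entries above are descendant scope selectors (space-separated parts). The sketch below shows one common reading of how such selectors match a token's scope stack: each part must match a scope by dot-segment prefix, in order from outermost to innermost. This is an assumption about selector semantics for illustration; the package supports only a limited selector syntax and its matcher may differ.

```typescript
// True when `selector` equals `scope` or is a dot-segment prefix of it.
function matchesScope(selector: string, scope: string): boolean {
    return scope === selector || scope.startsWith(selector + '.');
}

// True when every space-separated part matches a scope in the stack, in order.
function matchesSelector(selector: string, scopeStack: string[]): boolean {
    const parts = selector.trim().split(/\s+/);
    let next = 0;
    return parts.every((part) => {
        for (let i = next; i < scopeStack.length; i++) {
            if (matchesScope(part, scopeStack[i])) {
                next = i + 1;
                return true;
            }
        }
        return false;
    });
}

const luaStack = ['source.lua', 'meta.declaration.lua', 'entity.name.function.lua'];
const declarationMatch = matchesSelector('meta.declaration.lua entity.name', luaStack);
const reversedMatch = matchesSelector('entity.name meta.declaration.lua', luaStack);
const parensMatch = matchesSelector('meta.parens', ['source.lua', 'meta.function-call.parens.lua']);
```

Here declarationMatch is true, reversedMatch is false because the parts are out of order, and parensMatch is false because meta.parens is not a dot-segment prefix of meta.function-call.parens.lua.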

Usage

TextmateLanguageService

The package exports a default class named TextmateLanguageService.

  • Parameter: languageId - Language ID of grammar contribution in VS Code (string).
  • Parameter: context? - Extension context from activate entrypoint export (vscode.ExtensionContext).

The library defaults to core behaviour when figuring out which scope name to use - last matching grammar or language wins. If the context parameter is supplied, the extension will first search contributions from the extension itself.
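
The priority rule above can be sketched as follows. The GrammarContribution shape mirrors the grammars entries in package.json, while resolveScopeName is a hypothetical helper written for illustration rather than the library's own function.

```typescript
// Mirrors the relevant fields of a `grammars` entry in package.json.
interface GrammarContribution { language: string; scopeName: string }

// Hypothetical helper: the extension's own grammars are searched first,
// then all known grammars; within each list the last match wins.
function resolveScopeName(
    languageId: string,
    allGrammars: GrammarContribution[],
    ownGrammars: GrammarContribution[] = []
): string | undefined {
    const lastMatch = (grammars: GrammarContribution[]) =>
        [...grammars].reverse().find((grammar) => grammar.language === languageId);
    return (lastMatch(ownGrammars) ?? lastMatch(allGrammars))?.scopeName;
}

const allGrammars: GrammarContribution[] = [
    { language: 'lua', scopeName: 'source.lua' },
    { language: 'lua', scopeName: 'source.lua.other' },
];
const ownGrammars: GrammarContribution[] = [{ language: 'lua', scopeName: 'source.lua.mine' }];

const withoutContext = resolveScopeName('lua', allGrammars);
const withContext = resolveScopeName('lua', allGrammars, ownGrammars);
```

Without the extension's own grammars the last contribution (source.lua.other) wins; with them, source.lua.mine takes priority.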

Language extension

Extension code sample - ./src/extension.ts:

import * as vscode from 'vscode';
import TextmateLanguageService from 'vscode-textmate-languageservice';

export async function activate(context: vscode.ExtensionContext) {
    // The language ID is passed to the service and doubles as the document selector.
    const languageId = 'lua';
    const selector: vscode.DocumentSelector = languageId;
    const textmateService = new TextmateLanguageService(languageId, context);

    // Each factory method is async and resolves to the provider instance.
    const foldingRangeProvider = await textmateService.createFoldingRangeProvider();
    const documentSymbolProvider = await textmateService.createDocumentSymbolProvider();
    const workspaceSymbolProvider = await textmateService.createWorkspaceSymbolProvider();
    const definitionProvider = await textmateService.createDefinitionProvider();

    context.subscriptions.push(vscode.languages.registerDocumentSymbolProvider(selector, documentSymbolProvider));
    context.subscriptions.push(vscode.languages.registerFoldingRangeProvider(selector, foldingRangeProvider));
    context.subscriptions.push(vscode.languages.registerWorkspaceSymbolProvider(workspaceSymbolProvider));
    context.subscriptions.push(vscode.languages.registerDefinitionProvider(selector, definitionProvider));
}

Tokenization

Extension code sample - ./src/extension.ts:

import * as vscode from 'vscode';
import TextmateLanguageService from 'vscode-textmate-languageservice';

export async function activate(context: vscode.ExtensionContext) {
    const textmateService = new TextmateLanguageService('custom', context);
    const textmateTokenService = await textmateService.initTokenService();
    // Assumes an editor is open; activeTextEditor is undefined otherwise.
    const textDocument = vscode.window.activeTextEditor!.document;
    const tokens = await textmateTokenService.fetch(textDocument);
}

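NB: If you only need tokenization or a light document service, without contributing the language and grammar to VS Code, you can use the custom "textmate-languageservice-contributes" property in package.json in place of "contributes":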

{
    "textmate-languageservice-contributes": {
        "languages": [{
            "id": "typescript",
            "aliases": ["TypeScript"],
            "extensions": [".ts", ".tsx", ".cts", ".mts"]
        }],
        "grammars": [{
            "language": "typescript",
            "scopeName": "source.ts",
            "path": "./syntaxes/TypeScript.tmLanguage.json"
        }]
    }
}

API methods

Usage (example is for getting the token at the current cursor position):

const { getScopeInformationAtPosition } = TextmateLanguageService.api;

// Assumes an editor is open; activeTextEditor is undefined otherwise.
const editor = vscode.window.activeTextEditor!;
const document = editor.document;
const position = editor.selection.active;

const token = await getScopeInformationAtPosition(document, position);

getScopeInformationAtPosition

getScopeInformationAtPosition(document: vscode.TextDocument, position: vscode.Position): Promise<TextmateToken>

Get token scope information at a specific position (caret line and character number).

  • Parameter: document - Document to be tokenized (vscode.TextDocument).
  • Parameter: position - Zero-indexed caret position of token in document (vscode.Position).
  • Returns: Promise resolving to token data for scope selected by caret position (Promise<TextmateToken>).
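
Once a token has been resolved, a small helper can turn it into display text. The sketch below assumes the token exposes text and scopes fields (outermost scope first), which is an assumption about the TextmateToken shape; check the package typings before relying on it. The helper name describeToken is hypothetical.

```typescript
// Assumed subset of the TextmateToken shape; verify against the package typings.
interface TokenLike { text: string; scopes: string[] }

// Hypothetical helper: summarise a token by its text and innermost scope.
function describeToken(token: TokenLike): string {
    const innermost = token.scopes[token.scopes.length - 1] ?? '(no scope)';
    return `"${token.text}" [${innermost}]`;
}

const tokenSummary = describeToken({
    text: 'print',
    scopes: ['source.lua', 'support.function.lua'],
});
```

In an extension, the result of getScopeInformationAtPosition could be passed to describeToken and shown with vscode.window.setStatusBarMessage.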

getScopeRangeAtPosition

getScopeRangeAtPosition(document: vscode.TextDocument, position: vscode.Position): Promise<vscode.Range>;

Get matching scope range of the Textmate token intersecting a caret position.

  • Parameter: document - Document to be tokenized (vscode.TextDocument).
  • Parameter: position - Zero-indexed caret position to intersect with (vscode.Position).
  • Returns: Promise resolving to character and line number of the range (Promise<vscode.Range>).

getTokenInformationAtPosition

getTokenInformationAtPosition(document: vscode.TextDocument, position: vscode.Position): Promise<vscode.TokenInformation>;

VS Code compatible performant API for token information at a caret position.

  • Parameter: document - Document to be tokenized (vscode.TextDocument).
  • Parameter: position - Zero-indexed caret position of token in document (vscode.Position).
  • Returns: Promise resolving to token data compatible with VS Code (Promise<vscode.TokenInformation>).

getLanguageConfiguration

getLanguageConfiguration(languageId: string): vscode.LanguageConfiguration;

Get the language configuration of a language mode identifier.

  • Parameter: languageId - Language ID as shown in brackets in "Change Language Mode" panel (string).
  • Returns: Language configuration of the language (vscode.LanguageConfiguration).

getGrammarContribution

getGrammarContribution(languageId: string): GrammarLanguageDefinition;

Get the grammar definition point of a language mode identifier.

  • Parameter: languageId - Language identifier, shown in brackets in "Change Language Mode" panel (string).
  • Returns: Grammar contribution as configured in source VS Code extension (GrammarLanguageDefinition).

getLanguageContribution

getLanguageContribution(languageId: string): LanguageDefinition;

Get the language definition point of a language mode identifier.

  • Parameter: languageId - Language ID as shown in brackets in "Change Language Mode" panel (string).
  • Returns: Language contribution as configured in source VS Code extension (LanguageDefinition).

getContributorExtension

getContributorExtension(languageId: string): vscode.Extension<unknown> | void;

Get the VS Code Extension API entry of the extension that contributed a language mode identifier.

  • Parameter: languageId - Language identifier, shown in brackets in "Change Language Mode" panel (string).
  • Returns: Extension API instance that contributed the language (vscode.Extension).

Use Oniguruma WASM buffer

This is the vscode-oniguruma build of Oniguruma written in C, compiled to WASM format with memory hooks to V8.

This build is not streaming 🙁, but vscode libraries must bundle their WebAssembly dependencies to support the web ecosystem.

import TextmateLanguageService from 'vscode-textmate-languageservice';
const onigurumaPromise = TextmateLanguageService.utils.getOniguruma();

Changelog

4.0.0

  • Mark package support as LTS mode instead of maintenance mode.
  • Add embedded language support.
  • Add JSONC support for configuration files.
  • Generate vscode.TextDocument mocks for document service output.
  • Switch LiteTextDocument to vscode.TextDocument across the package.
  • Allow the package to query embedded and builtin languages when extensions supply a vscode.ExtensionContext.
  • Change contribution logic to prioritise a supplied extension context instead of restricting contributions to that extension.
  • Export ContributorData - a utility for statically resolving language and grammar contributions.
  • Patch findLanguageIdFromScopeName grammar priority to match core behaviour.
  • Update API documentation to match 4.0.0.

3.0.1

  • Hotfix for type definitions missing in 3.0.0.
  • Smoke test types to ensure package build always includes type declarations.

3.0.0

  • [BREAKING] Rename api.getLanguageConfiguration to api.getLanguageContribution.
  • [BREAKING] Rename api.getGrammarConfiguration to api.getGrammarContribution.
  • Add getLanguageConfiguration API method to load vscode.LanguageConfiguration.
  • Add plaintext language tokenization and grammar resolution.
  • Hotfix for "unrecognized language" error for plaintext documents in API token methods.

2.0.0

  • The VSCE Toolroom open-source collective has adopted the Textmate language service project!
  • Redesigned the logo, inspired by the V8 engine and the Textmate osteospermum flower.

  • Languages can now be tokenized from built-in grammars as well as service-only grammars.

  • Marked TextmateLanguageService~context parameter as optional in the API types.
  • Marked the API from 1.0.0 as compatible with 1.55.0, not 1.51.0.
  • Provided community resolution to microsoft/vscode#109919 & microsoft/vscode#99356.

  • Implemented API methods in an api namespace for developer-friendly logic:

    • Add getTokenInformationAtPosition method for fast positional token polyfill: vscode.TokenInformation.
    • Add getScopeInformationAtPosition method to get Textmate token data: TextmateToken.
    • Add getScopeRangeAtPosition method to get token range: vscode.Range.
    • Add getLanguageConfiguration method for language configuration: LanguageDefinition.
    • Add getGrammarConfiguration method to get language grammar wiring: GrammarLanguageDefinition.
    • Add getContributorExtension method to get extension source of language ID: vscode.Extension.
  • Linted the Textmate scope parser correctly & automatically in the test pipeline.

  • Added getOniguruma to API utilities, a browser-ready non-streaming build of vscode-oniguruma.

1.2.1

Hotfix for typo in documentation: "textmate-language-contributes" -> "textmate-languageservice-contributes".

1.2.0

  • Add support for creation of tokenization or light document service.
    • TL;DR: swap the "contributes" key with a 'fake' "textmate-languageservice-contributes" key in package.json.
    • Now possible to wire up a fake language and grammar "contribution" to a package service.
  • Add service-only tests for TypeScript in the test suite.
  • Use TextmateLanguageService as global key instead of LSP in service workers.
    • "LSP" is a cross-process and IDE-agnostic message format/standard for language feature data.
    • This library's just a factory for language feature services in VS Code.
    • This change is not breaking thanks to Webpack.
  • Improve diff generation for error logging in the sample output validator that's used to test feature providers.
  • Add keywords to the NPM package's metadata for better search engine discovery.
  • Skip web testing of vscode.DefinitionProvider and vscode.WorkspaceSymbolProvider factory methods.

1.1.0

  • Bundle files using Webpack for performance boost.
  • Add browser production support (bundle onig.wasm using encoded-uint8array-loader & prevent reliance on fetch).
  • Restore typing declaration files for dependent extension consumers.
  • Fix broken Gitlab pipeline so we have CI testing again.
  • Fix line number collisions between entry symbols in the outline service.
  • Fix container name in symbol information output for the document symbol provider.
  • Upgrade vscode-textmate from 7.0.4 to 9.0.0 (microsoft/vscode-textmate#198).
  • Ignore test files from package before npm publish to reduce size by ~20%.
  • Add a web-only test harness for testing compatibility with dependent web extensions.
  • Add diff logging for JSON output sampling in the output sampler.
  • Improve test suite performance by 20% by removing dependencies & bundling.

NB:

  • I credited vscode-matlab contributors for writing some of the provider algorithms.
  • Apologies and thanks for the support!

1.0.0

  • Achieved web readiness by handling hashing. We use native hashing of file text contents to keep it fast.
  • Upgrade from SHA-1 algorithm (a famous collision-attack vector) and adopt stable SHA-256 alternatives.
  • Remove last external dependency (git-sha1) so we don't need a bundler.

1.0.0-rc-2

  • Fix the line number in the folding provider for top-level declaration folds after the first declaration.
  • Add browser readiness with a cost-benefit tradeoff... we now load onig.wasm (Textmate grammar regex parser) without streaming.
  • Remove any system dependencies in the test scripts. Plus the scripts use the CLIs better & are much cleaner.
  • Convert previous CI workflow pipeline format to Gitlab.

1.0.0-rc-1

  • vscode-textmate-languageservice codebase republished and migrated to Gitlab.
  • Significant changes to the shape of the API exports.
    • Usage: const lsp = new LSP('languageId', context)
    • API is now a collection of async create* factory functions. The names match their output interfaces in the VS Code API.
    • This means you will need to use await or .then to get the actual provider class.
    • It also means your activate function is better off as an async function - the code will be easier to read.
    • Services/generators/engines are now all created behind the scenes to reduce boilerplate.
  • Introduce top-level "textmate-languageservices" to support extension manifests with multiple configured languages.
    • This key can map language ID to config path, i.e. "textmate-languageservices": { "lua": "./syntaxes/lua-textmate-configuration.json" }.
    • (Without the setting, the package loads ./textmate-configuration.json targeting the language ID in the LSP constructor.)
  • Mostly removed Node dependencies in favour of native VS Code APIs. (Browser support SOON™?)
  • Fix external file search matching in the definition provider, so it now searches in any folder.
  • Invalidate service caches using an asynchronous hash engine - see #1.
  • Rewrite folding provider to remove performance overheads in header & block folding - see #2.
  • Fix line token incrementation for decremented lines in the tokenizer.
  • Fix for cache hashing in Textmate engine tokenization queue.
  • Add performance layer to Textmate scope selector parser to bypass the need for a WASM parser.

0.2.3

  • Fix performance of header algorithm.
  • Fix ending decrement of 1 line in folding provider top-level blocks.
  • Add local test execution support to test suite.

0.2.2

  • Fix performance of folding provider block dedent loop.
  • Port Textmate scope parser to TypeScript and remove caching overheads.

0.2.1

Boost tokenization performance by adding cache layers to Textmate scope selector logic.

0.2.0

  • Accept limited Textmate scope selectors in all configuration values.
  • Introduce array-string duality to all configuration values.
  • Add test suite for Textmate engine & VS Code providers.

Major breaking change - meta.parens does not match meta.function-call.parens in Textmate scope selectors.

0.1.1

Adds engine tokenization queue to improve performance for large files.

0.1.0

Initial version:

  • Core Textmate engine generating data collection from Textmate token list.
  • Includes five providers:
    • Document symbol provider
    • Folding provider
    • Peek definition provider
    • Table of Contents provider
    • Workspace symbol provider
  • Configurable by textmate-configuration.json.
  • Providers are exposed by a module index at ./src/index.ts.

Roadmap

  • 🚀 Adopt native fetch (Node 18.x) for loading WASM regexp parser for Oniguruma.
  • 🚀 Investigate rolling PEG parser for Textmate scope selectors in WASM format.
  • Semantic highlighting provider for parameters.
  • Semantic highlighting provider for classes or other "Table of Contents" items.
  • Semantic highlighting for variable assignment driven by token types and/or text.