vscode-textmate-languageservice

vsce-toolroom | 35 | MIT | 4.0.0

Textmate token-based language service for Visual Studio Code.

vscode, vscode-extension, textmate, grammar, language-features, language-service, language, lsp, parse, syntax, tokenization, tokenizer

readme

`vscode-textmate-languageservice`

🎉 This package has been adopted by the vsce-toolroom GitHub collective.

This package is in LTS mode & the Textmate technology is superseded by the tree-sitter symbolic-expression parser technology, as used in vscode-anycode.

Language service providers & APIs driven entirely by your Textmate grammar and one configuration file.

To use the API methods and tokenization / outline services, you only need a Textmate grammar. This can be from your extension or one of VS Code's built-in languages.

In order to properly generate full-blown language providers from this module, the Textmate grammar must also include the following features:

meta declaration scopes for block level declarations
variable assignment scopes differentiated between multiple and single
granular keyword control tokens with begin and end scopes

Installation

npm install vscode-textmate-languageservice

Browser support:

This package supports Webpack and ESBuild.
If you use a bundler, you need to set crypto as an external (commonjs crypto in Webpack). This allows the library to avoid polyfilling the node:crypto module.
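
As a rough illustration of the bundler note above, a Webpack configuration might declare crypto as a commonjs external as follows. This is a minimal sketch: the name webpackConfigSketch, the webworker target and the entry path are assumptions made for illustration and do not come from this package's documentation.

```typescript
// Hypothetical webpack.config.ts fragment for an extension that bundles
// vscode-textmate-languageservice. Only the `crypto` external comes from the
// note above; every other value is an illustrative assumption.
const webpackConfigSketch = {
    target: 'webworker',
    entry: './src/extension.ts',
    externals: {
        // Provided by the VS Code extension host at runtime, never bundled.
        vscode: 'commonjs vscode',
        // Keep `crypto` as a plain require() so the bundler adds no polyfill.
        crypto: 'commonjs crypto',
    },
};

console.log(webpackConfigSketch.externals.crypto);
```

In a real webpack.config.ts this object would be the default export.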

Advisory:

This package is stable and has been browser-compatible since 1.1.0. However, I recommend you watch out for native tree-sitter integration into vscode (issue). Maintainable and with faster retokenization, it is a Holy Grail, whereas this package depends on a well-written Textmate grammar and is a band-aid of sorts.

If there is native vscode support for the language, find a Tree-sitter syntax online then suggest it in an Anycode issue. Otherwise, please open an issue on the community-maintained Treesitter syntax highlighter extension and someone might deal with it.

Setup

Language contribution and grammar contribution defined via contributes in the extension manifest (or textmate-languageservice-contributes).
Your grammar is bundled in the extension source code and is consumable by vscode-textmate (which can load PList XML, JSON or YAML grammars).
A configuration file is available in the extension, defaulting to ./textmate-configuration.json. You can also use textmate-languageservices property of package.json to map language ID to relative path.
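
The lookup described in the last point can be pictured roughly as below. This is a sketch under assumptions, not the package's own code: ManifestSketch and resolveConfigurationPath are hypothetical names, and falling back to the default path when a language ID is missing from the mapping is an assumption.

```typescript
// Hypothetical subset of package.json that matters for configuration lookup.
interface ManifestSketch {
    'textmate-languageservices'?: Record<string, string>;
}

// Default location named in the setup notes above.
const DEFAULT_CONFIGURATION_PATH = './textmate-configuration.json';

// Return the mapped relative path for a language ID, or the default path.
function resolveConfigurationPath(manifest: ManifestSketch, languageId: string): string {
    const mapping = manifest['textmate-languageservices'] ?? {};
    return mapping[languageId] ?? DEFAULT_CONFIGURATION_PATH;
}

const manifestSketch: ManifestSketch = {
    'textmate-languageservices': { lua: './syntaxes/lua-textmate-configuration.json' },
};

console.log(resolveConfigurationPath(manifestSketch, 'lua'));
console.log(resolveConfigurationPath({}, 'lua'));
```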

Example language extension manifest - ./package.json:

{
    "name": "lua",
    "displayName": "Textmate language service for Lua",
    "description": "Lua enhanced support for Visual Studio Code",
    "version": "0.0.1",
    "publisher": "",
    "license": "",
    "engines": {
        "vscode": "^1.51.1"
    },
    "categories": [
        "Programming Languages"
    ],
    "contributes": {
        "languages": [{
            "id": "lua",
            "aliases": ["Lua"],
            "extensions": [".lua", ".moon", ".luau"],
            "configuration": "./language-configuration.json"
        }],
        "grammars": [{
            "language": "lua",
            "scopeName": "source.lua",
            "path": "./syntaxes/Lua.tmLanguage.json"
        }]
    }
}

Configuration

Create a JSON file named textmate-configuration.json in the extension directory. The file accepts comments and trailing commas.

If you only want to use the document and/or tokenization services, you can skip creating the file!

Textmate configuration fields:

assignment - optional (object)
Collection of Textmate scope selectors for variable assignment scopes when including variable symbols:
Properties:
- separator: Token to separate multiple assignments (string)
- single: Token to match single variable assignment. (string)
- multiple: Token to match multiple variable assignment. (string)
declarations - optional (array)
List of Textmate scope selectors for declaration token scopes.
dedentation - optional (array)
List of Textmate tokens for dedented code block declarations (e.g. ELSE, ELSEIF).
Tokens still need to be listed in indentation with the decrementing value -1.
exclude (string)
VS Code glob pattern for files to exclude from workspace symbol search.
indentation - optional (object)
Indentation level offset for Textmate token types (used to implement folding).
punctuation - optional (object)
Collection of punctuation tokens with a significant effect on syntax providers. Properties:
- continuation: Token scope selector for line continuation (to use in region matching). (string)
markers - optional (object)
Stringified regular expression patterns for folding region comments. Properties:
- start: Escaped regular expression for start region marker. (string)
- end: Escaped regular expression for end region marker. (string)
symbols - optional (object)
Map of document symbol tokens to their symbol kind (vscode.SymbolKind value).
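
Because symbols takes raw vscode.SymbolKind numbers, it can help to see what a few of them stand for. The sketch below copies a handful of values from the VS Code enum into a local object so that it runs without the vscode module; SymbolKindSketch and buildSymbolsMap are hypothetical helpers, not part of this package.

```typescript
// Local copy of a few vscode.SymbolKind values (the VS Code enum starts at 0).
const SymbolKindSketch = {
    Namespace: 2,
    Class: 4,
    Method: 5,
    Function: 11,
    Variable: 12,
} as const;

type SymbolKindName = keyof typeof SymbolKindSketch;

// Build a `symbols` map from readable kind names instead of raw numbers.
function buildSymbolsMap(entries: Record<string, SymbolKindName>): Record<string, number> {
    const result: Record<string, number> = {};
    for (const [scope, kind] of Object.entries(entries)) {
        result[scope] = SymbolKindSketch[kind];
    }
    return result;
}

const symbolsSketch = buildSymbolsMap({
    'keyword.control.lua': 'Namespace',
    'entity.name.function.lua': 'Function',
});

console.log(JSON.stringify(symbolsSketch));
```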

Configuration examples

Template for textmate-configuration.json file:

{
  "assignment": {
    "single": "",
    "multiple": "",
    "separator": ""
  },
  "declarations": [],
  "dedentation": [
    "keyword.control.elseif.custom",
    "keyword.control.else.custom"
  ],
  "exclude": "{.modules,.includes}/**",
  "indentation": {
    "punctuation.definition.comment.begin.custom": 1,
    "punctuation.definition.comment.end.custom": -1,
    "keyword.control.begin.custom": 1,
    "keyword.control.end.custom": -1
  },
  "punctuation": {
    "continuation": "punctuation.separator.continuation.line.custom"
  },
  "markers": {
    "start": "^\\s*#?region\\b",
    "end": "^\\s*#?end\\s?region\\b"
  },
  "symbols": {
    "keyword.control.custom": 2,
    "entity.name.function.custom": 11
  }
}
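
To see what the markers patterns in the template accept, they can be compiled and tried against a few lines. This is a minimal sketch: it assumes the strings are passed to new RegExp without flags, which may differ from what the package does internally, and classifyMarkerLine is a hypothetical helper.

```typescript
// Pattern strings copied from the template; after JSON (or string) unescaping,
// "\\s" becomes the regular expression token \s.
const markersSketch = {
    start: '^\\s*#?region\\b',
    end: '^\\s*#?end\\s?region\\b',
};

const startPattern = new RegExp(markersSketch.start);
const endPattern = new RegExp(markersSketch.end);

// Classify one line of source text as a region start, a region end, or neither.
function classifyMarkerLine(line: string): 'start' | 'end' | 'none' {
    if (startPattern.test(line)) {
        return 'start';
    }
    if (endPattern.test(line)) {
        return 'end';
    }
    return 'none';
}

for (const line of ['#region Helpers', '  #endregion', '#end region', 'local x = 1']) {
    console.log(JSON.stringify(line), classifyMarkerLine(line));
}
```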

An example configuration file that targets Lua:

{
  "assignment": {
    "single": "meta.assignment.variable.single.lua",
    "multiple": "meta.assignment.variable.group.lua",
    "separator": "punctuation.separator.comma.lua"
  },
  "declarations": [
    "meta.declaration.lua entity.name",
    "meta.assignment.definition.lua entity.name"
  ],
  "dedentation": [
    "keyword.control.elseif.lua",
    "keyword.control.else.lua"
  ],
  "exclude": "{.luarocks,lua_modules}/**",
  "indentation": {
    "punctuation.definition.comment.begin.lua": 1,
    "punctuation.definition.comment.end.lua": -1,
    "keyword.control.begin.lua": 1,
    "keyword.control.end.lua": -1
  },
  "markers": {
    "start": "^\\s*#?region\\b",
    "end": "^\\s*#?end\\s?region\\b"
  },
  "symbols": {
    "keyword.control.lua": 2,
    "entity.name.function.lua": 11
  }
}
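
One way to picture how the indentation offsets in the Lua configuration above could translate into folds is to pair each +1 token with the next unmatched -1 token. The sketch below is an illustration under assumptions and not the package's folding algorithm: TokenSketch, FoldSketch and pairFoldingRanges are hypothetical, and scopes are compared by exact string rather than by Textmate scope selector.

```typescript
// Hypothetical, simplified token: only the line number and one scope name.
interface TokenSketch {
    line: number;
    scope: string;
}

interface FoldSketch {
    start: number;
    end: number;
}

// Offsets copied from the Lua example configuration.
const luaIndentation: Record<string, number> = {
    'keyword.control.begin.lua': 1,
    'keyword.control.end.lua': -1,
};

// Pair every opening token (+1) with the next unmatched closing token (-1).
function pairFoldingRanges(tokens: TokenSketch[], offsets: Record<string, number>): FoldSketch[] {
    const openLines: number[] = [];
    const folds: FoldSketch[] = [];
    for (const token of tokens) {
        const delta = offsets[token.scope] ?? 0;
        if (delta > 0) {
            openLines.push(token.line);
        } else if (delta < 0) {
            const start = openLines.pop();
            // Ignore unmatched closers and blocks that open and close on one line.
            if (start !== undefined && token.line > start) {
                folds.push({ start, end: token.line });
            }
        }
    }
    return folds;
}

const sampleTokens: TokenSketch[] = [
    { line: 0, scope: 'keyword.control.begin.lua' },
    { line: 1, scope: 'keyword.control.begin.lua' },
    { line: 2, scope: 'variable.other.lua' },
    { line: 3, scope: 'keyword.control.end.lua' },
    { line: 5, scope: 'keyword.control.end.lua' },
];

console.log(JSON.stringify(pairFoldingRanges(sampleTokens, luaIndentation)));
```

Inner blocks are reported before the blocks that contain them, because a fold is only emitted once its closing token is reached.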

Usage

`TextmateLanguageService`

The package exports a default class named TextmateLanguageService.

Parameter: languageId - Language ID of grammar contribution in VS Code (string).
Parameter: context? - Extension context from activate entrypoint export (vscode.ExtensionContext).

The library defaults to core VS Code behaviour when figuring out which scope name to use: the last matching grammar or language wins. If the context parameter is supplied, the library first searches the contributions of the extension itself.
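
The resolution order described above can be sketched as a small pure function. This is an illustration under assumptions, not the package's internals: GrammarContributionSketch, resolveScopeName and the extension IDs are hypothetical.

```typescript
// Hypothetical flattened view of one grammar contribution.
interface GrammarContributionSketch {
    extensionId: string;
    language: string;
    scopeName: string;
}

// Last element of a list, or undefined when the list is empty.
function lastOf<T>(items: T[]): T | undefined {
    return items[items.length - 1];
}

// Prefer grammars from the caller's own extension when an ID is given;
// otherwise the last matching contribution wins.
function resolveScopeName(
    contributions: GrammarContributionSketch[],
    languageId: string,
    ownExtensionId?: string,
): string | undefined {
    const matches = contributions.filter((grammar) => grammar.language === languageId);
    if (ownExtensionId !== undefined) {
        const own = lastOf(matches.filter((grammar) => grammar.extensionId === ownExtensionId));
        if (own !== undefined) {
            return own.scopeName;
        }
    }
    return lastOf(matches)?.scopeName;
}

const contributionsSketch: GrammarContributionSketch[] = [
    { extensionId: 'vscode.lua', language: 'lua', scopeName: 'source.lua' },
    { extensionId: 'me.lua-plus', language: 'lua', scopeName: 'source.lua.plus' },
    { extensionId: 'other.lua', language: 'lua', scopeName: 'source.lua.other' },
];

console.log(resolveScopeName(contributionsSketch, 'lua'));
console.log(resolveScopeName(contributionsSketch, 'lua', 'me.lua-plus'));
```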

Language extension

Extension code sample - ./src/extension.ts:

import * as vscode from 'vscode';
import TextmateLanguageService from 'vscode-textmate-languageservice';

export async function activate(context: vscode.ExtensionContext) {
    const selector: vscode.DocumentSelector = 'lua';
    const textmateService = new TextmateLanguageService(selector, context);

    // Each create* factory is async and resolves to a VS Code provider.
    const foldingRangeProvider = await textmateService.createFoldingRangeProvider();
    const documentSymbolProvider = await textmateService.createDocumentSymbolProvider();
    const workspaceSymbolProvider = await textmateService.createWorkspaceSymbolProvider();
    const definitionProvider = await textmateService.createDefinitionProvider();

    // Register the providers and dispose of them when the extension deactivates.
    context.subscriptions.push(vscode.languages.registerDocumentSymbolProvider(selector, documentSymbolProvider));
    context.subscriptions.push(vscode.languages.registerFoldingRangeProvider(selector, foldingRangeProvider));
    context.subscriptions.push(vscode.languages.registerWorkspaceSymbolProvider(workspaceSymbolProvider));
    context.subscriptions.push(vscode.languages.registerDefinitionProvider(selector, definitionProvider));
}

Tokenization

Extension code sample - ./src/extension.ts:

import * as vscode from 'vscode';
import TextmateLanguageService from 'vscode-textmate-languageservice';

export async function activate(context: vscode.ExtensionContext) {
    const textmateService = new TextmateLanguageService('custom', context);
    const textmateTokenService = await textmateService.initTokenService();
    // Assumes an editor is open when the extension activates.
    const textDocument = vscode.window.activeTextEditor!.document;
    // Awaited in case fetch resolves asynchronously.
    const tokens = await textmateTokenService.fetch(textDocument);
}

NB: If you would like to just wire up tokenization or fast document text services to a Textmate grammar, without (re-)contributing the grammar and language configuration to VS Code or writing a full TextmateLanguageService provider configuration, you can use the custom "textmate-languageservice-contributes" property in package.json:

{
    "textmate-languageservice-contributes": {
        "languages": [{
            "id": "typescript",
            "aliases": ["TypeScript"],
            "extensions": [".ts", ".tsx", ".cts", ".mts"]
        }],
        "grammars": [{
            "language": "typescript",
            "scopeName": "source.ts",
            "path": "./syntaxes/TypeScript.tmLanguage.json"
        }]
    }
}

API methods

Usage (example is for getting the token at the current cursor position):

const { getScopeInformationAtPosition } = TextmateLanguageService.api;

const editor = vscode.window.activeTextEditor!;
const document = editor.document;
const position = editor.selection.active;

const token = await getScopeInformationAtPosition(document, position);

`getScopeInformationAtPosition`

getScopeInformationAtPosition(document: vscode.TextDocument, position: vscode.Position): Promise<TextmateToken>

Get token scope information at a specific position (caret line and character number).

Parameter: document - Document to be tokenized (vscode.TextDocument).
Parameter: position - Zero-indexed caret position of token in document (vscode.Position).
Returns: Promise resolving to token data for scope selected by caret position (Promise<TextmateToken>).

`getScopeRangeAtPosition`

getScopeRangeAtPosition(document: vscode.TextDocument, position: vscode.Position): Promise<vscode.Range>;

Get matching scope range of the Textmate token intersecting a caret position.

Parameter: document - Document to be tokenized (vscode.TextDocument).
Parameter: position - Zero-indexed caret position to intersect with (vscode.Position).
Returns: Promise resolving to character and line number of the range (Promise<vscode.Range>).

`getTokenInformationAtPosition`

getTokenInformationAtPosition(document: vscode.TextDocument, position: vscode.Position): Promise<vscode.TokenInformation>;

VS Code compatible performant API for token information at a caret position.

Parameter: document - Document to be tokenized (vscode.TextDocument).
Parameter: position - Zero-indexed caret position of token in document (vscode.Position).
Returns: Promise resolving to token data compatible with VS Code (Promise<vscode.TokenInformation>).

`getLanguageConfiguration`

getLanguageConfiguration(languageId: string): vscode.LanguageConfiguration;

Get the language configuration of a language mode identifier.

Parameter: languageId - Language ID as shown in brackets in "Change Language Mode" panel (string).
Returns: Language configuration of the language, as loaded from its source VS Code extension (vscode.LanguageConfiguration).

`getGrammarContribution`

getGrammarContribution(languageId: string): GrammarLanguageDefinition;

Get the grammar definition point of a language mode identifier.

Parameter: languageId - Language identifier, shown in brackets in "Change Language Mode" panel (string).
Returns: Grammar contribution as configured in source VS Code extension (GrammarLanguageDefinition).

`getLanguageContribution`

getLanguageContribution(languageId: string): LanguageDefinition;

Get the language definition point of a language mode identifier.

Parameter: languageId - Language ID as shown in brackets in "Change Language Mode" panel (string).
Returns: Language contribution as configured in source VS Code extension (LanguageDefinition).

`getContributorExtension`

getContributorExtension(languageId: string): vscode.Extension<unknown> | void;

Get the VS Code Extension API entry of the extension that contributed a language mode identifier.

Parameter: languageId - Language identifier, shown in brackets in "Change Language Mode" panel (string).
Returns: Extension API instance that contributed the language, if one is found (vscode.Extension<unknown> | void).

Use Oniguruma WASM buffer

This is the vscode-oniguruma build of Oniguruma written in C, compiled to WASM format with memory hooks to V8.

The build is not streaming 🙁, but vscode libs must bundle their WebAssembly deps to support the web ecosystem.

import TextmateLanguageService from 'vscode-textmate-languageservice';
const onigurumaPromise = TextmateLanguageService.utils.getOniguruma();

changelog

Changelog

4.0.0

Mark package support as LTS mode instead of maintenance mode.
Add embedded language support.
Add JSONC support for configuration files.
Generate vscode.TextDocument mocks for document service output.
Switch LiteTextDocument to vscode.TextDocument across the package.
Allow the package to query embedded and builtin languages when extensions supply a vscode.ExtensionContext.
Change contribution logic to prioritise a supplied extension context instead of restricting contributions to that extension.
Export ContributorData - a utility for statically resolving language and grammar contributions.
Patch findLanguageIdFromScopeName grammar priority to match core behaviour.
Update API documentation to match 4.0.0.

3.0.1

Hotfix for type definitions missing in 3.0.0.
Smoke test types to ensure package build always includes type declarations.

3.0.0

[BREAKING] Rename api.getLanguageConfiguration to api.getLanguageContribution.
[BREAKING] Rename api.getGrammarConfiguration to api.getGrammarContribution.
Add getLanguageConfiguration API method to load vscode.LanguageConfiguration.
Add plaintext language tokenization and grammar resolution.
Hotfix for "unrecognized language" error for plaintext documents in API token methods.

2.0.0

The VSCE Toolroom open-source collective has adopted the Textmate language service project!
Redesigned the logo, inspired by the V8 engine and the Textmate osteospermum flower.
Languages can now be tokenized from built-in grammars as well as service-only grammars.
Marked TextmateLanguageService~context parameter as optional in the API types.
Marked the API from 1.0.0 as compatible with 1.55.0, not 1.51.0.
Provided community resolution to microsoft/vscode#109919 & microsoft/vscode#99356.
Implemented API methods in an api namespace for developer-friendly logic:
- Add getTokenInformationAtPosition method for fast positional token polyfill: vscode.TokenInformation.
- Add getScopeInformationAtPosition method to get Textmate token data: TextmateToken.
- Add getScopeRangeAtPosition method to get token range: vscode.Range.
- Add getLanguageConfiguration method for language configuration: LanguageDefinition.
- Add getGrammarConfiguration method to get language grammar wiring: GrammarLanguageDefinition.
- Add getContributorExtension method to get extension source of language ID: vscode.Extension.
Linted the Textmate scope parser correctly & automatically in the test pipeline.
Added getOniguruma to API utilities, a browser-ready non-streaming build of vscode-oniguruma.

1.2.1

Hotfix for typo in documentation: "textmate-language-contributes" -> "textmate-languageservice-contributes".

1.2.0

Add support for creation of tokenization or light document service.
- TL;DR: swap the "contributes" key with a 'fake' "textmate-languageservice-contributes" key in package.json.
- Now possible to wire up a fake language and grammar "contribution" to a package service.
Add service-only tests for TypeScript in the test suite.
Use TextmateLanguageService as global key instead of LSP in service workers.
- "LSP" is a cross-process and IDE-agnostic message format/standard for language feature data.
- This library's just a factory for language feature services in VS Code.
- This change is not breaking thanks to Webpack.
Improve diff generation for error logging in the sample output validator that's used to test feature providers.
Add keywords to the NPM package's metadata for better search engine discovery.
Skip web testing of vscode.DefinitionProvider and vscode.WorkspaceSymbolProvider factory methods.

1.1.0

Bundle files using Webpack for performance boost.
Add browser production support (bundle onig.wasm using encoded-uint8array-loader & prevent reliance on fetch).
Restore typing declaration files for dependent extension consumers.
Fix broken Gitlab pipeline so we have CI testing again.
Fix line number collisions between entry symbols in the outline service.
Fix container name in symbol information output for the document symbol provider.
Upgrade vscode-textmate from 7.0.4 to 9.0.0 (microsoft/vscode-textmate#198).
Ignore test files from package before npm publish to reduce size by ~20%.
Add a web-only test harness for testing compatibility with dependent web extensions.
Add diff logging for JSON output sampling in the output sampler.
Improve test suite performance by 20% by removing dependencies & bundling.

NB:

I credited vscode-matlab contributors for writing some of the provider algorithms.
Apologies and thanks for the support!

1.0.0

Achieved web readiness by handling hashing. We use native hashing of file text contents to keep it fast.
Upgrade from SHA-1 algorithm (a famous collision-attack vector) and adopt stable SHA-256 alternatives.
Remove last external dependency (git-sha1) so we don't need a bundler.

1.0.0-rc-2

Fix the line number in the folding provider for top-level declaration folds after the first declaration.
Add browser readiness with a cost-benefit tradeoff... we now load onig.wasm (Textmate grammar regex parser) without streaming.
Remove any system dependencies in the test scripts. Plus the scripts use the CLIs better & are much cleaner.
Convert previous CI workflow pipeline format to Gitlab.

1.0.0-rc-1

vscode-textmate-languageservice codebase republished and migrated to Gitlab.
Significant changes to the shape of the API exports.
- Usage: const lsp = new LSP('languageId', context)
- API is now a collection of async create* factory functions. The names match their output interfaces in the VS Code API.
- This means you will need to use await or .then to get the actual provider class.
- It also means your activate function is better off as an async function - the code will be easier to read.
- Services/generators/engines are now all created behind the scenes to reduce boilerplate.
Introduce top-level "textmate-languageservices" to support extension manifests with multiple configured languages.
- This key can map language ID to config path, i.e. "textmate-languageservices": { "lua": "./syntaxes/lua-textmate-configuration.json" }.
- (Without the setting, the package loads ./textmate-configuration.json targeting the language ID in the LSP constructor.)
Mostly removed Node dependencies in favour of native VS Code APIs. (Browser support SOON™?)
Fix external file search matching in the definition provider, so it now searches in any folder.
Invalidate service caches using an asynchronous hash engine - see #1.
Rewrite folding provider to remove performance overheads in header & block folding - see #2.
Fix line token incrementation for decremented lines in the tokenizer.
Fix for cache hashing in Textmate engine tokenization queue.
Add performance layer to Textmate scope selector parser to bypass the need for a WASM parser.

0.2.3

Fix performance of header algorithm.
Fix ending decrement of 1 line in folding provider top-level blocks.
Add local test execution support to test suite.

0.2.2

Fix performance of folding provider block dedent loop.
Port Textmate scope parser to TypeScript and remove caching overheads.

0.2.1

Boost tokenization performance by adding cache layers to Textmate scope selector logic.

0.2.0

Accept limited Textmate scope selectors in all configuration values.
Introduce array-string duality to all configuration values (each value can be a string or an array of strings).
Add test suite for Textmate engine & VS Code providers.

Major breaking change - meta.parens does not match meta.function-call.parens in Textmate scope selectors.

0.1.1

Adds engine tokenization queue to improve performance for large files.

0.1.0

Initial version:

Core Textmate engine generating data collection from Textmate token list.
Includes five providers:
- Document symbol provider
- Folding provider
- Peek definition provider
- Table of Contents provider
- Workspace symbol provider
Configurable by textmate-configuration.json.
Providers are exposed by a module index at ./src/index.ts.

Roadmap

🚀 Adopt native fetch (Node 18.x) for loading WASM regexp parser for Oniguruma.
🚀 Investigate rolling PEG parser for Textmate scope selectors in WASM format.
✨ Semantic highlighting provider for parameters.
✨ Semantic highlighting provider for classes or other "Table of Contents" items.
✨ Semantic highlighting for variable assignment driven by token types and/or text.