Package detail

kuromoji

takuyaa, 429.5k, Apache-2.0, 0.1.2

JavaScript implementation of Japanese morphological analyzer

japanese, morphological analyzer, nlp, pos, pos tagger, tokenizer

readme

kuromoji.js

JavaScript implementation of a Japanese morphological analyzer. This is a pure JavaScript port of Kuromoji.

You can see how kuromoji.js works on the demo site.

Usage

You can tokenize sentences with only 5 lines of code. If you need working examples, you can see the files under the demo or example directory.

Node.js

Install with the npm package manager:

npm install kuromoji

Load this library as follows:

var kuromoji = require("kuromoji");

Prepare a tokenizer like this:

kuromoji.builder({ dicPath: "path/to/dictionary/dir/" }).build(function (err, tokenizer) {
    if (err) {
        console.error(err); // dictionary files could not be loaded
        return;
    }
    // tokenizer is ready
    var path = tokenizer.tokenize("すもももももももものうち");
    console.log(path);
});
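
If the callback style is inconvenient, the builder call above can be wrapped in a Promise. The following is a minimal sketch and not part of kuromoji.js: the helper name buildTokenizer is hypothetical, and it assumes only what the snippet above shows, namely that the builder has a build(callback) method that calls back with (err, tokenizer).

```javascript
// Hypothetical helper (not part of kuromoji.js): wraps the callback-style
// build() shown above in a Promise.
// Assumes `builder` exposes build(function (err, tokenizer) { ... }).
function buildTokenizer(builder) {
    return new Promise(function (resolve, reject) {
        builder.build(function (err, tokenizer) {
            if (err) {
                reject(err);        // dictionary could not be loaded
                return;
            }
            resolve(tokenizer);     // tokenizer is ready
        });
    });
}
```

With this helper, the builder returned by kuromoji.builder({ dicPath: "path/to/dictionary/dir/" }) can be passed to buildTokenizer, and tokenizer.tokenize() can be called inside the resulting then() handler.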

Browser

You only need the build/kuromoji.js and dict/*.dat.gz files.

Install with the Bower package manager:

bower install kuromoji

Or you can use the kuromoji.js file and dictionary files from the GitHub repository.

In your HTML:

<script src="url/to/kuromoji.js"></script>

In your JavaScript:

kuromoji.builder({ dicPath: "/url/to/dictionary/dir/" }).build(function (err, tokenizer) {
    if (err) {
        console.error(err); // dictionary files could not be loaded
        return;
    }
    // tokenizer is ready
    var path = tokenizer.tokenize("すもももももももものうち");
    console.log(path);
});

API

The tokenize() function returns an array of token objects like this:

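[ {
    word_id: 509800,          // ID of the word in the dictionary
    word_type: 'KNOWN',       // word type (KNOWN if the word is in the dictionary, UNKNOWN otherwise)
    word_position: 1,         // start position of the word in the text
    surface_form: '黒文字',    // surface form
    pos: '名詞',               // part of speech
    pos_detail_1: '一般',      // part-of-speech subcategory 1
    pos_detail_2: '*',        // part-of-speech subcategory 2
    pos_detail_3: '*',        // part-of-speech subcategory 3
    conjugated_type: '*',     // conjugation type
    conjugated_form: '*',     // conjugation form
    basic_form: '黒文字',      // base form
    reading: 'クロモジ',       // reading
    pronunciation: 'クロモジ'  // pronunciation
  } ]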

(This is defined in src/util/IpadicFormatter.js)
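
As a rough illustration of how the token fields above might be consumed, the sketch below filters tokens by part of speech and joins their readings. The helper names surfacesByPos and readingOf are hypothetical, the second sample token is made up for illustration, and the fallback to surface_form assumes that some tokens (for example unknown words) may have no reading.

```javascript
// Sample tokens in the shape documented above, trimmed to the fields used here.
// The second entry is made up for illustration, not real tokenizer output.
var tokens = [
    { word_position: 1, surface_form: "黒文字", pos: "名詞", reading: "クロモジ" },
    { word_position: 4, surface_form: "の", pos: "助詞", reading: "ノ" }
];

// Hypothetical helper: surface forms of all tokens with the given part of speech.
function surfacesByPos(tokens, pos) {
    return tokens
        .filter(function (t) { return t.pos === pos; })
        .map(function (t) { return t.surface_form; });
}

// Hypothetical helper: concatenated reading of the whole token list.
// Falls back to the surface form when a token has no reading
// (assumption: unknown words may lack one).
function readingOf(tokens) {
    return tokens
        .map(function (t) { return t.reading || t.surface_form; })
        .join("");
}

console.log(surfacesByPos(tokens, "名詞")); // [ '黒文字' ]
console.log(readingOf(tokens));             // クロモジノ
```

In real use, the tokens array would be the return value of tokenizer.tokenize() rather than hand-written sample data.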

changelog

0.1.1 (2016-08-07)

Breaking Changes

dictionary directory path is changed from dist/dict/ to dict/, and browserified file kuromoji.js is moved from dist/browser/kuromoji.js to build/kuromoji.js (#13)

Bug Fixes

browserified kuromoji.js does not work in browser (#13)

0.1.0 (2016-08-06)

Breaking Changes

change binary format of cc.dat.gz (connection costs dictionary) (761eaf2, c64cc22)

Bug Fixes

word_position returns the real position in the text (#10)

Performance Improvements

read seed dictionary line-by-line to reduce memory consumption when building dictionary

Bump deps

update dependencies in package.json

Miscellaneous

separate mecab-ipadic seed dictionary into a different repo as an npm package mecab-ipadic-seed (#12)
remove jsdoc directory from git repo (817c23e)
define deploy gulp task to publish jsdoc and demo as GitHub Pages (2d638aa)

0.0.5 (2015-11-19)

Bug Fixes

add error handling when DictionaryLoader tries to load non-existent dictionaries (#7)
work with Atom editor (#8)

0.0.4 (2015-09-07)

Bump deps

update dependencies in package.json (#5)

Performance Improvements

use built-in zlib module instead of zlib.js on node.js (#6)

0.0.3 (2015-09-06)

Miscellaneous

introduce Travis CI, Coveralls.io and Code Climate
update README.md

0.0.2 (2014-12-04)

Miscellaneous

bump version to 0.0.2 because npm publish failed (1cdad3c)
