
js-search

bvaughn · 265.2k · MIT · 2.0.1 · TypeScript support: definitely-typed

JS Search is an efficient, client-side search library for JavaScript and JSON objects

search, javascript, js, clientside, client-side, local, query

readme

Installation | Overview | Tokenization | Stemming | Stop Words | Search Index | Index Strategy

Js Search: client-side search library

Js Search enables efficient client-side searches of JavaScript and JSON objects. It is ES5 compatible and does not require jQuery or any other third-party libraries.

Js Search began as a lightweight implementation of Lunr JS, offering runtime performance improvements and a smaller file size. It has since expanded to include a rich feature set, supporting stemming, stop words, and TF-IDF ranking.

Here are some JS Perf benchmarks comparing the two search libraries. (Thanks to olivernn for tweaking the Lunr side for a better comparison!)

If you're looking for a simpler, web-worker optimized JS search utility, check out js-worker-search.


If you like this project, 🎉 become a sponsor or ☕ buy me a coffee


Installation

You can install using either npm or Bower like so:

npm install js-search
bower install js-search

Overview

At a high level, you configure Js Search by telling it which fields it should index for searching and then adding the objects to be searched.

For example, a simple use of JS Search would be as follows:

import * as JsSearch from 'js-search';

var theGreatGatsby = {
  isbn: '9781597226769',
  title: 'The Great Gatsby',
  author: {
    name: 'F. Scott Fitzgerald'
  },
  tags: ['book', 'inspirational']
};
var theDaVinciCode = {
  isbn: '0307474275',
  title: 'The DaVinci Code',
  author: {
    name: 'Dan Brown'
  },
  tags: ['book', 'mystery']
};
var angelsAndDemons = {
  isbn: '074349346X',
  title: 'Angels & Demons',
  author: {
    name: 'Dan Brown',
  },
  tags: ['book', 'mystery']
};

var search = new JsSearch.Search('isbn');
search.addIndex('title');
search.addIndex(['author', 'name']);
search.addIndex('tags');

search.addDocuments([theGreatGatsby, theDaVinciCode, angelsAndDemons]);

search.search('The');    // [theGreatGatsby, theDaVinciCode]
search.search('scott');  // [theGreatGatsby]
search.search('dan');    // [angelsAndDemons, theDaVinciCode]
search.search('mystery'); // [angelsAndDemons, theDaVinciCode]

Tokenization

Tokenization is the process of breaking text (e.g. sentences) into smaller, searchable tokens (e.g. words or parts of words). Js Search provides a basic tokenizer that should work well for English, but you can provide your own like so:

search.tokenizer = {
  tokenize( text /* string */ ) {
    // Convert text to an Array of strings and return the Array
  }
};
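
For example, a minimal custom tokenizer might split on whitespace and punctuation. This is only an illustrative sketch, not part of the library; adjust the splitting rule to suit your own content:

search.tokenizer = {
  tokenize(text /* string */) {
    // Illustrative rule: split on runs of whitespace and punctuation,
    // then drop any empty tokens.
    return text
      .split(/[\s,.!?;:'"()-]+/)
      .filter(function (token) { return token.length > 0; });
  }
};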

Stemming

Stemming is the process of reducing search tokens to their root (or "stem") so that searches for different forms of a word will still yield results. For example "search", "searching" and "searched" can all be reduced to the stem "search".

Js Search does not implement its own stemming library but it does support stemming through the use of third-party libraries.

To enable stemming, use the StemmingTokenizer like so:

var stemmer = require('porter-stemmer').stemmer;

search.tokenizer =
    new JsSearch.StemmingTokenizer(
        stemmer, // Function should accept a string param and return a string
        new JsSearch.SimpleTokenizer());
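
With the stemming tokenizer configured, different forms of a word reduce to the same stem at both index and query time. A small sketch of the expected behaviour (the documents below are invented for illustration):

var searchingDoc = { id: '1', title: 'Searching for answers' };
var searchedDoc  = { id: '2', title: 'We searched everywhere' };

var stemmedSearch = new JsSearch.Search('id');
stemmedSearch.tokenizer =
    new JsSearch.StemmingTokenizer(
        stemmer,
        new JsSearch.SimpleTokenizer());
stemmedSearch.addIndex('title');
stemmedSearch.addDocuments([searchingDoc, searchedDoc]);

// "searching", "searched" and "search" all reduce to the stem "search",
// so any of these query forms should match both documents.
stemmedSearch.search('searching'); // [searchingDoc, searchedDoc] (order may vary)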

Stop Words

Stop words are very common words (e.g. a, an, and, the, of) that are often not semantically meaningful. By default Js Search does not filter these words, but filtering can be enabled by using the StopWordsTokenizer like so:

search.tokenizer =
    new JsSearch.StopWordsTokenizer(
        new JsSearch.SimpleTokenizer());

By default Js Search uses a slightly modified version of the Google History stop words listed on www.ranks.nl/stopwords. You can modify this list of stop words by adding or removing values from the JsSearch.StopWordsMap object like so:

JsSearch.StopWordsMap.the = false; // Do not treat "the" as a stop word
JsSearch.StopWordsMap.bob = true;  // Treat "bob" as a stop word

Note that stop words are lower case and so using a case-sensitive sanitizer may prevent some stop words from being removed.
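
As an illustrative sketch of the effect, reusing the book data from the overview example: with the StopWordsTokenizer in place, common words are dropped from both the index and the query, so a query consisting only of a stop word should return no results.

var filteredSearch = new JsSearch.Search('isbn');
filteredSearch.tokenizer =
    new JsSearch.StopWordsTokenizer(
        new JsSearch.SimpleTokenizer());
filteredSearch.addIndex('title');
filteredSearch.addDocuments([theGreatGatsby, theDaVinciCode, angelsAndDemons]);

filteredSearch.search('The');    // [] -- "the" is filtered out of the query
filteredSearch.search('Gatsby'); // [theGreatGatsby]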

Configuring the search index

There are two search indices packaged with js-search.

Term frequency–inverse document frequency (or TF-IDF) is a numeric statistic intended to reflect how important a word (or words) is to a document within a corpus. The TF-IDF value increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. This helps to adjust for the fact that some words (e.g. and, or, the) appear more frequently than others.
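
As a rough sketch of the idea (a common formulation for illustration only, not necessarily the exact formula Js Search uses internally):

// tf: how often the term occurs in the document being scored
// idf: how rare the term is across the whole corpus
function tfIdf(termCountInDocument, documentsContainingTerm, totalDocuments) {
  var tf = termCountInDocument;
  var idf = Math.log(totalDocuments / (1 + documentsContainingTerm));
  return tf * idf;
}

// A term that appears in nearly every document (e.g. "the") scores near zero,
// even if it occurs several times, while a rarer term scores higher.
tfIdf(3, 99, 100); // ~0
tfIdf(1, 2, 100);  // ~3.5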

By default Js Search supports TF-IDF ranking but this can be disabled for performance reasons if it is not required. You can specify an alternate ISearchIndex implementation in order to disable TF-IDF, like so:

// default
search.searchIndex = new JsSearch.TfIdfSearchIndex();

// Search index capable of returning results matching a set of tokens
// but without any meaningful rank or order.
search.searchIndex = new JsSearch.UnorderedSearchIndex();

Configuring the index strategy

There are three index strategies packaged with js-search.

PrefixIndexStrategy indexes for prefix searches (e.g. the term "cat" is indexed as "c", "ca", and "cat", allowing prefix search lookups).

AllSubstringsIndexStrategy indexes for all substrings. In other words, "c", "ca", "cat", "a", "at", and "t" all match "cat".

ExactWordIndexStrategy indexes for exact word matches. For example "bob" will match "bob jones" (but "bo" will not).

By default Js Search supports prefix indexing but this is configurable. You can specify an alternate IIndexStrategy implementation in order to disable prefix indexing, like so:

// default
search.indexStrategy = new JsSearch.PrefixIndexStrategy();

// this index strategy is built for all-substring matches.
search.indexStrategy = new JsSearch.AllSubstringsIndexStrategy();

// this index strategy is built for exact word matches.
search.indexStrategy = new JsSearch.ExactWordIndexStrategy();
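
As a short illustrative sketch of how the strategy changes matching, reusing the book data from the overview example (note that the strategy should be configured before documents are added, since indexing happens at that point):

var exactSearch = new JsSearch.Search('isbn');
exactSearch.indexStrategy = new JsSearch.ExactWordIndexStrategy();
exactSearch.addIndex('title');
exactSearch.addDocuments([theGreatGatsby, theDaVinciCode, angelsAndDemons]);

exactSearch.search('DaVinci'); // [theDaVinciCode] -- whole word matches
exactSearch.search('Da');      // [] -- prefixes no longer match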

changelog

Changelog

2.0.1

README update. (No code changes.)

2.0.0

Added ES modules support for bundlers via the "module" field and for Node via the "exports" field. CommonJS output is no longer provided; the entry point is now UMD. UMD/ESM are bundled with Rollup, which reduced the minified bundle size by more than half, from 17,432 to 7,759 bytes. Flow types are distributed with the sources.

1.4.3

Don't inherit from the default Object for the token dictionary. (davidlukerice - #73)

1.4.2

Throw an error if Search is instantiated without the required uidFieldName constructor parameter.

1.4.1

-

1.4.0

Search uid field can now be an array (for nested/deep keys).

1.3.7

Fixed package.json to include correct files.

1.3.6

Performance tuning and removal of eager deopts.

Behind the scenes, this release also includes a rewrite from TypeScript to Flowtype. The external API should not be impacted by this rewrite, however.

1.3.5

Fixed (hopefully) previous broken build.

1.3.4

Simple tokenizer now supports Cyrillic. (De-Luxis - #21)

1.3.3

Fixed a bug in TfIdfSearchIndex that caused errors when indexing certain reserved keywords (e.g. "constructor").

1.3.2

Fixed tokenizer bug affecting IE <= 10 that caused prefix and substring token strategies to incorrectly index terms.

1.3.1

Replaced array.push.call with array.concat in addDocuments. This avoids a potential stack overflow for large document arrays.

1.3.0

Search.addIndex supports an Array parameter for nested values. Search indexing supports non-string values (e.g. numbers). Special thanks to @konradjurk for this release.

1.2.2

Small tweak to the Node export check to avoid a "module is not defined" error for browser-based users.

1.2.1

Modified export to better support Node environment (thanks to @scommisso).

1.2.0

Added ISearchIndex interface in order to support TF-IDF (enabled by default). Removed IPruningStrategy; it didn't seem like it added sufficient value to offset performance costs.

1.1.1

Updated stop-words list to avoid filtering Object.prototype properties.

1.1.0

Refactored stemming and stop-word support to be based on ITokenizer decorators for better accuracy. Updated README examples with more info.

1.0.2

Added JsSearch module wrapper around library and renamed JsSearch class to Search. Added stemming support by way of the new StemmingSanitizerDecorator class.

1.0.1

Renamed WhitespaceTokenizer to SimpleTokenizer and added better support for punctuation. Added StopWordsIndexStrategyDecorator to support stop words filtering.

1.0.0

Initial release!