
murmurhash-native

royaltm | 9.5k | MIT | 3.5.0

MurmurHash (32, 64, 128 bit) native bindings for nodejs

murmurhash, murmurhash3, murmurhash128, murmurhash32, murmurhash2, murmurhash64, progressive hash, PMurHash, PMurHash128, hash

readme

MurmurHash bindings for node

This library provides Austin Appleby's non-cryptographic "MurmurHash" hashing algorithm functions in a few different flavours.

Key features:

blocking and asynchronous api interfaces
additional MurmurHash3 32 and 128 bit progressive implementations based on PMurHash
stream wrapper for progressive hasher with crypto.Hash-like bi-api interface
serializable state of the progressive hasher
BE or LE byte order variants of hashes
promise wrapper
prebuilt binaries for most standard system configurations
TypeScript declarations (docs)

Install:

There are prebuilt binaries available for painless installation on some Linuxes (x64), OS-X (x64) and Windows (x64 and x86) thanks to node-pre-gyp and node-pre-gyp-github.

npm install murmurhash-native

If a prebuilt release is not available for your system or nodejs version, compilation from source will kick in. For more information on building from source please consult this page.

If for some reason (e.g. an incompatible GLIBC) you want to force building from source, type:

npm i murmurhash-native --build-from-source

To reinstall prebuilt binary (e.g. after switching between major nodejs versions):

npm rebuild --update-binary

TypeScript

murmurhash-native ships with its own TypeScript declarations, so no external typings are needed. Because this is a node-specific package, remember to include @types/node and enable es2015 language features in your tsconfig.json when using it from TypeScript.

Make a hash:

var murmurHash = require('murmurhash-native').murmurHash

murmurHash( 'hash me!' ) // 2061152078
murmurHash( Buffer.from('hash me!') ) // 2061152078
murmurHash( 'hash me!', 0x12345789 ) // 1908692277
murmurHash( 'hash me!', 0x12345789, 'buffer' ) // <Buffer 71 c4 55 35>
murmurHash( 'hash me!', 0x12345789, 'hex' ) // '71c45535'
var buf = Buffer.from('hash me!____')
murmurHash( buf.slice(0,8), 0x12345789, buf, 8 )
// <Buffer 68 61 73 68 20 6d 65 21 71 c4 55 35>

var murmurHash128x64 = require('murmurhash-native').murmurHash128x64
murmurHash128x64( 'hash me!' ) // 'c43668294e89db0ba5772846e5804467'

var murmurHash128x86 = require('murmurhash-native').murmurHash128x86
murmurHash128x86( 'hash me!' ) // 'c7009299985a5627a9280372a9280372'

// asynchronous

murmurHash( 'hash me!', function(err, hash) { assert.equal(hash, 2061152078) });

// output byte order (default is BE)

var murmurHashLE = require('murmurhash-native').LE.murmurHash;
murmurHashLE( 'hash me!', 0x12345789, 'buffer' ) // <Buffer 35 55 c4 71>
murmurHashLE( 'hash me!', 0x12345789, 'hex' ) // '3555c471'

These functions are awaiting your command:

murmurHash - MurmurHash v3 32bit
murmurHash32 - (an alias of murmurHash)
murmurHash128 - MurmurHash v3 128bit platform (x64 or x86) optimized
murmurHash128x64 - MurmurHash v3 128bit x64 optimized
murmurHash128x86 - MurmurHash v3 128bit x86 optimized
murmurHash64 - MurmurHash v2 64bit platform (x64 or x86) optimized
murmurHash64x64 - MurmurHash v2 64bit x64 optimized
murmurHash64x86 - MurmurHash v2 64bit x86 optimized

and they share the following signature:

murmurHash(data[, callback])
murmurHash(data, output[, offset[, length]][, callback])
murmurHash(data{string}, encoding|output_type[, seed][, callback])
murmurHash(data{Buffer}, output_type[, seed][, callback])
murmurHash(data, seed[, callback])
murmurHash(data, seed, output[, offset[, length]][, callback])
murmurHash(data, seed, output_type[, callback])
murmurHash(data, encoding, output_type[, callback])
murmurHash(data{string}, encoding, output[, offset[, length]][, callback])
murmurHash(data{string}, encoding, seed[, callback])
murmurHash(data{string}, encoding, seed, output[, offset[, length]][, callback])
murmurHash(data{string}, encoding, seed, output_type[, callback])
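The `encoding|output_type` slot in the signatures above is ambiguous, because "hex", "base64" and "binary" name both an input encoding and a return type. The sketch below models the documented rule in plain JavaScript. `classifySecondString` is a hypothetical helper that murmurhash-native does not export, and exact lowercase matching is an assumption.

```javascript
// Hypothetical helper illustrating the documented disambiguation rule;
// it is not part of murmurhash-native.
var ENCODINGS = ['utf8', 'ucs2', 'ascii', 'hex', 'base64', 'binary'];

function classifySecondString(value) {
  // Only these two values are read as a return type in this position.
  if (value === 'number' || value === 'buffer') {
    return { kind: 'output_type', value: value };
  }
  // Any other accepted value is read as the input string encoding.
  if (ENCODINGS.indexOf(value) !== -1) {
    return { kind: 'encoding', value: value };
  }
  // The 2.x notes say unsupported values raise a TypeError.
  throw new TypeError('unsupported encoding or output_type: ' + value);
}

console.log(classifySecondString('buffer').kind); // output_type
console.log(classifySecondString('hex').kind);    // encoding
```

To request a "hex" result for string input, the forms that carry an explicit seed or an explicit encoding before the output type avoid the ambiguity.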

@param {string|Buffer} data - a byte-string to calculate hash from
@param {string} encoding - data string encoding, should be: "utf8", "ucs2", "ascii", "hex", "base64" or "binary"; "binary" by default
@param {Uint32} seed - murmur hash seed, 0 by default
@param {Buffer} output - a Buffer object to write hash bytes to; the same object will be returned
@param {number} offset - start writing into output at offset byte; negative offset starts from the end of the output buffer
@param {number} length - a number of bytes to write from calculated hash; negative length starts from the end of the hash; if absolute value of length is larger than the size of calculated hash, bytes are written only up to the hash size
@param {string} output_type - a string indicating return type:
- "number" - (default) for murmurHash32 an unsigned 32-bit integer; for other hashes a hexadecimal string
- "hex" - hexadecimal string
- "base64" - base64 string
- "binary" - binary string
- "buffer" - a new Buffer object;
@param {string} encoding|output_type - data string encoding or a return type; because some valid return types are also valid encodings, the only values recognized here for output_type are:
- "number"
- "buffer"
@param {Function} callback - optional callback(err, result); if provided, the hash is calculated asynchronously using the libuv worker queue, the function returns undefined, and the result is passed to the callback. Be careful: reading and writing the same memory from multiple threads may produce undefined results
@return {number|Buffer|String|undefined}
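The offset and length rules are easier to follow with concrete bytes. The following is a plain JavaScript model of one reading of the parameter descriptions above. `writeHashBytes` is a hypothetical helper, not the native implementation, and how the library treats out-of-range offsets is not covered here.

```javascript
// Hypothetical model of the documented offset/length rules; the real work
// happens inside the native module.
function writeHashBytes(hashBytes, output, offset, length) {
  if (offset === undefined) offset = 0;
  if (length === undefined) length = hashBytes.length;

  // A negative offset counts back from the end of the output buffer.
  var start = offset < 0 ? output.length + offset : offset;

  // Never write more bytes than the hash contains.
  var count = Math.min(Math.abs(length), hashBytes.length);

  // A negative length selects the trailing bytes of the hash.
  var from = length < 0 ? hashBytes.length - count : 0;

  hashBytes.copy(output, start, from, from + count);
  return output; // the same Buffer object is returned
}

// Bytes documented for murmurHash('hash me!', 0x12345789, 'buffer').
var hashBytes = Buffer.from('71c45535', 'hex');
var target = Buffer.from('hash me!____');

console.log(writeHashBytes(hashBytes, target, 8).toString('hex'));
// 68617368206d652171c45535

console.log(writeHashBytes(hashBytes, Buffer.alloc(2), 0, -2).toString('hex'));
// 5535
```

Under this reading, an offset of -4 on the 12-byte buffer addresses the same position as an offset of 8.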

The order of bytes written to a Buffer or encoded string depends on the function's endianness.

data and output arguments might reference the same Buffer object or buffers referencing the same memory (views).

There are additional namespaces, one for each variant of function endianness:

BE - big-endian (most significant byte first or network byte order)
LE - little-endian (least significant byte first)
platform - compatible with os.endianness()

Functions in the root namespace are big-endian.
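The difference between the variants can be reproduced with Node's own Buffer methods, without the native module. The sketch below takes the 32-bit value documented earlier for `murmurHash('hash me!', 0x12345789)` and renders it in each byte order. It only illustrates byte order and does not compute a hash.

```javascript
var os = require('os');

// The 32-bit value documented for murmurHash('hash me!', 0x12345789).
var value = 1908692277;

var be = Buffer.alloc(4);
be.writeUInt32BE(value, 0); // most significant byte first

var le = Buffer.alloc(4);
le.writeUInt32LE(value, 0); // least significant byte first

console.log(be.toString('hex')); // 71c45535
console.log(le.toString('hex')); // 3555c471

// The "platform" variant follows the byte order of the host CPU.
var platformHex = os.endianness() === 'BE' ? be.toString('hex') : le.toString('hex');
console.log(platformHex);
```

For the 128-bit x86 variant, the examples in this README suggest that each 32-bit word is byte-swapped separately (`c7009299...` becomes `999200c7...`) rather than the whole 16-byte value being reversed.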

Streaming and incremental api

The dual-api interface for progressive MurmurHash3 is available as a submodule:

var murmur = require('murmurhash-native/stream');

Incremental (a.k.a. progressive) api

var hash = murmur.createHash('murmurhash128x86');
hash.update('hash').digest('hex'); // '0d872bbf2cd001722cd001722cd00172'
hash.update(' me!').digest('hex'); // 'c7009299985a5627a9280372a9280372'

var hash = murmur.createHash('murmurhash128x86', {endianness: 'LE'});
hash.update('hash').digest('hex'); // 'bf2b870d7201d02c7201d02c7201d02c'
hash.update(' me!').digest('hex'); // '999200c727565a98720328a9720328a9'

Streaming api

var hash = murmur.createHash('murmurhash32', {seed: 123, encoding: 'hex', endianness: 'platform'});
fs.createReadStream('README.md').pipe(hash).pipe(process.stdout);
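For readers unfamiliar with the crypto.Hash-style dual interface that this submodule mirrors, the sketch below shows the same two styles with Node's built-in `crypto` module. It uses SHA-256 only because it ships with Node; the digest values are unrelated to MurmurHash.

```javascript
var crypto = require('crypto');

// Incremental style: feed chunks with update(), then read the digest.
var incremental = crypto.createHash('sha256')
  .update('hash')
  .update(' me!')
  .digest('hex');

// One-shot reference over the same bytes.
var oneShot = crypto.createHash('sha256').update('hash me!').digest('hex');
console.log(incremental === oneShot); // true

// Stream style: the same kind of object is also a Transform stream.
var hasher = crypto.createHash('sha256');
hasher.setEncoding('hex');
hasher.on('data', function (hex) {
  console.log(hex === oneShot); // true
});
hasher.write('hash');
hasher.end(' me!');
```

One difference worth noting: a crypto.Hash object cannot be updated after `digest()` has been called, whereas the incremental examples above call `update()` again after `digest()` on the same murmur hasher.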

Serializable state

The incremental MurmurHash utilities may be serialized and later deserialized. One may also copy a hasher's internal state onto another. This way the hasher utility can be re-used to calculate a hash of some data with already known prefix.

var hash = murmur.createHash('murmurhash128x64').update('hash');
hash.digest('hex');                   // '4ab2e1e022f63e2e9add75dfcea2dede'

var backup = murmur.createHash(hash); // create a copy of a hash with the same internal state
backup.update(' me!').digest('hex');  // 'c43668294e89db0ba5772846e5804467'

hash.copy(backup)                     // copy hash's state onto the backup
    .update(' me!').digest('hex');    // 'c43668294e89db0ba5772846e5804467'

var serial = hash.serialize();        // serialize hash's state
serial == 'AAAAAAAAAAAAAAAAAAAAAGhzYWgAAAAAAAAAAAAAAFQAAAAEtd3X';
                                      // restore backup from serialized state
var backup = murmur.createHash('murmurhash128x64', {seed: serial});
backup.update(' me!').digest('hex');  // 'c43668294e89db0ba5772846e5804467'
                                      // finally
hash.update(' me!').digest('hex');    // 'c43668294e89db0ba5772846e5804467'

The dual-api with streaming is a javascript wrapper over the native module. The native incremental module is directly available at murmurhash-native/incremental.

See hasher.cc for full api description (and there's some crazy templating going on there...).

Promises

The native murmurHash functions run asynchronously if the last argument is a callback. There is however a promisify wrapper:

var mm = require('murmurhash-native/promisify')();
mm.murmurHash32Async( 'hash me!', 0x12345789 )
      .then(hash => { assert.equal(hash, 1908692277) });
// Promise { <pending> }

You may provide your own promise constructor:

var bluebird = require('bluebird');
var mm = require('murmurhash-native/promisify')(bluebird);
mm.murmurHash32Async( 'hash me!', 0x12345789 )
      .then(hash => { assert.equal(hash, 1908692277) });
// Promise {
//   _bitField: 0,
//   _fulfillmentHandler0: undefined,
//   _rejectionHandler0: undefined,
//   _promise0: undefined,
//   _receiver0: undefined }
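The general shape of such a wrapper can be sketched in a few lines. This is not the package's actual implementation: `promisifyAll` and `fakeHash` are hypothetical names, and `fakeHash` is a stand-in callback-style function that reports a byte length instead of a hash.

```javascript
// Stand-in for a callback-style hash function (hypothetical, for illustration).
function fakeHash(data, callback) {
  setImmediate(function () { callback(null, Buffer.byteLength(data)); });
}

// Wrap every function of `api` under a name with an "Async" suffix.
// An alternative promise constructor may be supplied as the second argument.
function promisifyAll(api, PromiseCtor) {
  PromiseCtor = PromiseCtor || Promise;
  var wrapped = {};
  Object.keys(api).forEach(function (name) {
    if (typeof api[name] !== 'function') return;
    wrapped[name + 'Async'] = function () {
      var args = Array.prototype.slice.call(arguments);
      return new PromiseCtor(function (resolve, reject) {
        // Append the node-style callback as the last argument.
        args.push(function (err, result) {
          if (err) reject(err); else resolve(result);
        });
        api[name].apply(api, args);
      });
    };
  });
  return wrapped;
}

var wrappedApi = promisifyAll({ fakeHash: fakeHash });
wrappedApi.fakeHashAsync('hash me!').then(function (result) {
  console.log(result); // 8
});
```

Because the callback is appended as the last argument, the wrapped functions run through the same asynchronous path that the native functions take whenever a callback is supplied.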

Significant changes in 3.x

The most important change is full platform indifference of rendered output. In 2.x output hash as binary data provided via buffer was endian sensitive. Starting with 3.x the data written to output buffer is always MSB (byte) first.

The "hex", "base64" and "binary" output types have been (re)added, but this time with a sane definition.

So in this version the following is true on all platforms:

assert.strictEqual(murmurHash('foo', 'buffer').toString('hex'), murmurHash('foo', 0, 'hex'));
assert.strictEqual(murmurHash('foo', 'buffer').toString('base64'), murmurHash('foo', 0, 'base64'));

Significant changes in 2.x

The 1.x output types were very confusing. E.g. "hex" was just an equivalent of murmurHash(data, "buffer").toString("hex"), which rendered an incorrect hexadecimal number. So all the string output type encodings ("utf8", "ucs2", "ascii", "hex", "base64" and "binary") were completely removed in 2.0 as being simply useless.

The "number" output type has been adapted to all variants in a way more compatible with other murmurhash implementations. For 32bit hash the return value is an unsigned 32-bit integer (it was signed integer in 1.x) and for other hashes it's a hexadecimal number.
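When migrating code or comparing against other implementations, the two "number" conventions can be converted with plain JavaScript. The sketch below uses an arbitrary 32-bit value with the top bit set and the 128-bit hex string documented earlier for `murmurHash128x64('hash me!')`. It assumes a Node version with BigInt support.

```javascript
// An arbitrary 32-bit value with the top bit set, in signed (1.x style) form.
var signed = 0xc7009299 | 0;
// The unsigned form, as returned since 2.x.
var unsigned = signed >>> 0;
console.log(signed < 0);              // true
console.log(unsigned === 0xc7009299); // true

// For wider hashes the "number" result is a hexadecimal string;
// BigInt can turn it into a numeric value when arithmetic is needed.
var hex128 = 'c43668294e89db0ba5772846e5804467';
var asBigInt = BigInt('0x' + hex128);
console.log(asBigInt.toString(16) === hex128); // true
```

Converting back with `toString(16)` drops leading zeros, so pad the string to 32 characters when a fixed-width representation is required.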

The "buffer" output type wasn't modified except that the default output is now "number" for all of the hashes.

Additionally when passing unsupported value to encoding or output_type argument the function throws TypeError.

Another breaking change is for the BE platforms. Starting with 2.0 endian-ness is recognized, so hashes should be consistent regardless of the cpu type.

The callback argument was introduced in v2.1.

Bugs, limitations, caveats

When working with Buffers, input data is not copied; for strings, however, copying is unavoidable. For strings with a byte length below 1 kB a static buffer is used to avoid memory allocations.

The hash functions optimized for x64 and x86 produce different results.

Tested on Linux (x64), OS X (x64) and MS Windows (x64 and x86).

This version provides binaries for nodejs: v10, v11, v12, v13 and v14.

For binaries of murmurhash-native for previous versions of nodejs, use version 3.4.1 or 3.3.0 of this module.

changelog

v3.5.0

bump nan to 2.14.1, node-pre-gyp to 0.14.0
bump development dependencies
added binaries for node v13 and v14
dropped binaries for node pre v10

v3.4.1

restrict node to v6 or later

v3.4.0

bump nan to 2.13 and remove v8 deprecation warnings suppression introduced in v3.2.5
bump node-pre-gyp to 0.13 and node-pre-gyp-github to 1.4.3
bump development dependencies
bump typescript and typedoc dependencies
added tests and binaries for node v12
dropped support for node pre v6

v3.3.0

TypeScript declarations, documentation and tests
bump bluebird to 3.5.3, commander to 2.19.0 and tap to 12.1.0
added development dependencies: typescript, @types, typedoc and typedoc plugins

v3.2.5

bump node-pre-gyp to 0.11.0, nan to 2.11.1 and tap to 12.0.1
adapt async uncaughtException tests to tap 12
test and release binaries for node v11
suppress v8 deprecation warnings from nan

v3.2.4

bump node-pre-gyp to 0.10.3, commander to 2.17
test and release binaries for nodejs v10
replaced deprecated Buffer factory api in tests and benches with the class methods

v3.2.3

bump nan to 2.10, node-pre-gyp to 0.9.1, tap to 9, commander to 2.15
replaced deprecated synchronous Nan::Callback::Call with Nan::Call
removed redundant const Nan::NAN_METHOD_ARGS_TYPE
updated arguments to asynchronous Nan::Callback::Call
dropped support for node pre v4 (broken node-gyp 0.12.18 on XCode LLVM 8.1); on other systems it might still work, but this is no longer being looked into

v3.2.2

bump nan to 2.7.0, node-pre-gyp to 0.6.39
bump development dependencies
replace deprecated Nan::ForceSet with Nan::DefineOwnProperty
test and release binaries for node v8 and v9
appveyor: pin npm version 5.3 for node v9 to workaround npm's issue #16649
npmrc: turn off package-lock

v3.2.1

bump nan to 2.6.2, node-pre-gyp to 0.6.34
bump development dependencies
test and release binaries for node v7
appveyor: pin npm versions

v3.2.0

bump nan to 2.3.5
removed strcasecmp dependency
asynchronous: static byte array for small strings added to the worker
incremental async: static byte array for small strings added to the hasher
incremental: endianness configurable via property and argument to the constructor
variants of murmur hash functions producing BE (MSB) or LE (LSB) results

v3.1.1

fix incremental async: ensure hasher is not GC'ed before worker completes
fix incremental async: prevent from copying state over busy target

v3.1.0

replaced MurmurHash3 implementation with PMurHash and PMurHash128
new ability to update incremental hashers asynchronously via libuv
stream implementation chooses sync vs async update depending on chunk size
test: ensure persistence under gc stress
bench: streaming

v3.0.4

test cases: incremental digest() method with buffer output
fix stream.js wrapper: missing support for offset and length in digest()

v3.0.3

improved node-pre-gyp configuration so only essential binaries are being packaged

v3.0.2

removed bundled dependencies

v3.0.1

facilitate installation with prebuilt native binaries
use "bindings" gem for finding native modules
backward compatibility testing of serialized data
c++ code cleanup: most of the precompiler macros replaced with type-safe constants
js code cleanup with jshint
remove iojs-3 from ci tests

v3.0.0

results always in network order MSB (byte)
restored output types: "hex" "base64" and "binary"
incremental MurmurHash 3: 32bit, 128bit x86 and x64
copyable and serializable state of incremental MurmurHash
stream wrapper for incremental MurmurHash

v2.1.0

new ability to calculate hashes asynchronously via libuv
ensure correct byte alignment while directly writing to a buffer
bench: asynchronous version
promisify wrapper

v2.0.0

output string encoding types removed
"number" output type is a hex number for 64 and 128bit hashes
"number" output type is the default output type for all hashes
consistent hash regardless of platform endian-ness
throws TypeError on incorrect encoding or output_type
second string argument interpreted as an output type or encoding
remove legacy pre v0.10 code

v1.0.2

bump nan to 2.3.3, fixes node v6 build

v1.0.1

use nan converters instead of soon deprecated ->XValue()

v1.0.0

bump nan to 2.0.9, fixes build with iojs-3 and node v4

v0.3.1

bump nan to 1.8, fixes build with newest io.js

v0.3.0

output Buffer, offset and length arguments
use NODE_SET_METHOD macro to export functions

v0.2.1

bump nan to 1.6, remove polyfill
bench: compare with all crypto hashes

v0.2.0

default input encoding changed from "utf8" to "binary"
ensure default output type is "number" (32bit) or "buffer" (>32bit)
decode "utf8" string faster on node >= 0.10
handle some cases of 3 arguments better
bench: compare with md5/sha1
bench: string encoding argument

v0.1.1

fix handling of non-ascii encoding argument string