Important: This documentation covers Yarn 1 (Classic).
For Yarn 2+ docs and migration guide, see yarnpkg.com.

Package detail

stats-logscale

dallaylaen · 521 · MIT · 1.0.9 · TypeScript support: included

Approximate statistical analysis using logarithmic bins

descriptive, math, mean, median, percentile, statistics, univariate

readme

stats-logscale

A memory-efficient approximate statistical analysis tool using logarithmic binning.

Example: repeated setTimeout(0) execution times

Description

  • data is split into bins (aka buckets), linear close to zero and logarithmic for large numbers (hence the name), thus maintaining desired absolute and relative precision;

  • can calculate mean, variance, median, moments, percentiles, cumulative distribution function (i.e. probability that a value is less than x), and expected values of arbitrary functions over the sample;

  • can generate histograms for plotting the data;

  • all calculated values are cached. Cache is reset upon adding new data;

  • (almost) every function has a "neat" counterpart which rounds the result to the shortest possible number within the precision bounds. E.g. foo.mean() // 1.0100047, but foo.neat.mean() // 1.01;

  • is (de)serializable;

  • can split out partial data or combine multiple samples into one.
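To make the binning idea concrete, here is a minimal, hypothetical sketch of how linear-near-zero, logarithmic-elsewhere bins can work. This is not the library's internal code; the `binIndex` helper and its layout are illustrative only, though the `base` and `precision` option names mirror the constructor's.

```javascript
// Hypothetical sketch of log-scale binning (NOT the library's internals).
// Below a cutoff, bins are linear with width `precision` (absolute error);
// above it, bin edges grow geometrically by `base` (relative error).
function binIndex (x, { base = 1.001, precision = 1e-9 } = {}) {
    const cutoff = precision / (base - 1); // where linear width matches log width
    if (Math.abs(x) <= cutoff)
        return Math.round(x / precision);  // linear region around zero
    const offset = Math.round(cutoff / precision);
    const steps  = Math.round(Math.log(Math.abs(x) / cutoff) / Math.log(base));
    return Math.sign(x) * (offset + steps); // logarithmic region
}

// Values closer than the relative precision land in the same bin:
const opts = { base: 1.01, precision: 0.001 };
binIndex(1, opts) === binIndex(1.0005, opts); // true: within 1% of each other
```

The exact bin layout in the real module is an implementation detail; what matters is the guarantee that every stored value stays within the configured absolute and relative error bounds.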

Usage

Creating the sample container:

const { Univariate } = require( 'stats-logscale' );
const stat = new Univariate();

Specifying absolute and relative precision. The defaults are 1e-9 and 1.001, respectively. Lower precision means less memory usage and faster data querying (but not faster insertion).

const stat = new Univariate({base: 1.01, precision: 0.001});

Use the flat switch to avoid logarithmic binning altogether:

// this assumes the data is just integer numbers
const stat = new Univariate({precision: 1, flat: true});

Adding data points, either one by one or as (value, weight) pairs. Strings are fine (e.g. after parsing user input), but non-numeric values will cause an exception:

stat.add (3.14);
stat.add ("Foo"); // Nope!
stat.add ("3.14 3.15 3.16".split(" "));
stat.addWeighted([[0.5, 1], [1.5, 3], [2.5, 5]]);

Querying data:

stat.count();           // number of data points
stat.mean();            // average
stat.stdev();           // standard deviation
stat.median();          // half of data is lower than this value
stat.percentile(90);    // 90% of data below this point
stat.quantile(0.9);     // ditto
stat.cdf(0.5);          // Cumulative distribution function, which means
                        // the probability that a data point is less than 0.5
stat.moment(power);     // central moment of an integer power
stat.momentAbs(power);  // < |x-<x>| ** power >, power may be fractional
stat.E( x => x*x );     // expected value of an arbitrary function

Each querying primitive has a "neat" counterpart that rounds its output to the shortest possible decimal number in the respective bin:

stat.neat.mean();
stat.neat.stdev();
stat.neat.median();
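The idea behind neat rounding can be sketched as follows: among all decimal numbers inside the bin's boundaries, pick the one with the fewest digits. The `shortestInRange` helper below is a hypothetical illustration, not part of the package's API.

```javascript
// Hypothetical sketch of "neat" rounding (NOT the package's actual code):
// among all decimals inside [lo, hi], return one with the fewest digits.
function shortestInRange (lo, hi) {
    for (let digits = 0; digits <= 15; digits++) {
        const scale = 10 ** digits;
        const candidate = Math.round((lo + hi) / 2 * scale) / scale;
        if (candidate >= lo && candidate <= hi)
            return candidate;
    }
    return (lo + hi) / 2; // fall back to the bin center
}

shortestInRange(1.0099, 1.0101); // 1.01 — the shortest decimal in the bin
```

Since every bin's width is within the configured precision, the neat value is just as valid a representative of the bin as the raw one, only easier to read.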

Extract partial samples:

stat.clone( { min: 0.5, max: 0.7 } );
stat.clone( { ltrim: 1, rtrim: 1 });
    // cut off outer 1% of data
stat.clone( { ltrim: 1, rtrim: 1, winsorize: true } );
    // ditto but truncate outliers instead of discarding

Serialize, deserialize, and combine data from multiple sources:

const str = JSON.stringify(stat);
// send over the network here
const copy = new Univariate (JSON.parse(str));

main.addWeighted( partialStat.getBins() );
main.addWeighted( JSON.parse(str).bins ); // ditto

Create histograms and plot data:

stat.histogram({scale: 768, count:1024});
    // this produces 1024 bars of the form
    // [ bar_height, lower_boundary, upper_boundary ]
    // The intervals are consecutive.
    // The bar heights are limited to 768.

stat.histogram({scale: 70, count:20})
    .map( x => stat.shorten(x[1], x[2]) + '\t' + '+'.repeat(x[0]) )
    .join('\n')
    // "Draw" a vertical histogram for text console
    // You'll use PNG in production instead, right? Right?

See the playground.

See also full documentation.

Performance

Data inserts are optimized for speed, and querying is cached where possible. The script example/speed.js can be used to benchmark the module on your system.

Memory usage for a dense sample spanning 6 orders of magnitude was around 1.6MB in Chromium, ~230KB for the data itself + ~1.2MB for the cache.

Bugs

Please report bugs and request features via the GitHub bug tracker.

Copyright (c) 2022-2023 Konstantin Uvarin

This is free software available under the MIT license.

changelog

Mon Mar 11 2024 v1.0.9

- [ts] Add types for typescript users

Mon Dec 25 2023 v1.0.8

- [api] Add {flat: boolean} flag to new() to stay at linear binning for large values

Sun Jul 02 2023 v1.0.7

- Rewrite storage from {} to new Map(), get a major (~30%) insertion speedup
- Make sure code works in the browser when require()'d under webpack or browserify

Wed Jun 28 2023 v1.0.6

- [repo] Add more fields to package.json (no code changes really)

Wed Jun 28 2023 v1.0.5

- addWeighted() can now "forget" data points via negative weights
- Add winsorize:boolean parameter to clone() so that data points outside limits
  are truncated instead of discarded
- Add version: field to toJSON() to ensure we're actually saving/loading a proper Univariate object
- Add browser-only univariateToPng() function to the package
- Move webified files to 
  https://dallaylaen.github.io/stats-logscale-js/js/build/stats-logscale.js
  and https://dallaylaen.github.io/stats-logscale-js/js/build/stats-logscale.min.js
- Increase test coverage + better types in docs

Fri Jun 3 2022 v1.0.4

- skewness, kurtosis, and momentStd(n) := moment(n) / stdev**n
- add `transform` param to `clone`
- add sumOf(x => x) which integrates arbitrary function over the sample

Mon May 16 2022 v1.0.3

- fix bug in number shortening (again)

Mon May 16 2022 v1.0.2

- fix bug in number shortening

Mon May 16 2022 v1.0.0

- initial release with binning and stuff
- add, addWeighted
- clone, toJSON
- mean, median, stdev, percentile, moment, momentAbs, min, max, cdf
- histogram