

turbo-maker

Author: AndrewShedov · License: MIT · Version: 1.0.51


Keywords: mongodb, data-generator, mongodb-generator, stress-test-mongodb, multithreaded-db-tool, multithreaded-mongodb-tool, fast-mongodb-generator, super-speed, speed, performance, benchmark, benchmark-mongodb, stress-testing, load-testing, multithreading, CPU, CLI, turbo, big-data, synthetic-data, mongodb-utils


turboMaker

Superfast, multithreaded document generator for MongoDB that runs from the command line (CLI).
Generates millions of documents at maximum speed, utilizing all CPU threads.

Suitable for

  • Creating large collections (more than 500,000,000 documents)
  • Generating synthetic data
  • Stress testing MongoDB
  • Performance benchmarking

Features

  1. Multithreading: each thread inserts documents in parallel. 1,000,000 documents of average content size are generated in 7 seconds (PC configuration: Intel i5-12600K, 80GB DDR4 RAM, Samsung 980 PRO 1TB SSD).
  2. Set the number of threads used for generation to control CPU load, or set it to 'max' to use all available threads.
  3. Distribution of documents across threads that accounts for the remainder (when the document count does not divide evenly by the thread count).
  4. Custom data schemas defined through the generatingData function.
  5. Precise createdAt/updatedAt handling with timeStepMs.
  6. Batch inserts for better performance.
  7. Integration with superMaker for generating random text, hashtags, words, dates, emails, IDs, URLs, arrays, booleans, and more.
  8. A console progress bar showing percentage, speed, and statistics, along with other informative logs.


Generation of 1,000,000 documents in 7 seconds, filled with superMaker, with the following content.
PC configuration: Intel i5-12600K, 80GB DDR4 RAM, Samsung 980 PRO 1TB SSD.



Generation of 500,000,000 documents in 7 hr 10 min, filled with superMaker, with the following content.
When generating more than 10,000,000 documents, the speed may drop periodically due to I/O and MongoDB overhead.
PC configuration: Intel i5-12600K, 80GB DDR4 RAM, Samsung 980 PRO 1TB SSD.


Technologies used

  • worker_threads
  • SharedArrayBuffer → Int32Array → Atomics
  • perf_hooks.performance
  • os
  • process
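The README names these building blocks but does not show how they fit together. The sketch below is written in plain Node.js and assumes a design in which every worker increments one shared counter that the main thread polls to display progress. The names `runWorkers` and `workerSource` are hypothetical, and the actual document generation is left as a comment.

```javascript
// Sketch only, not turboMaker's source: workers share a progress counter
// via SharedArrayBuffer -> Int32Array -> Atomics.
const { Worker } = require('node:worker_threads');

// Worker body passed as a string (eval: true) so the example fits in one file.
const workerSource = `
const { workerData } = require('node:worker_threads');
const counter = new Int32Array(workerData.shared);
for (let i = 0; i < workerData.count; i++) {
    // ... generate and insert one document here (hypothetical) ...
    Atomics.add(counter, 0, 1); // atomic increment, safe across threads
}
`;

function runWorkers(threads, perThread) {
    // One 32-bit slot; a SharedArrayBuffer in workerData is shared, not copied.
    const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
    const counter = new Int32Array(shared);
    const total = threads * perThread;

    // The main thread can read the counter at any time, e.g. for a progress bar.
    const timer = setInterval(() => {
        const done = Atomics.load(counter, 0);
        process.stdout.write(`\rprogress: ${done}/${total}`);
    }, 100);

    const jobs = Array.from({ length: threads }, () => new Promise((resolve, reject) => {
        const worker = new Worker(workerSource, {
            eval: true,
            workerData: { shared, count: perThread }
        });
        worker.once('error', reject);
        worker.once('exit', resolve);
    }));

    // Resolve with the final count once every worker has exited.
    return Promise.all(jobs)
        .finally(() => clearInterval(timer))
        .then(() => Atomics.load(counter, 0));
}
```

For example, `runWorkers(4, 250_000)` resolves to 1,000,000 once all four workers exit. `Atomics.add` is used instead of `counter[0]++` because a plain read-modify-write from several threads can lose increments.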

Installation & Usage

  1. Install the package:
npm i turbo-maker
  2. Add a script to your package.json:
"scripts": {
  "turboMaker": "turbo-maker"
}
  3. In the project root, create a file named turbo-maker.config.js.
    You can start with the lite version.
    Examples of various configurations are also available.

  4. Run from the project root:

npm run turboMaker

Explanation of the turbo-maker.config.js file structure

Config options

Required fields:

export const config = {
    uri: 'mongodb://127.0.0.1:27017',
    db: 'crystalTest',
    collection: 'posts',
    numberThreads: 'max',
    numberDocuments: 1_000_000,
    batchSize: 10_000,
    timeStepMs: 20
};

numberThreads

Accepts a string ('max') or a number and sets how many CPU threads are used.

  • for value 'max', all threads are used.
  • if the number exceeds the actual thread count, all threads are used.
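The two rules above can be expressed as a small function. This is a sketch, not turboMaker's code. `resolveThreadCount` is a hypothetical name, and rejecting zero, negative, or non-numeric values is an assumption that the README does not state.

```javascript
const os = require('node:os');

// Hypothetical helper (not turboMaker's API) following the documented rules for numberThreads.
function resolveThreadCount(
    numberThreads,
    // availableParallelism() exists in newer Node.js releases; fall back to cpus().length.
    available = os.availableParallelism?.() ?? os.cpus().length
) {
    // 'max' -> use every available thread.
    if (numberThreads === 'max') return available;

    // Assumption: other values must be positive integers (numeric strings such as '4' are accepted).
    const requested = Number(numberThreads);
    if (!Number.isInteger(requested) || requested < 1) {
        throw new RangeError(`numberThreads must be 'max' or a positive integer, got ${numberThreads}`);
    }

    // A request above the real thread count is capped at the real thread count.
    return Math.min(requested, available);
}
```

On a 16-thread machine, `resolveThreadCount('max')` and `resolveThreadCount(64)` would both return 16, while `resolveThreadCount(4)` returns 4.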

numberDocuments

Accepts a number, specifying how many documents to generate.
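Feature 3 says that documents are distributed across threads with the remainder taken into account, but the exact split is not documented. The sketch below shows one common approach, using hypothetical helper names. The first `numberDocuments % threads` workers each take one extra document, and every worker gets a contiguous index range.

```javascript
// Split `total` documents across `threads` workers; the first (total % threads)
// workers take one extra document, so counts differ by at most one.
function distributeDocuments(total, threads) {
    const base = Math.floor(total / threads);
    const remainder = total % threads;
    return Array.from({ length: threads }, (_, i) => base + (i < remainder ? 1 : 0));
}

// Turn the counts into contiguous [start, start + count) ranges, one per worker.
function planRanges(total, threads) {
    let start = 0;
    return distributeDocuments(total, threads).map((count) => {
        const range = { start, count };
        start += count;
        return range;
    });
}
```

For example, `distributeDocuments(1_000_000, 6)` returns `[166667, 166667, 166667, 166667, 166666, 166666]`, which sums to exactly 1,000,000.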

batchSize

Accepts a number: how many documents are inserted into the database in each batch.

  • the larger the batchSize, the fewer insert requests are sent to MongoDB, which speeds up insertion.
  • however, a very large batchSize can increase memory consumption.
  • the optimal value depends on your hardware and the number of documents being inserted.
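Here is a concrete example of the trade-off. On a single thread, numberDocuments: 1_000_000 with batchSize: 10_000 means 100 insert requests instead of 1,000,000, while up to 10,000 documents are held in memory per batch. The sketch below shows that loop and assumes the official MongoDB Node.js driver's `Collection.insertMany`. `batches` and `insertInBatches` are hypothetical helpers, and `collection` can be any object with an `insertMany` method, which lets the example run without a database.

```javascript
// Yield documents in arrays of at most batchSize; batches are built lazily, one at a time.
function* batches(total, batchSize, makeDocument) {
    for (let start = 0; start < total; start += batchSize) {
        const size = Math.min(batchSize, total - start);
        yield Array.from({ length: size }, (_, i) => makeDocument(start + i));
    }
}

// Insert `total` documents with one insertMany call per batch; returns the request count.
async function insertInBatches(collection, total, batchSize, makeDocument) {
    let requests = 0;
    for (const batch of batches(total, batchSize, makeDocument)) {
        await collection.insertMany(batch);
        requests++;
    }
    return requests;
}
```

With the driver, `collection` would come from `client.db('crystalTest').collection('posts')`.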

timeStepMs

Accepts a number and sets the interval, in milliseconds, between consecutive createdAt timestamps (updatedAt is set to the same value as createdAt).

  • With a value of 0, many documents will share the same createdAt because of the high generation speed, especially in multithreaded mode. To fine-tune timeStepMs, use mongoChecker to check the generated documents for duplicate createdAt values.
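The README does not spell out how createdAt is computed. The sketch below assumes a simple scheme in which the document at global position `index` gets `startMs + index * timeStepMs`, so distinct positions never collide when timeStepMs is a positive integer. In this simplified model, timeStepMs: 0 makes every timestamp identical. `createdAtFor`, `countDuplicateCreatedAt`, and `makeDocs` are hypothetical helpers. The duplicate counter works on an in-memory array and does not reproduce mongoChecker.

```javascript
// Assumed scheme (hypothetical): position `index` -> startMs + index * timeStepMs.
function createdAtFor(startMs, index, timeStepMs) {
    return new Date(startMs + index * timeStepMs);
}

// Count documents whose createdAt equals that of an earlier document.
function countDuplicateCreatedAt(docs) {
    const seen = new Set();
    let duplicates = 0;
    for (const { createdAt } of docs) {
        const key = createdAt.getTime(); // compare by millisecond value, not Date identity
        if (seen.has(key)) duplicates++;
        else seen.add(key);
    }
    return duplicates;
}

// Build `count` documents; updatedAt repeats createdAt, as the README describes.
function makeDocs(count, startMs, timeStepMs) {
    return Array.from({ length: count }, (_, index) => {
        const createdAt = createdAtFor(startMs, index, timeStepMs);
        return { createdAt, updatedAt: createdAt };
    });
}
```

With `startMs = Date.UTC(2025, 0, 1)`, `countDuplicateCreatedAt(makeDocs(5, startMs, 20))` returns 0, and `countDuplicateCreatedAt(makeDocs(5, startMs, 0))` returns 4.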

function generatingData

To generate data for documents, you need to define a generatingData function. It can be fully customized.
With an empty function:

export async function generatingData() {}

and numberDocuments: 1_000_000, turboMaker generates 1,000,000 empty documents like this one:

_id: ObjectId('68b2ab141b126e5d6f783d67')
document: null

The destructured function parameters:

export async function generatingData({
    createdAt,
    updatedAt
})

must be used with exactly these names, but you can store their values under any field names in the returned object:

return {
    createdCustom: createdAt,
    updatedCustom: updatedAt
};

For data generation, it's recommended to use superMaker.

A mini example of data generation with superMaker:

import { superMaker } from 'super-maker';

export const config = {
    uri: 'mongodb://127.0.0.1:27017',
    db: 'crystalTest',
    collection: 'posts',
    numberThreads: 'max',
    numberDocuments: 10_000,
    batchSize: 100,
    timeStepMs: 20
};

export async function generatingData({
    createdAt,
    updatedAt
}) {

    const {
        title,
        text,
    } = superMaker.lorem.fullText.generate({

        titleOptions: {
            sentenceMin: 0,
            sentenceMax: 1,
            wordMin: 4,
            wordMax: 7
        },

        textOptions: {
            sentenceMin: 1,
            sentenceMax: 12,
            wordMin: 4,
            wordMax: 10
        }
    });

    return {

        title,
        text,

        mainImage: superMaker.take.value({
            key: 'images.avatar'
        }),

        createdAt,
        updatedAt
    };
}

Simulation of CRYSTAL v2.0 operation using synthetic data generated with turboMaker and superMaker:

CRYSTAL v1.0 features

A Rust version of the generator is under development. In hybrid (CPU and I/O) testing, it has performed up to 7.87x faster (a 687% increase).

