turboMaker
Superfast, multithreaded document generator for MongoDB, operated through the CLI.
Generates millions of documents at maximum speed, utilizing all CPU threads.
Suitable for
- Creating large collections (exceeding 500,000,000 documents)
- Generating synthetic data
- Stress testing MongoDB
- Performance benchmarking
Features
- Multithreading: each thread inserts documents in parallel. Generating 1,000,000 documents of average content size takes 7 seconds (PC configuration: Intel i5-12600K, 80GB DDR4 RAM, Samsung 980 PRO 1TB SSD).
- Specify the number of threads for data generation to adjust CPU load, or set it to `max` to use all available threads.
- Documents are distributed across threads with the remainder taken into account.
- Generation with custom data schemas through the `generatingData` function.
- Precise `createdAt`/`updatedAt` handling with `timeStepMs`.
- Batch inserts for better performance.
- Integration with superMaker for generating random `text`, `hashtags`, `words`, `dates`, `emails`, `id`, `url`, `arrays`, `booleans`, etc.
- A progress bar in the console with percentage, speed, and statistics, along with other informative logs:
Generation of 1,000,000 documents in 7 seconds, filled with superMaker, with the following content.
PC configuration: Intel i5-12600K, 80GB DDR4 RAM, Samsung 980 PRO 1TB SSD.
Generation of 500,000,000 documents in 7 hr 10 min, filled with superMaker, with the following content.
When generating more than 10,000,000 documents, the speed may periodically drop due to I/O and MongoDB overhead.
PC configuration: Intel i5-12600K, 80GB DDR4 RAM, Samsung 980 PRO 1TB SSD.
Technologies used
- worker_threads
- SharedArrayBuffer → Int32Array → Atomics
- perf_hooks.performance
- os
- process
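The SharedArrayBuffer → Int32Array → Atomics chain listed above is the standard way to let several threads update one counter safely. The sketch below assumes (this README does not confirm it) that such a counter drives the progress bar's percentage and speed. It runs in a single thread for illustration, and the helper names `createProgressCounter`, `reportInserted`, and `readProgress` are hypothetical.

```javascript
// Hypothetical sketch: a progress counter shared between threads.
// In a real run, the SharedArrayBuffer would be handed to each worker
// (for example through workerData) and each worker would wrap it in its
// own Int32Array view. Here everything runs in one thread.

function createProgressCounter() {
  // 4 bytes: room for a single 32-bit integer.
  const buffer = new SharedArrayBuffer(4);
  return { buffer, view: new Int32Array(buffer) };
}

// A worker would call this after each successful batch insert.
function reportInserted(view, count) {
  // Atomics.add is an atomic read-modify-write, so concurrent updates are not lost.
  Atomics.add(view, 0, count);
}

// The main thread would call this to render the progress bar.
function readProgress(view, total, elapsedMs) {
  const done = Atomics.load(view, 0);
  const percent = (done * 100) / total;
  const docsPerSec = elapsedMs > 0 ? (done * 1000) / elapsedMs : 0;
  return { done, percent, docsPerSec };
}

const { view } = createProgressCounter();
reportInserted(view, 10_000);
reportInserted(view, 10_000);
console.log(readProgress(view, 1_000_000, 2_000));
// { done: 20000, percent: 2, docsPerSec: 10000 }
```

A single Int32Array slot holds values up to 2,147,483,647, which is enough to count a 500,000,000-document run.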
Installation & Usage
- Install the package:
```shell
npm i turbo-maker
```
- Add a script to your package.json:
```json
"scripts": {
  "turboMaker": "turbo-maker"
}
```
In the project root, create a file named turbo-maker.config.js.
You can start with a simple lite version.
Examples of various configurations.
Run from the project root:
```shell
npm run turboMaker
```
Explanation of the file structure — turbo-maker.config.js
Config options
Required fields:
```javascript
uri: 'mongodb://127.0.0.1:27017',
db: 'crystalTest',
collection: 'posts',
numberThreads: 'max',
numberDocuments: 1_000_000,
batchSize: 10_000,
timeStepMs: 20
```
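To catch typos in the required fields before a long run, a check like the one below can help. It is a hypothetical helper, not part of turboMaker, and its rules (positive integers for counts, a non-negative `timeStepMs`) are assumptions drawn from the field descriptions in this section.

```javascript
// Hypothetical helper, not part of turboMaker: checks the required fields.
// The exact rules (positive integers, non-negative step) are assumptions.
function validateConfig(config) {
  const errors = [];
  for (const key of ['uri', 'db', 'collection']) {
    if (typeof config[key] !== 'string' || config[key] === '') {
      errors.push(`${key} must be a non-empty string`);
    }
  }
  const threads = config.numberThreads;
  if (threads !== 'max' && !(Number.isInteger(threads) && threads > 0)) {
    errors.push("numberThreads must be 'max' or a positive integer");
  }
  for (const key of ['numberDocuments', 'batchSize']) {
    if (!(Number.isInteger(config[key]) && config[key] > 0)) {
      errors.push(`${key} must be a positive integer`);
    }
  }
  if (!(Number.isFinite(config.timeStepMs) && config.timeStepMs >= 0)) {
    errors.push('timeStepMs must be a non-negative number');
  }
  return errors;
}

const exampleConfig = {
  uri: 'mongodb://127.0.0.1:27017',
  db: 'crystalTest',
  collection: 'posts',
  numberThreads: 'max',
  numberDocuments: 1_000_000,
  batchSize: 10_000,
  timeStepMs: 20
};
console.log(validateConfig(exampleConfig)); // []
```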
numberThreads
Accepts either a `string` or a `number` and sets the number of CPU threads used.
- With the value `'max'`, all threads are used.
- If the number exceeds the actual thread count, all threads are used.
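The two rules above can be expressed as a small function. This is a minimal sketch, assuming the available thread count is read elsewhere (for example with `os.availableParallelism()`, available since Node.js 18.14, or `os.cpus().length`); `resolveThreadCount` is a hypothetical name.

```javascript
// Hypothetical sketch of the numberThreads rule described above.
// `available` would normally come from os.availableParallelism()
// or os.cpus().length; it is a parameter here so the rule can be
// tested on its own.
function resolveThreadCount(numberThreads, available) {
  if (numberThreads === 'max') return available;
  // A number larger than the real thread count falls back to all threads.
  return Math.min(numberThreads, available);
}

// An Intel i5-12600K exposes 16 hardware threads.
console.log(resolveThreadCount('max', 16)); // 16
console.log(resolveThreadCount(4, 16)); // 4
console.log(resolveThreadCount(64, 16)); // 16
```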
numberDocuments
Accepts a `number` specifying how many documents to generate.
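The feature list mentions that documents are distributed across threads with the remainder taken into account. One common way to do that is sketched below; giving the extra documents to the first threads is an assumption, since the README does not say which threads receive them, and `distributeDocuments` is a hypothetical name.

```javascript
// Hypothetical sketch: split numberDocuments across threads, giving the
// first `remainder` threads one extra document each so the totals add up.
function distributeDocuments(numberDocuments, threads) {
  const base = Math.floor(numberDocuments / threads);
  const remainder = numberDocuments % threads;
  return Array.from({ length: threads }, (_, i) => base + (i < remainder ? 1 : 0));
}

console.log(distributeDocuments(10, 3)); // [ 4, 3, 3 ]
const shares = distributeDocuments(1_000_003, 16);
console.log(shares.reduce((sum, n) => sum + n, 0)); // 1000003
```

No thread ever receives more than one document above any other, so the workload stays balanced.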
batchSize
Accepts a `number`: how many documents are inserted into the database per batch.
- The larger the `batchSize`, the fewer requests are sent to MongoDB, leading to faster insertions.
- However, a very large `batchSize` can increase memory consumption.
- The optimal value depends on your machine's performance and the number of documents being inserted.
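The trade-off above comes down to how a thread's share is cut into insert requests. The sketch below only counts batches; it assumes each batch becomes one bulk write (for example one `insertMany` call in the MongoDB Node.js driver), which is an illustration rather than a description of turboMaker's internals. `batchSizes` is a hypothetical name.

```javascript
// Hypothetical sketch: cut one thread's share of documents into batches.
// Each entry would correspond to one bulk insert request.
function batchSizes(documentsForThread, batchSize) {
  if (!(batchSize > 0)) throw new RangeError('batchSize must be positive');
  const sizes = [];
  for (let left = documentsForThread; left > 0; left -= batchSize) {
    sizes.push(Math.min(batchSize, left));
  }
  return sizes;
}

// 62,500 documents with batchSize 10,000: six full batches plus one of 2,500.
const sizes = batchSizes(62_500, 10_000);
console.log(sizes.length, sizes[sizes.length - 1]); // 7 2500
```

With `batchSize: 100`, the same share would need 625 requests instead of 7, while each request holds 100 times fewer documents in memory.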
timeStepMs
Accepts a `number` and sets the interval in milliseconds between `createdAt` timestamps (`updatedAt` repeats the value of `createdAt`).
- With a value of `0`, many documents will share the same `createdAt` because of the high generation speed, especially in multithreaded mode. To fine-tune `timeStepMs`, use mongoChecker to check the generated documents for duplicate `createdAt` values.
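To see why a non-zero step removes duplicates, here is a minimal sketch that offsets each `createdAt` by the document's global index and then counts duplicate timestamps locally. Both the timestamp scheme and the helper names are hypothetical; they are not turboMaker's implementation or mongoChecker's API. In this simplified scheme a step of `0` makes every timestamp identical, whereas the text above suggests that in turboMaker the generation time also plays a role.

```javascript
// Hypothetical scheme: createdAt = start + globalIndex * timeStepMs.
function makeTimestamps(startMs, startIndex, count, timeStepMs) {
  const dates = [];
  for (let i = 0; i < count; i++) {
    dates.push(new Date(startMs + (startIndex + i) * timeStepMs));
  }
  return dates;
}

// Counts how many timestamps repeat an earlier one.
function countDuplicateTimestamps(dates) {
  const seen = new Map();
  for (const d of dates) {
    const t = d.getTime();
    seen.set(t, (seen.get(t) ?? 0) + 1);
  }
  let duplicates = 0;
  for (const n of seen.values()) if (n > 1) duplicates += n - 1;
  return duplicates;
}

const start = Date.UTC(2025, 0, 1);
// Two "threads" with non-overlapping global index ranges.
const twoThreads = [
  ...makeTimestamps(start, 0, 500, 20),
  ...makeTimestamps(start, 500, 500, 20)
];
console.log(countDuplicateTimestamps(twoThreads)); // 0
console.log(countDuplicateTimestamps(makeTimestamps(start, 0, 1000, 0))); // 999
```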
function generatingData
To generate data for documents, you need to define a `generatingData` function.
It can be fully customized.
With an empty function:
```javascript
export async function generatingData() {}
```
and `numberDocuments: 1_000_000`, 1,000,000 empty documents will be generated, such as:
```
_id: ObjectId('68b2ab141b126e5d6f783d67')
document: null
```
The destructured function parameters:
```javascript
export async function generatingData({
  createdAt,
  updatedAt
})
```
should not be renamed, but you can store their values under other field names in the `return` statement:
```javascript
return {
  createdCustom: createdAt,
  updatedCustom: updatedAt
};
```
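The calling pattern implied above (one call per document, with `createdAt` and `updatedAt` passed in) can be simulated locally to check a custom function before a real run. `buildBatch` is a hypothetical driver, not turboMaker's internals, and it does not model how turboMaker stores the result of an empty function.

```javascript
// Hypothetical driver, not turboMaker's internals: calls a user-defined
// generatingData once per document and collects the results into a batch.
async function buildBatch(generatingData, count, startMs, timeStepMs) {
  const batch = [];
  for (let i = 0; i < count; i++) {
    const createdAt = new Date(startMs + i * timeStepMs);
    // updatedAt repeats the value of createdAt.
    const updatedAt = new Date(createdAt.getTime());
    batch.push(await generatingData({ createdAt, updatedAt }));
  }
  return batch;
}

// User function: parameter names stay the same, output field names change.
async function generatingData({ createdAt, updatedAt }) {
  return { createdCustom: createdAt, updatedCustom: updatedAt };
}

buildBatch(generatingData, 3, Date.UTC(2025, 0, 1), 20).then((batch) => {
  console.log(batch.map((doc) => doc.createdCustom.toISOString()));
});
```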
For data generation, it's recommended to use superMaker.
A mini example of data generation with superMaker:
```javascript
import { superMaker } from 'super-maker';

export const config = {
  uri: 'mongodb://127.0.0.1:27017',
  db: 'crystalTest',
  collection: 'posts',
  numberThreads: 'max',
  numberDocuments: 10_000,
  batchSize: 100,
  timeStepMs: 20
};

export async function generatingData({
  createdAt,
  updatedAt
}) {
  const {
    title,
    text,
  } = superMaker.lorem.fullText.generate({
    titleOptions: {
      sentenceMin: 0,
      sentenceMax: 1,
      wordMin: 4,
      wordMax: 7
    },
    textOptions: {
      sentenceMin: 1,
      sentenceMax: 12,
      wordMin: 4,
      wordMax: 10
    }
  });

  return {
    title,
    text,
    mainImage: superMaker.take.value({
      key: 'images.avatar'
    }),
    createdAt,
    updatedAt
  };
}
```
Simulation of CRYSTAL v2.0 operation using synthetic data generated with turboMaker and superMaker: