In the process of implementing the initial data-syncing logic for a mostly offline application, I noticed my database abstraction taking a solid 2 minutes to insert just short of 750,000 rows into a single database table.
I set out to fix this bottleneck.
## Problems and Issues
A new issue arose before I could even tackle the aforementioned performance problems: at first I used the spread syntax to accept a variadic number of function parameters.
```js
class Database {
  async create(...items) {}
}
new Database().create(...new Array(750_000));
```
This polluted the call stack (every element of the array is passed as a separate argument) and promptly caused a stack overflow:
```text
/home/teo/programming_wsl/blog_test/main.js:7
new Database().create(...new Array(750_000));
                      ^

RangeError: Maximum call stack size exceeded
    at Object.<anonymous> (/home/teo/programming_wsl/blog_test/main.js:7:16)
    at Module._compile (node:internal/modules/cjs/loader:1275:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1329:10)
    at Module.load (node:internal/modules/cjs/loader:1133:32)
    at Module._load (node:internal/modules/cjs/loader:972:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:83:12)
    at node:internal/main/run_main_module:23:47

Node.js v19.9.0
```
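The root cause: spread expands every array element into its own call argument, and arguments live on the call stack. A minimal reproduction of the same limit, independent of the `Database` class (my own sketch, the `count` helper is hypothetical):

```javascript
// Each spread element becomes a separate call argument, and all
// arguments must fit on the call stack at once.
function count(...args) {
  return args.length;
}

console.log(count(...new Array(100))); // fine: 100

try {
  count(...new Array(10_000_000)); // far past the stack limit
} catch (e) {
  console.log(e instanceof RangeError); // true
}
```

The exact threshold is engine-dependent, but any sufficiently large spread will hit it.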
Believe me, this seems obvious in retrospect, but it took me a solid day of combing through TypeScript source maps to find the cause.
> **Takeaway**
>
> I will never use the spread operator again - looking at you, react.js.

## First Estimates and Benchmarking
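The fix is simply to accept the array as a single parameter instead of spreading it. A minimal sketch (the method body here is a placeholder for illustration, not the real implementation):

```javascript
class Database {
  // one array reference instead of 750_000 individual arguments
  async create(items) {
    return items.length; // placeholder body for illustration
  }
}

new Database()
  .create(new Array(750_000)) // no spread, no RangeError
  .then((n) => console.log(n)); // 750000
```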
Fixing this issue allowed me to call my `create` function and pass in the rows to insert. The first tests were nothing short of disappointing: 57 seconds for inserting 500,000 rows and 1 minute 10 seconds for 750,000 rows - this was far too slow.
```js
  async create(items) {
    return new Promise((res, rej) => {
      this.connection.serialize(() => {
        this.connection.run("BEGIN TRANSACTION");
        for (let i = 0; i < items.length; i++) {
          // node-sqlite3 reports errors through the callback, not by
          // throwing, so a try/catch around run() would never fire
          this.connection.run(
            "INSERT INTO user (name, age) VALUES (?, ?)",
            items[i]?.name,
            items[i]?.age,
            (err) => {
              if (err) {
                this.connection.run("ROLLBACK TRANSACTION");
                rej(err);
              }
            }
          );
        }
        // resolve only once the COMMIT has actually completed
        this.connection.run("COMMIT", (err) => (err ? rej(err) : res()));
      });
    });
  }
```
My next idea was to get specific and reproducible numbers, so I created a benchmark for the simplified `create` implementation above with differing loads, starting at 10 rows and stopping at one million:
```js
const Database = require("./main.js");

const DB = new Database();

const amount = [10, 100, 1000, 10_000, 100_000, 1_000_000];
const data = { name: "xnacly", age: 99 };

describe("create", () => {
  for (const a of amount) {
    let d = new Array(a).fill(data);
    test(`create-${a}`, async () => {
      await DB.create(d);
    });
  }
});
```
Measured times:
| | create |
|---|---|
| 10 rows | 2ms |
| 100 rows | 1ms |
| 1,000 rows | 13ms |
| 10,000 rows | 100ms |
| 100,000 rows | 1089ms |
| 1,000,000 rows | 11795ms |
## Approaches
My first idea was to remove calls from the hot path, which is obviously the loop inside the `create` method that inserts every row passed to it. After taking a look, I noticed there are no heavy (or even many) operations there. So how could I improve the database interaction itself? This was the moment I stumbled upon bulk inserts.
```sql
INSERT INTO user (name, age) VALUES ("xnacly", 99);
```
The naive implementation makes one database call per row it wants to insert. Bulk inserting reduces this to a single call to the database layer by appending more value tuples to the statement above:
```sql
INSERT INTO user (name, age) VALUES
  ("xnacly", 99),
  ("xnacly", 99),
  ("xnacly", 99);
```
Using the above, a faster `createFast` method can be implemented as follows:
```js
  async createFast(items) {
    if (!items.length) return Promise.resolve();
    let insert = "INSERT INTO user (name, age) VALUES ";
    insert += new Array(items.length).fill("(?,?)").join(",");
    let params = new Array(items.length * 2);
    let i = 0;
    for (const item of items) {
      params[i] = item.name;
      params[i + 1] = item.age;
      i += 2;
    }
    return new Promise((res, rej) => {
      this.connection.serialize(() => {
        this.connection.run("BEGIN TRANSACTION");
        // a single statement inserts all rows; errors arrive via the
        // callback, and the promise settles only after COMMIT finishes
        this.connection.run(insert, params, (err) => {
          if (err) {
            this.connection.run("ROLLBACK TRANSACTION");
            return rej(err);
          }
          this.connection.run("COMMIT", (e) => (e ? rej(e) : res()));
        });
      });
    });
  }
```
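One caveat worth noting (an addition on my part, not covered by the benchmarks): SQLite caps the number of bound parameters per statement via `SQLITE_MAX_VARIABLE_NUMBER`, which depends on how the library was built (historically 999, 32766 in current SQLite). With two parameters per row, a single bulk insert can therefore top out well below a million rows. A hypothetical chunking helper could split larger batches:

```javascript
// Hypothetical helper (not part of the original code): split rows into
// chunks so each bulk INSERT stays below SQLite's bound-parameter limit.
const MAX_PARAMS = 32766; // assumed SQLITE_MAX_VARIABLE_NUMBER
const PARAMS_PER_ROW = 2; // (name, age)
const CHUNK_SIZE = Math.floor(MAX_PARAMS / PARAMS_PER_ROW); // 16383 rows

function chunk(items, size = CHUNK_SIZE) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// usage sketch: for (const part of chunk(rows)) await db.createFast(part);
```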
Extending our tests for `createFast`:
```js
describe("createFast", () => {
  for (const a of amount) {
    let d = new Array(a).fill(data);
    test(`createFast-${a}`, async () => {
      await DB.createFast(d);
    });
  }
});
```
| | create | createFast | Improvement |
|---|---|---|---|
| 10 rows | 2ms | 1ms | 2x |
| 100 rows | 1ms | 1ms | / |
| 1,000 rows | 13ms | 3ms | 4.3x |
| 10,000 rows | 100ms | 22ms | 4.5x |
| 100,000 rows | 1089ms | 215ms | 5.1x |
| 1,000,000 rows | 11795ms | 1997ms | 5.9x |
> **Info**
>
> The previous benchmarks were run on the following system:
>
> - CPU: Intel(R) Core(TM) i5-6200U @ 2.30GHz, 4 cores
> - RAM: 8.0 GB DDR3, 1600 MHz
>
> The system was under heavy load while benchmarking, so the recorded times are still fairly slow.
## Escaping from micro benchmarks into the real world
Applying the results from the micro benchmarks to my real-world project's database layer resulted in significant runtime improvements, up to 66x faster:
| | create | createFast | Improvement |
|---|---|---|---|
| 10 rows | 7ms | 2ms | 3.5x |
| 100 rows | 48ms | 1ms | 48x |
| 1,000 rows | 335ms | 9ms | 37.2x |
| 10,000 rows | 2681ms | 80ms | 33.5x |
| 100,000 rows | 25347ms | 390ms | 66x |