Improve INSERT-per-second performance of SQLite

Optimizing SQLite is tricky. Bulk-insert performance of a C application can vary from 85 inserts per second to over 96,000 inserts per second!

Background: We are using SQLite as part of a desktop application. We have large amounts of configuration data stored in XML files that are parsed and loaded into an SQLite database for further processing when the application is initialized. SQLite is ideal for this situation because it's fast, it requires no specialized configuration, and the database is stored on disk as a single file.

Rationale: Initially I was disappointed with the performance I was seeing. It turns out that the performance of SQLite can vary significantly (both for bulk-inserts and selects) depending on how the database is configured and how you're using the API. It was not a trivial matter to figure out what all of the options and techniques were, so I thought it prudent to create this community wiki entry to share the results with Stack Overflow readers in order to save others the trouble of the same investigations.

The Experiment: Rather than simply talking about performance tips in the general sense (i.e. "Use a transaction!"), I thought it best to write some C code and actually measure the impact of various options. We're going to start with some simple data:

  • A 28 MB TAB-delimited text file (approximately 865,000 records) of the complete transit schedule for the city of Toronto
  • My test machine is a 3.60 GHz P4 running Windows XP.
  • The code is compiled with Visual C++ 2005 as "Release" with "Full Optimization" (/Ox) and Favor Fast Code (/Ot).
  • I'm using the SQLite "Amalgamation", compiled directly into my test application. The SQLite version I happen to have is a bit older (3.6.7), but I suspect these results will be comparable to the latest release (please leave a comment if you think otherwise).

Let's write some code!

The Code: A simple C program that reads the text file line-by-line, splits the string into values and then inserts the data into an SQLite database. In this "baseline" version of the code, the database is created, but we won't actually insert data:

    /*************************************************************
        Baseline code to experiment with SQLite performance.

        Input data is a 28 MB TAB-delimited text file of the
        complete Toronto Transit System schedule/route info
        from http://www.toronto.ca/open/datasets/ttc-routes/

    **************************************************************/
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <string.h>
    #include "sqlite3.h"

    #define INPUTDATA "C:\\TTC_schedule_scheduleitem_10-27-2009.txt"
    #define DATABASE "c:\\TTC_schedule_scheduleitem_10-27-2009.sqlite"
    #define TABLE "CREATE TABLE IF NOT EXISTS TTC (id INTEGER PRIMARY KEY, Route_ID TEXT, Branch_Code TEXT, Version INTEGER, Stop INTEGER, Vehicle_Index INTEGER, Day Integer, Time TEXT)"
    #define BUFFER_SIZE 256

    int main(int argc, char **argv) {

        sqlite3 * db;
        sqlite3_stmt * stmt;
        char * sErrMsg = 0;
        char * tail = 0;
        int nRetCode;
        int n = 0;

        clock_t cStartClock;

        FILE * pFile;
        char sInputBuf [BUFFER_SIZE] = "\0";

        char * sRT = 0;  /* Route */
        char * sBR = 0;  /* Branch */
        char * sVR = 0;  /* Version */
        char * sST = 0;  /* Stop Number */
        char * sVI = 0;  /* Vehicle */
        char * sDT = 0;  /* Date */
        char * sTM = 0;  /* Time */

        char sSQL [BUFFER_SIZE] = "\0";

        /*********************************************/
        /* Open the Database and create the Schema */
        sqlite3_open(DATABASE, &db);
        sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);

        /*********************************************/
        /* Open input file and import into Database*/
        cStartClock = clock();

        pFile = fopen (INPUTDATA,"r");
        while (!feof(pFile)) {

            fgets (sInputBuf, BUFFER_SIZE, pFile);

            sRT = strtok (sInputBuf, "\t");     /* Get Route */
            sBR = strtok (NULL, "\t");          /* Get Branch */
            sVR = strtok (NULL, "\t");          /* Get Version */
            sST = strtok (NULL, "\t");          /* Get Stop Number */
            sVI = strtok (NULL, "\t");          /* Get Vehicle */
            sDT = strtok (NULL, "\t");          /* Get Date */
            sTM = strtok (NULL, "\t");          /* Get Time */

            /* ACTUAL INSERT WILL GO HERE */

            n++;
        }
        fclose (pFile);

        printf("Imported %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);

        sqlite3_close(db);
        return 0;
    }

The "Control"

Running the code as-is doesn't actually perform any database operations, but it will give us an idea of how fast the raw C file I/O and string processing operations are.

    Imported 864913 records in 0.94 seconds

Great! We can do 920,000 inserts per second, provided we don't actually do any inserts :-)
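The rates quoted throughout this post are the record count divided by the elapsed seconds. A minimal sketch of that arithmetic follows; `inserts_per_second` and `report_rate` are hypothetical helper names and are not part of the original test program.

```c
#include <stdio.h>

/* Hypothetical helper (not in the original program):
   converts a record count and an elapsed time into a rate. */
double inserts_per_second(long nRecords, double seconds) {
    if (seconds <= 0.0) {
        return 0.0;  /* avoid dividing by zero when the clock is too coarse */
    }
    return (double)nRecords / seconds;
}

/* Prints the rate for one benchmark run. */
void report_rate(const char *sLabel, long nRecords, double seconds) {
    printf("%s: %.0f inserts per second\n",
           sLabel, inserts_per_second(nRecords, seconds));
}
```

Calling `report_rate("control", 864913, 0.94)` would report the control run; the figures in the text are rounded.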


The "Worst-Case-Scenario"

We're going to generate the SQL string using the values read from the file and invoke that SQL operation using sqlite3_exec:

    sprintf(sSQL, "INSERT INTO TTC VALUES (NULL, '%s', '%s', '%s', '%s', '%s', '%s', '%s')", sRT, sBR, sVR, sST, sVI, sDT, sTM);
    sqlite3_exec(db, sSQL, NULL, NULL, &sErrMsg);

This is going to be slow because the SQL will be compiled into VDBE code for every insert and every insert will happen in its own transaction. How slow?

    Imported 864913 records in 9933.61 seconds

Yikes! 2 hours and 45 minutes! That's only 85 inserts per second.

Using a Transaction

By default, SQLite will evaluate every INSERT / UPDATE statement within a unique transaction. If performing a large number of inserts, it's advisable to wrap your operation in a transaction:

    sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);

    pFile = fopen (INPUTDATA,"r");
    while (!feof(pFile)) {

        ...

    }
    fclose (pFile);

    sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);

    Imported 864913 records in 38.03 seconds

That's better. Simply wrapping all of our inserts in a single transaction improved our performance to 23,000 inserts per second.

Using a Prepared Statement

Using a transaction was a huge improvement, but recompiling the SQL statement for every insert doesn't make sense if we are using the same SQL over-and-over. Let's use sqlite3_prepare_v2 to compile our SQL statement once and then bind our parameters to that statement using sqlite3_bind_text:

    /* Open input file and import into the database */
    cStartClock = clock();

    sprintf(sSQL, "INSERT INTO TTC VALUES (NULL, @RT, @BR, @VR, @ST, @VI, @DT, @TM)");
    sqlite3_prepare_v2(db, sSQL, BUFFER_SIZE, &stmt, &tail);

    sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);

    pFile = fopen (INPUTDATA,"r");
    while (!feof(pFile)) {

        fgets (sInputBuf, BUFFER_SIZE, pFile);

        sRT = strtok (sInputBuf, "\t");   /* Get Route */
        sBR = strtok (NULL, "\t");        /* Get Branch */
        sVR = strtok (NULL, "\t");        /* Get Version */
        sST = strtok (NULL, "\t");        /* Get Stop Number */
        sVI = strtok (NULL, "\t");        /* Get Vehicle */
        sDT = strtok (NULL, "\t");        /* Get Date */
        sTM = strtok (NULL, "\t");        /* Get Time */

        sqlite3_bind_text(stmt, 1, sRT, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(stmt, 2, sBR, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(stmt, 3, sVR, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(stmt, 4, sST, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(stmt, 5, sVI, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(stmt, 6, sDT, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(stmt, 7, sTM, -1, SQLITE_TRANSIENT);

        sqlite3_step(stmt);

        sqlite3_clear_bindings(stmt);
        sqlite3_reset(stmt);

        n++;
    }
    fclose (pFile);

    sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);

    printf("Imported %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);

    sqlite3_finalize(stmt);
    sqlite3_close(db);

    return 0;

    Imported 864913 records in 16.27 seconds

Nice! There's a little bit more code (don't forget to call sqlite3_clear_bindings and sqlite3_reset), but we've more than doubled our performance to 53,000 inserts per second.

PRAGMA synchronous = OFF

By default, SQLite will pause after issuing an OS-level write command. This ensures that the data is written to the disk. By setting synchronous = OFF, we are instructing SQLite to simply hand-off the data to the OS for writing and then continue. There's a chance that the database file may become corrupted if the computer suffers a catastrophic crash (or power failure) before the data is written to the platter:

    /* Open the database and create the schema */
    sqlite3_open(DATABASE, &db);
    sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);
    sqlite3_exec(db, "PRAGMA synchronous = OFF", NULL, NULL, &sErrMsg);

    Imported 864913 records in 12.41 seconds

The improvements are now smaller, but we're up to 69,600 inserts per second.

PRAGMA journal_mode = MEMORY

Consider storing the rollback journal in memory by evaluating PRAGMA journal_mode = MEMORY. Your transaction will be faster, but if you lose power or your program crashes during a transaction your database could be left in a corrupt state with a partially-completed transaction:

    /* Open the database and create the schema */
    sqlite3_open(DATABASE, &db);
    sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);
    sqlite3_exec(db, "PRAGMA journal_mode = MEMORY", NULL, NULL, &sErrMsg);

    Imported 864913 records in 13.50 seconds

A little slower than the previous optimization at 64,000 inserts per second.

PRAGMA synchronous = OFF and PRAGMA journal_mode = MEMORY

Let's combine the previous two optimizations. It's a little more risky (in case of a crash), but we're just importing data (not running a bank):

    /* Open the database and create the schema */
    sqlite3_open(DATABASE, &db);
    sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);
    sqlite3_exec(db, "PRAGMA synchronous = OFF", NULL, NULL, &sErrMsg);
    sqlite3_exec(db, "PRAGMA journal_mode = MEMORY", NULL, NULL, &sErrMsg);

    Imported 864913 records in 12.00 seconds

Fantastic! We're able to do 72,000 inserts per second.

Using an In-Memory Database

Just for kicks, let's build upon all of the previous optimizations and redefine the database filename so we're working entirely in RAM:

    #define DATABASE ":memory:"

    Imported 864913 records in 10.94 seconds

It's not super-practical to store our database in RAM, but it's impressive that we can perform 79,000 inserts per second.

Refactoring C Code

Although not specifically an SQLite improvement, I don't like the extra char* assignment operations in the while loop. Let's quickly refactor that code to pass the output of strtok() directly into sqlite3_bind_text(), and let the compiler try to speed things up for us:

    pFile = fopen (INPUTDATA,"r");
    while (!feof(pFile)) {

        fgets (sInputBuf, BUFFER_SIZE, pFile);

        sqlite3_bind_text(stmt, 1, strtok (sInputBuf, "\t"), -1, SQLITE_TRANSIENT); /* Get Route */
        sqlite3_bind_text(stmt, 2, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);      /* Get Branch */
        sqlite3_bind_text(stmt, 3, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);      /* Get Version */
        sqlite3_bind_text(stmt, 4, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);      /* Get Stop Number */
        sqlite3_bind_text(stmt, 5, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);      /* Get Vehicle */
        sqlite3_bind_text(stmt, 6, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);      /* Get Date */
        sqlite3_bind_text(stmt, 7, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);      /* Get Time */

        sqlite3_step(stmt);             /* Execute the SQL Statement */
        sqlite3_clear_bindings(stmt);   /* Clear bindings */
        sqlite3_reset(stmt);            /* Reset VDBE */

        n++;
    }
    fclose (pFile);

Note: We are back to using a real database file. In-memory databases are fast, but not necessarily practical.

    Imported 864913 records in 8.94 seconds

A slight refactoring to the string processing code used in our parameter binding has allowed us to perform 96,700 inserts per second. I think it's safe to say that this is plenty fast. As we start to tweak other variables (i.e. page size, index creation, etc.) this will be our benchmark.


Summary (so far)

I hope you're still with me! The reason we started down this road is that bulk-insert performance varies so wildly with SQLite, and it's not always obvious what changes need to be made to speed-up our operation. Using the same compiler (and compiler options), the same version of SQLite and the same data we've optimized our code and our usage of SQLite to go from a worst-case scenario of 85 inserts per second to over 96,000 inserts per second!


CREATE INDEX then INSERT vs. INSERT then CREATE INDEX

Before we start measuring SELECT performance, we know that we'll be creating indices. It's been suggested in one of the answers below that when doing bulk inserts, it is faster to create the index after the data has been inserted (as opposed to creating the index first then inserting the data). Let's try:

Create Index then Insert Data

    sqlite3_exec(db, "CREATE INDEX 'TTC_Stop_Index' ON 'TTC' ('Stop')", NULL, NULL, &sErrMsg);
    sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);
    ...

    Imported 864913 records in 18.13 seconds

Insert Data then Create Index

    ...
    sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);
    sqlite3_exec(db, "CREATE INDEX 'TTC_Stop_Index' ON 'TTC' ('Stop')", NULL, NULL, &sErrMsg);

    Imported 864913 records in 13.66 seconds

As expected, bulk-inserts are slower if one column is indexed, but it does make a difference if the index is created after the data is inserted. Our no-index baseline is 96,000 inserts per second. Creating the index first then inserting data gives us 47,700 inserts per second, whereas inserting the data first then creating the index gives us 63,300 inserts per second.


I'd gladly take suggestions for other scenarios to try... And will be compiling similar data for SELECT queries soon.


Several tips:

  1. Put inserts/updates in a transaction.
  2. For older versions of SQLite - Consider a less paranoid journal mode (pragma journal_mode). There is NORMAL, and then there is OFF, which can significantly increase insert speed if you're not too worried about the database possibly getting corrupted if the OS crashes. If your application crashes the data should be fine. Note that in newer versions, the OFF/MEMORY settings are not safe for application level crashes.
  3. Playing with page sizes makes a difference as well (PRAGMA page_size). Having larger page sizes can make reads and writes go a bit faster as larger pages are held in memory. Note that more memory will be used for your database.
  4. If you have indices, consider calling CREATE INDEX after doing all your inserts. This is significantly faster than creating the index and then doing your inserts.
  5. You have to be quite careful if you have concurrent access to SQLite, as the whole database is locked when writes are done, and although multiple readers are possible, writes will be locked out. This has been improved somewhat with the addition of a WAL in newer SQLite versions.
  6. Take advantage of saving space...smaller databases go faster. For instance, if you have key value pairs, try making the key an INTEGER PRIMARY KEY if possible, which will replace the implied unique row number column in the table.
  7. If you are using multiple threads, you can try using the shared page cache, which will allow loaded pages to be shared between threads, which can avoid expensive I/O calls.
  8. Don't use !feof(file)!
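On the last tip: feof() only reports end-of-file after a read has already failed, so a loop of the form `while (!feof(f))` typically runs its body one extra time with whatever is left in the buffer. Testing the return value of fgets() avoids that. A minimal sketch, where `count_lines` and `LINE_BUFFER_SIZE` are hypothetical names and the insert step is left as a comment:

```c
#include <stdio.h>

#define LINE_BUFFER_SIZE 256

/* Sketch: count the lines of a text file by testing the result of fgets()
   itself. The loop ends as soon as a read fails, so no stale line is
   processed after end-of-file. Returns -1 if the file cannot be opened.
   Assumes every line fits in LINE_BUFFER_SIZE - 1 characters. */
long count_lines(const char *sPath) {
    char sLine[LINE_BUFFER_SIZE];
    long n = 0;
    FILE *pFile = fopen(sPath, "r");
    if (pFile == NULL) return -1;

    while (fgets(sLine, sizeof sLine, pFile) != NULL) {
        n++;   /* parse the record and bind/step the insert here */
    }
    fclose(pFile);
    return n;
}
```

The same loop condition can replace `while (!feof(pFile))` followed by an unchecked fgets() in the import code from the question.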

I've also asked similar questions here and here.


Try using SQLITE_STATIC instead of SQLITE_TRANSIENT for those inserts.

SQLITE_TRANSIENT will cause SQLite to copy the string data before returning.

SQLITE_STATIC tells it that the memory address you gave it will be valid until the query has been performed (which in this loop is always the case). This will save you several allocate, copy and deallocate operations per loop. Possibly a large improvement.


SQLite is a popular choice for embedded databases and applications requiring local data storage. One common performance bottleneck encountered is slow INSERT operations, especially when dealing with large datasets. Optimizing the rate at which you can insert data (INSERT-per-second) is crucial for application responsiveness and overall efficiency. This article explores various techniques to enhance SQLite's INSERT performance, focusing on practical strategies and code examples.

Achieving Higher INSERT Performance in SQLite

Improving the INSERT performance in SQLite involves a combination of database configuration, SQL command optimization, and proper transaction management. SQLite, by default, is configured for reliability and data integrity, which can sometimes limit its write speed. We need to strategically adjust certain settings and employ specific coding practices to achieve a noticeable increase in INSERT-per-second. It's not just about throwing hardware at the problem; often, the most significant gains come from optimizing how SQLite interacts with the data and the application.

Understanding Transaction Management for Faster Inserts

Transactions are critical when performing multiple INSERT operations. Without a transaction, each INSERT statement is treated as a separate transaction, requiring SQLite to write to disk and update the database index after each insertion. This is extremely slow. By wrapping multiple INSERT statements within a single transaction, SQLite can buffer the changes and write them to disk in a single operation, significantly reducing the overhead. The performance improvement can be dramatic, often increasing INSERT speed by orders of magnitude. Proper transaction management is arguably the most impactful change you can make.

    -- Example of wrapping INSERT statements in a transaction
    BEGIN TRANSACTION;
    INSERT INTO mytable (column1, column2) VALUES ('value1', 'value2');
    INSERT INTO mytable (column1, column2) VALUES ('value3', 'value4');
    INSERT INTO mytable (column1, column2) VALUES ('value5', 'value6');
    COMMIT TRANSACTION;

Consider using prepared statements with parameterized queries when inserting the same data structure multiple times. This prevents SQLite from having to parse the SQL statement repeatedly, offering another performance boost.

Strategies to Maximize SQLite Insertion Rate

Beyond transaction management, several other strategies can significantly impact the SQLite insertion rate. These include adjusting SQLite's configuration parameters, optimizing your table schema, and using efficient SQL commands. Each of these factors plays a role, and a holistic approach that considers all aspects will yield the best results. Careful consideration of these optimization techniques can make a substantial difference, especially when dealing with large volumes of data.

Adjusting SQLite Configuration Parameters

SQLite provides several configuration parameters that can be tweaked to improve performance. The PRAGMA synchronous setting controls how strictly SQLite waits for data to be written to disk. Setting it to OFF (PRAGMA synchronous = OFF;) can improve write speed, but it also increases the risk of data loss in case of a system crash. A more balanced approach is to use NORMAL (PRAGMA synchronous = NORMAL;), which provides a good compromise between speed and data integrity. Another important parameter is PRAGMA journal_mode, which selects the journaling mechanism: a rollback journal or a write-ahead log (WAL). Using WAL mode (PRAGMA journal_mode = WAL;) can significantly improve concurrency and write performance, especially when readers and a writer access the database at the same time; note that WAL still allows only one writer at a time. These pragmas should be set before starting the INSERT operations.

    PRAGMA synchronous = NORMAL;
    PRAGMA journal_mode = WAL;

Schema Optimization and Indexing Considerations

The design of your table schema also influences INSERT performance. Having too many indexes, especially on frequently updated columns, can slow down INSERT operations because SQLite needs to update the indexes after each insertion. Consider dropping unnecessary indexes before bulk INSERT operations and recreating them afterward. Also, choose appropriate data types for your columns. Using larger data types than necessary wastes space and can slow down operations. For example, if a column only needs to store integers between 0 and 255, use INTEGER with a CHECK constraint instead of TEXT. Effective schema optimization goes hand-in-hand with proper indexing strategies to enhance performance.

| Optimization Technique | Description | Impact |
|---|---|---|
| Transactions | Wrap multiple INSERTs in BEGIN/COMMIT. | Significant increase in INSERT-per-second. |
| Synchronous PRAGMA | Adjust data write confirmation. | Moderate improvement, risk of data loss if set to OFF. |
| Journal Mode PRAGMA | Using WAL mode for faster write-ahead logging. | Significant improvement, especially with concurrency. |
| Index Management | Dropping and recreating indexes for bulk operations. | Good improvement, reduces overhead during bulk inserts. |

"Premature optimization is the root of all evil (or at least most of it) in programming." - Donald Knuth

Using batch INSERT statements, where possible, can also improve performance. Instead of inserting rows one at a time, you can insert multiple rows with a single SQL statement. SQLite's INSERT documentation provides detailed explanations and examples.

    -- Example of batch INSERT
    INSERT INTO mytable (column1, column2) VALUES
        ('value1', 'value2'),
        ('value3', 'value4'),
        ('value5', 'value6');

In conclusion, optimizing SQLite for better INSERT-per-second performance requires a multi-faceted approach. By implementing transaction management, tuning configuration parameters, and optimizing your schema, you can significantly improve the speed at which you can insert data into your SQLite database. Remember to benchmark your changes to ensure that they are actually providing a benefit and to choose the right balance between performance and data integrity. For further reading on SQLite optimization, refer to the official SQLite Optimization Strategies page. Now, go forth and create faster SQLite applications!

