Wed, 26 Dec 2007

Abysmal sqlite performance: sqlite3_busy_timeout() sleeps for 1 sec on lock

A thread on wesnoth-dev woke my from my development slumber, and I marshalled the stats.wesnoth.org code together for checking into the repo (hopefully someone with time will work on it).

One thing I did before committing is run "make check", and indeed, the parallel test failed: this runs 10 "insert into db" CGIs at once. I vaguely remember having a problem with sqlite3 commands returning SQLITE_BUSY at random times, and writing this test to diagnose it.

My fix at the time was to do a 'sqlite3_busy_timeout(500)': wait for 500ms before returning SQLITE_BUSY, rather than returning it immediately. I felt aggrieved to have to explicitly tell the database this: given that it's non-trivial to test such a parallel-access path, it's insane that the default behaviour of sqlite3 is to error out because someone else is using the database.

Anyway, as my 'make check' failure reminded me, that's not enough. This time, I poked around for a minute or so, and sure enough, strace shows this:

fcntl64(4, F_SETLK64, {type=F_WRLCK, whence=SEEK_SET, start=1073741826, len=510}, 0xbfb6bac4)
	= -1 EAGAIN (Resource temporarily unavailable)
nanosleep({1, 0}, NULL)          = 0

A non-blocking attempt to lock the database, then a full second sleep. The Right Way is to do a blocking lock with a SIGALARM (but that is dangerous for libraries to use), but this code results in a 1 second latency: there is no way my 10 parallel programs can access the database before that 1/2 second timeout I set, so 9 of them fail.

The correct answer seems to be to write your own busy handler which sleeps for less. This is a horrible hack.

/* sqlite3_busy_timeout sleeps for a *second*.  What a piece of shit. */
static int busy(void *unused __attribute__((unused)), int count)
{
	usleep(50000);

	/* If we've been stuck for 1000 iterations (at least 50
	 * seconds), give up. */
	return (count < 1000);
}

...
	sqlite3_busy_handler(handle, busy, NULL);

Perhaps the more correct answer is to use something other than sqlite?


[/tech] permanent link