Mon, 12 Oct 2009

Fun With Bloom Filters

A few years back at a netconf, someone (Robert Olsson maybe? Jamal Salim?) got excited about Bloom Filters. It was my first exposure.

The idea is simple: imagine a zeroed bit array. To put a value in the filter you hash it to some bit, and set that bit. Later on, to check if something is in the filter, you hash it and check that bit. Of course this is a pretty poor filter: it never gives false negatives, but has at about {num entries} in {num bits} chance of giving false positives. The trick is to use more than one hash, and the chances of all those bits being set drops rapidly.

It can be used to accelerate lookups, but we never found a good use for it. Still, it sat in the back of my head for a few years until I came across a completely different use for the same idea.

TDB (the Trivial DataBase) is a simple key/value pair database in a file (think Berkley DB). It has a free list head and set of hash chain heads at the start, and each record is single-threaded (via a "next" entry) on one of these lists. My problem is that even though TDB supports transactions, there were reports of corruption on power failure (see next post!); we wanted a fast consistency check of the database. In particular, this was for ctdb: if the db is corrupt you just delete it and get a complete copy from the other nodes.

A single linear scan would be fastest, rather than seeking around the file. Checking each record is easy, but how do we check that it's in the right hash chain (or the free list), and that each record only appears once? The particular corrupt tdb I was given contained such an infinite loop, which is a nasty failure mode. The obvious thing to do is to seek through and record all the next pointers, and the actual record offsets, then sort the next pointers and see that the two lists match. But that involves a sort and would take 8 bytes per record (TDB is 32 bit, so that's 4 bytes for the next pointer and 4 bytes to remember the actual record offset).

How would we do this in fixed space, even though we don't know how many records there are? What if, instead, we allocate two Bloom filters for each hash chain (and one for the free list)? We put next pointers in the first Bloom filter, and actual located records in the second. At the end, the two should match!

But we can do better than this. Say we use 8 hashes, and 256 bits of bitmap. First off, if the 8 hashes of a value overlap already-set bits, it has no effect and we won't be able to tell if it's missing from the other filter. And if seven bits overlap others (so it only sets one unique bit) then we can't detect a "bad" value which sets that same bit and no other unique bits.

So instead of setting bits, we can flip bits in the bitmap. This means that we can detect a single extra value in one list unless it happens to cancel out its own bits (ie. the hash values all happen to form pairs), and if two values are different they'd need to hit precisely the same bits. This is astronomically unlikely (it's a bit more than 1 in 256! / (8! * 248!), but its still a very small number).

The best bit, of course is that you don't need two bitmaps: a single one will do. Since the two sets of values should be equal, it should be all zero bits when finished!

In practice, all the corrupt TDBs I've gathered have had much more gross errors. But it's nice to finally use Bloom's ideas! The code can be found in the CCAN repository.

[/tech] permanent link