Tue, 06 Jun 2006

Binary Searches and Mergesorts Still Broken?

Via Miguel de Icaza's blog I read a Google researcher saying Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken .

The problem is an overflow in adding the high and low indices to calculate the midpoint, apparently widely implemented as in Programming Pearls . I'm not a Java person, but fortunately he has a C fix, too.

This is classic programmer behavior: find a problem, patch it, declare it fixed. But the root of this problem has been missed, with two consequences: (1) people reading this are mislead as to the cause of this problem, and (2) the problem immediately behind it still lurks, at least in C.[1]

The suggested C fix is: "mid = ((unsigned) (low + high)) >> 1;". The better fix is to realize that valid indices into a C arraythis array are always unsigned by definition, and declare "low" and "high" as "unsigned int" in the first place. This means the empty array case needs to be handled, and a new "magic not-found return value" needs to be chosen. But generally, life's far simpler when you say what you mean.

Now the next problem becomes apparent: we've just shifted the overflow from 31 bits to 32 bits, giving ourselves a bugfix which will last ~10 months. low=1,200,000,000 high=1,200,000,100 works now, but low=2,400,000,000 high=2,400,000,100 still doesn't. There are several fixes for this, mine is here.

[1] For all I know, Java can't handle arrays of length > 31 bit, making this overkill, but C certainly can.


[/tech] permanent link