Tue, 20 Oct 2009

ext3, corruption, and barrier=1

I mentioned in my previous post that we had seen tdb corruption (despite the carefully written syncing transaction code) when power failures occurred.

I mentioned (from my previous experience with trying to test virtio_blk) that ext3 doesn't use barriers by default, and that the filesystems should be mounted with "barrier=1". (The IBM engineers on the call were horrified that this wasn't the default: I remember the exact same feeling when I found out!).

I had my tdb_check() routine now, so I patched it into tdbtool and modified tdbtorture to take a -t ("do everything inside transactions") option: killing the box should still allow tdb_check() to pass when it came back. I thought using virtualization, but this isn't easy: killing kvm still causes outstanding writes to be completed by the host kernel (nested virtualization would work). So instead, it was time to use my physical test box.

First with standard ext3. Three times I started tdbtorture -t, then pulled the cord out the back. The first two times, sure enough, the tdb was corrupt. The third time, the root filesystem mounted read only and I fscked, rebooted, same thing, fscked again, rebooted happy. Sure enough, the tdb was corrupt (and one of my previous saved corrupted tdbs was lost, another was in lost+found). I should have forced a fsck on every reboot.

So I edited /etc/fstab to put barrier=1 in, and pulled the plug during tdbtorture again. Surprisingly, I got a journal error and r/o remount again, which shouldn't happen. Still, when I did another double-fsck, the tdb was clean! Two more times (no more fs corruption), and two more clean tdbs.

So it seems, lack of barriers was the culprit. But also note that tdbtorture was 4.8 seconds without barriers, 20 or 28 seconds with them (and this slowdown itself might make errors less likely). This is worse than the 10% that googling suggested, but then tdbtorture is pretty perverse. Three processes all doing three fsyncs per commit, and a commit happening about every 10 db operations.

[/tech] permanent link