<?xml version="1.0"?>
<!-- name="generator" content="blosxom/2.0" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
  <channel>
    <title>Rusty's Bleeding Edge Page</title>
    <link>http://ozlabs.org/~rusty/index.cgi</link>
    <description>Rusty's Bleeding Edge Page</description>
    <language>en</language>

<item>
    <title>Finally, Rusty's Blog Moves to WordPress</title>
    <pubDate>Tue, 27 Oct 2009 01:06:00 GMT</pubDate>
    <link>http://ozlabs.org/~rusty/index.cgi/2009/10/27#2010-10-26</link>
    <description>
You won't see anything new here!  Try &lt;a href=&quot;http://rusty.ozlabs.org/&quot;&gt;http://rusty.ozlabs.org&lt;/a&gt; instead.</description>
</item>
<item>
    <title>SAMBA Coding and a Little Kernel</title>
    <pubDate>Sat, 24 Oct 2009 01:08:00 GMT</pubDate>
    <link>http://ozlabs.org/~rusty/index.cgi/2009/10/24#2009-10-23</link>
    <description>
&lt;p&gt;
So two weeks back was the Official Handing Over Of The SAMBA Team T-shirt!
Since then I have done my first serious push to the git tree, and received
spam from the build farm about it (false positives, AFAICT).
&lt;/p&gt;

&lt;a href=&quot;http://twitpic.com/knmz9&quot; title=&quot;Rusty officially joins the samba team on Twitpic&quot;&gt;&lt;img src=&quot;http://twitpic.com/show/thumb/knmz9.jpg&quot; width=&quot;150&quot; height=&quot;150&quot; alt=&quot;Rusty officially joins the samba team on Twitpic&quot; align=left&gt;&lt;/a&gt;

&lt;p&gt; I'm still maintaining virtio and the module and parameter code of
course.  But the kernel has slowly morphed into a complicated and
hairy place.  Formality has crept in, and the pile of prerequisites
grows higher (eg. git, checkpatch.pl, Signed-off-by).  This is
maturity, but it raises the question: when will some neat lean OS
without all this baggage come along? &lt;/p&gt;

&lt;p&gt; SMP, micro-optimizations, multithreading and extreme portability
are responsible for much of the added coding burdens, but also
hyper-distributed development means many coders shy away from changes
which would break APIs.  The suboptimality accretes and this method of
working becomes the new norm.  BUG_ON() for API misuse is now seen as
unduly harsh, but undefined APIs make the next change harder, and
WARN_ON() tends to stay around forever.  &lt;/p&gt;

&lt;p&gt; SAMBA has some brilliant ideas which make coding a joy (talloc chief
among them, but there are other gems to be found).  Hell, it even has
a testsuite!  But of course it has its own issues; the SAMBA 3/4
split, lack of the kernel's massive human resources and the inevitable
code quality issues.  Ask me again in a few years to do a comparison...&lt;/p&gt;</description>
</item>
<item>
    <title>ext3, corruption, and barrier=1</title>
    <pubDate>Tue, 20 Oct 2009 14:58:00 GMT</pubDate>
    <link>http://ozlabs.org/~rusty/index.cgi/2009/10/20#2009-10-20</link>
    <description>
&lt;p&gt; I mentioned in my &lt;a
href=&quot;http://ozlabs.org/~rusty/index.cgi/tech/2009-10-12.html&quot;&gt;previous
post&lt;/a&gt; that we had seen tdb corruption (despite the carefully
written syncing transaction code) when power failures occurred.  &lt;/p&gt;

&lt;p&gt; I mentioned (from my previous experience with trying to test
virtio_blk) that ext3 doesn't use barriers by default, and that the
filesystems should be mounted with &quot;barrier=1&quot;.  (The IBM engineers on
the call were horrified that this wasn't the default: I remember the
exact same feeling when I found out!).  &lt;/p&gt;

&lt;p&gt; I had my tdb_check() routine now, so I patched it into tdbtool and
modified tdbtorture to take a -t (&quot;do everything inside transactions&quot;)
option: killing the box should still allow tdb_check() to pass when it
came back.  I thought about using virtualization, but this isn't easy:
killing kvm still causes outstanding writes to be completed by the
host kernel (nested virtualization would work).  So instead, it was time
to use my physical test box. &lt;/p&gt;

&lt;p&gt; First with standard ext3.  Three times I started tdbtorture -t,
then pulled the cord out the back.  The first two times, sure enough,
the tdb was corrupt.  The third time, the root filesystem mounted read
only and I fscked, rebooted, same thing, fscked again, rebooted happy.
Sure enough, the tdb was corrupt (and one of my previous saved
corrupted tdbs was lost, another was in lost+found).  I should have forced
a fsck on every reboot.
&lt;/p&gt;

&lt;p&gt; So I edited /etc/fstab to put barrier=1 in, and pulled the plug during
tdbtorture again.  Surprisingly, I got a journal error and r/o remount again,
which shouldn't happen.  Still, when I did another double-fsck, the tdb was
clean!  Two more times (no more fs corruption), and two more clean tdbs.
&lt;/p&gt;
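
&lt;p&gt; For illustration, an /etc/fstab entry with barriers enabled might
look like the fragment below.  The device name and mount point are
assumptions made up for the example; only the barrier=1 option comes
from the description above.
&lt;/p&gt;

&lt;pre&gt;
```
# device     mountpoint  type  options              dump  pass
/dev/sda1    /           ext3  defaults,barrier=1   0     1
```
&lt;/pre&gt;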

&lt;p&gt; So it seems lack of barriers was the culprit.  But also note that
tdbtorture took 4.8 seconds without barriers, 20 or 28 seconds with them
(and this slowdown itself might make errors less likely).
This is worse than the 10% that &lt;a href=&quot;http://hightechsorcery.com/2008/10/evaluating-performance-ext3-using-write-barriers-and-write-caching&quot;&gt;googling suggested&lt;/a&gt;, but then tdbtorture is pretty perverse.  Three processes all doing
three fsyncs per commit, and a commit happening about every 10 db operations.
&lt;/p&gt;
</description>
</item>
<item>
    <title>Fun With Bloom Filters</title>
    <pubDate>Mon, 12 Oct 2009 00:30:00 GMT</pubDate>
    <link>http://ozlabs.org/~rusty/index.cgi/2009/10/12#2009-10-12</link>
    <description>
&lt;p&gt; A few years back at a netconf, someone (Robert Olsson maybe? Jamal
Salim?) got excited about &lt;a href=&quot;http://en.wikipedia.org/wiki/Bloom_filter&quot;&gt;Bloom Filters&lt;/a&gt;.  It was my first exposure.
&lt;/p&gt;

&lt;p&gt;
The idea is simple: imagine a zeroed bit array.  To put a value in the
filter you hash it to some bit, and set that bit.  Later on, to check
if something is in the filter, you hash it and check that bit.  Of
course this is a pretty poor filter: it never gives false negatives,
but has about a {num entries} in {num bits} chance of giving false
positives.  The trick is to use more than one hash, and the chances of
all those bits being set drops rapidly.
&lt;/p&gt;
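
&lt;p&gt;
The scheme above can be sketched in a few lines of Python.  This is a
minimal illustration, not code from any real project: the sizes, the
names (NUM_BITS, NUM_HASHES, bit_positions, bloom_add,
bloom_maybe_contains) and the trick of slicing one SHA-256 digest into
several hashes are all assumptions chosen for the example.
&lt;/p&gt;

&lt;pre&gt;
```python
import hashlib

NUM_BITS = 256    # size of the bit array (illustrative choice)
NUM_HASHES = 8    # number of hash functions (illustrative choice)


def bit_positions(value):
    """Derive NUM_HASHES bit positions for a value."""
    digest = hashlib.sha256(str(value).encode()).digest()
    # Assumption: each 4-byte slice of the 32-byte digest acts as one hash.
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % NUM_BITS
            for i in range(NUM_HASHES)]


def bloom_add(bits, value):
    """Set every bit the value hashes to."""
    for pos in bit_positions(value):
        bits[pos] = 1


def bloom_maybe_contains(bits, value):
    """False means definitely absent; True may be a false positive."""
    return all(bits[pos] for pos in bit_positions(value))


# Usage: start from a zeroed array and add a few values.
bits = [0] * NUM_BITS
for offset in (1024, 2048, 4096):
    bloom_add(bits, offset)
```
&lt;/pre&gt;

&lt;p&gt;
Anything that was added always tests as present; a value that was never
added usually tests as absent, with the false positive rate depending on
how full the array is.
&lt;/p&gt;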

&lt;p&gt;
It can be used to accelerate lookups, but we never found a good use for it.
Still, it sat in the back of my head for a few years until I came across a
completely different use for the same idea.
&lt;/p&gt;

&lt;p&gt;
TDB (the Trivial DataBase) is a simple key/value pair database in
a file (think Berkeley DB).  It has a free list head and set of hash
chain heads at the start, and each record is single-threaded (via a
&quot;next&quot; entry) on one of these lists.  My problem is that even though
TDB supports transactions, there were reports of corruption on power
failure (see next post!); we wanted a fast consistency check of the
database.  In particular, this was for ctdb: if the db is corrupt you
just delete it and get a complete copy from the other nodes.
&lt;/p&gt;

&lt;p&gt; A single linear scan would be fastest, rather than seeking around
the file.  Checking each record is easy, but how do we check that it's
in the right hash chain (or the free list), and that each record only
appears once?  The particular corrupt tdb I was given contained a record
that appeared twice, forming an infinite loop, which is a nasty failure mode.  The obvious thing to
do is to seek through and record all the next pointers, and the actual
record offsets, then sort the next pointers and see that the two lists
match.  But that involves a sort and would take 8 bytes per record
(TDB is 32 bit, so that's 4 bytes for the next pointer and 4 bytes to
remember the actual record offset).  &lt;/p&gt;

&lt;p&gt; How would we do this in fixed space, even though we don't know how
many records there are?  What if, instead, we allocate two Bloom
filters for each hash chain (and one for the free list)?  We put next
pointers in the first Bloom filter, and actual located records in the
second.  At the end, the two should match!  &lt;/p&gt;

&lt;p&gt; But we can do better than this.  Say we use 8 hashes, and 256 bits
of bitmap.  First off, if the 8 hashes of a value overlap already-set
bits, it has no effect and we won't be able to tell if it's missing
from the other filter.  And if seven bits overlap others (so it only
sets one unique bit) then we can't detect a &quot;bad&quot; value which sets
that same bit and no other unique bits.  &lt;/p&gt;

&lt;p&gt;So instead of setting bits, we can &lt;strong&gt;flip&lt;/strong&gt; bits in
the bitmap.  This means that we can detect a single extra value in one
list unless it happens to cancel out its own bits (ie. the hash values
all happen to form pairs), and if two values are different they'd need
to hit precisely the same bits.  This is astronomically unlikely (it's
a bit more than 1 in 256! / (8! * 248!), but it's still a very small
number).
 &lt;/p&gt;

&lt;p&gt;The best bit, of course, is that you don't need two bitmaps: a
single one will do.  Since the two sets of values should be equal, it
should be all zero bits when finished!
&lt;/p&gt;
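
&lt;p&gt;
The single-bitmap, bit-flipping check can be sketched as follows.  This
is a hedged Python illustration of the idea only, not the actual
check.c code: the helper names (bit_positions, flip,
chain_is_consistent), the SHA-256 based hashing and the sample offsets
are assumptions made up for the example.
&lt;/p&gt;

&lt;pre&gt;
```python
import hashlib

NUM_BITS = 256    # bits of bitmap, as in the text
NUM_HASHES = 8    # hashes per value, as in the text


def bit_positions(value):
    """Derive NUM_HASHES bit positions for a value (hypothetical hashing)."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % NUM_BITS
            for i in range(NUM_HASHES)]


def flip(bits, value):
    """Toggle every bit the value hashes to; flipping twice cancels out."""
    for pos in bit_positions(value):
        bits[pos] ^= 1


def chain_is_consistent(next_pointers, record_offsets):
    """Flip in both lists; matching lists leave the bitmap all zero."""
    bits = [0] * NUM_BITS
    for offset in next_pointers:
        flip(bits, offset)
    for offset in record_offsets:
        flip(bits, offset)
    return not any(bits)


# Usage: the same offsets seen as next pointers and as located records.
good = chain_is_consistent([100, 200, 300], [300, 100, 200])
# One pointer refers to a record that the scan never found.
bad = chain_is_consistent([100, 200, 300], [100, 200])
```
&lt;/pre&gt;

&lt;p&gt;
Order does not matter, so the lists can be fed in as the linear scan
encounters them.  For scale, 256! / (8! * 248!) is about 4.1e14.
&lt;/p&gt;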

&lt;p&gt;In practice, all the corrupt TDBs I've gathered have had much more
gross errors.  But it's nice to finally use Bloom's ideas!  The code
can be found &lt;a href=&quot;http://ccan.ozlabs.org/browse/ccan/tdb/check.c&quot;&gt;
in the CCAN repository.&lt;/a&gt;
&lt;/p&gt;</description>
</item>
<item>
    <title>Late Night Hacking</title>
    <pubDate>Wed, 30 Sep 2009 03:32:00 GMT</pubDate>
    <link>http://ozlabs.org/~rusty/index.cgi/2009/09/30#2009-09-30</link>
    <description>
&lt;p&gt;
It's been a while, but I find myself hacking to 3am tonight
(virtio_blk needed some love, and it was easier to patch it myself
than explain The Right Thing to the patch submitter).
&lt;/p&gt;

&lt;p&gt;
Am seriously tempted to do that tdb hacking now, but I get Arabella
tomorrow and I definitely want to face her fully refreshed!
&lt;/p&gt;</description>
</item>
<item>
    <title>Lguest?!  Really?</title>
    <pubDate>Mon, 28 Sep 2009 13:10:00 GMT</pubDate>
    <link>http://ozlabs.org/~rusty/index.cgi/2009/09/28#2009-09-28</link>
    <description>
&lt;p&gt;
So, &lt;a href=&quot;http://www.nytimes.com/2009/07/28/science/28comp.html?_r=1&quot;&gt;New
York Times covered Sandia National Labs using a million virtual machines
to research botnets&lt;/a&gt;.  I saw something fly by on slashdot, but didn't pay
any attention.
&lt;/p&gt;

&lt;p&gt; Then a couple of IBM research guys sent me an embarrassed mail a
few days ago.  Senior IBM execs had seen the &lt;a href=&quot;http://www.sandia.gov/news/resources/releases/2009/linux.html&quot;&gt;Sandia press release
crediting lguest&lt;/a&gt; which &quot;was developed by the research arm of IBM&quot;
and were wondering why they'd never heard of it. :)[1] &lt;/p&gt;

&lt;p&gt; Upshot is: I always said lguest was a hypervisor research and
education tool, but this blew me away!  Ron Minnich has been
submitting stuff for lguest for a while now, but I assumed he was just
idling.  Great stuff!
&lt;/p&gt;

&lt;p&gt;
[1] Linux Technology Center is not part of IBM Research, and lguest was
just a random hack I did to help some other things I was working on (and
probably spent too much time on to justify).&lt;/p&gt;</description>
</item>
<item>
    <title>linux.conf.au 2010 Submissions</title>
    <pubDate>Mon, 20 Jul 2009 23:59:00 GMT</pubDate>
    <link>http://ozlabs.org/~rusty/index.cgi/2009/07/20#2009-07-20</link>
    <description>
&lt;p&gt;
Finally submitted to &lt;a href=&quot;http://www.lca2010.org.nz&quot;&gt;LCA 2010&lt;/a&gt;.
Yes, I'm submitting the lguest tutorial for the third (and last) time;
having put all the effort into it, I feel it will finally be a good
tutorial.
&lt;/p&gt;

&lt;p&gt; But I wanted to talk about something else, so I made a more
off-the-wall submission, on the stuff I've been doing with Arabella,
the Nintendo Wiimote, and libcwiid.  The hope is that if that gets
accepted it'll give me motivation to spend more time perfecting it!
&lt;/p&gt;

&lt;p&gt; In unrelated news, I got an email from Peter Richards who has been
playing with my old pong code, and made improved IR pens.  He had some
Vishay IR LEDs left over, and has mailed them to me.  If my paper is
accepted, I'll have to figure out what to do with them!
&lt;/p&gt;</description>
</item>
  </channel>
</rss>