Tue, 14 Dec 2004

I've taken a couple of days this week to work on nfsim again. I went onto #netfilter to find testers, which gave me enough to keep me going for a couple of days fixing what were basically packaging and usability issues: you needed to set LD_LIBRARY_PATH, you needed to load modules manually for iptables to work, and you needed to have iptables in /sbin.

But my latest improvements are absolutely wonderful. Firstly, I used the power of talloc to detect and report memory leaks in the code. All the allocations done by "kernel" code (alloc_skb, kmem_cache_alloc, kmalloc and vmalloc) go through talloc, attached to different contexts. At the end, I call all the modules' exit routines, then check talloc_total_blocks() == 1 for each context. This not only detects leaks, but says where they came from.
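A minimal sketch of that per-context leak check follows. It deliberately does not use talloc: it is a tiny stand-in tracking allocator that needs only the C library, and every name in it (struct ctx, ctx_alloc, ctx_report_leaks, the two toy modules) is hypothetical, not nfsim code. One difference worth noting: talloc counts the context itself as a block, which is why the clean result above is 1, while this stand-in counts only the outstanding allocations, so clean means 0.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a talloc context: a named list of live blocks. */
struct ctx;
struct block {
    struct block *next;
    struct ctx *owner;
    const char *where;   /* "file:line" of the allocation */
    size_t size;
};
struct ctx {
    const char *name;
    struct block *blocks;
};

#define STR2(x) #x
#define STR(x) STR2(x)
/* Record the call site so a leak report can say where it came from. */
#define ctx_alloc(ctx, size) \
    ctx_alloc_at((ctx), (size), __FILE__ ":" STR(__LINE__))

void *ctx_alloc_at(struct ctx *ctx, size_t size, const char *where)
{
    struct block *b = malloc(sizeof(*b) + size);

    if (!b)
        return NULL;
    b->owner = ctx;
    b->where = where;
    b->size = size;
    b->next = ctx->blocks;
    ctx->blocks = b;
    return b + 1;   /* payload follows the header */
}

void ctx_free(void *ptr)
{
    struct block *b, **pp;

    if (!ptr)
        return;
    b = (struct block *)ptr - 1;
    /* Unlink the block from its owner's list before freeing it. */
    for (pp = &b->owner->blocks; *pp; pp = &(*pp)->next) {
        if (*pp == b) {
            *pp = b->next;
            break;
        }
    }
    free(b);
}

/* Print every block still attached to ctx; return how many there are. */
unsigned int ctx_report_leaks(const struct ctx *ctx, FILE *out)
{
    const struct block *b;
    unsigned int n = 0;

    for (b = ctx->blocks; b; b = b->next) {
        fprintf(out, "%s: leaked %zu bytes allocated at %s\n",
                ctx->name, b->size, b->where);
        n++;
    }
    return n;
}

/* Two toy "modules": one cleans up in its exit routine, one does not. */
struct ctx tidy_ctx = { "tidy_module", NULL };
struct ctx leaky_ctx = { "leaky_module", NULL };
static void *tidy_state;

void tidy_init(void)  { tidy_state = ctx_alloc(&tidy_ctx, 64); }
void tidy_exit(void)  { ctx_free(tidy_state); tidy_state = NULL; }
void leaky_init(void) { (void)ctx_alloc(&leaky_ctx, 32); /* pointer dropped */ }
void leaky_exit(void) { }
```

After running each module's init and exit routine, ctx_report_leaks() on its context gives the verdict: zero for the tidy module, one for the leaky module, along with the file and line of the allocation that was never freed.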

But to really get gcov coverage up, you need to test allocation failures and the like. This is actually hard: firstly our test scripts assume that allocations always succeed. Otherwise they'd be a nightmare. Secondly, the allocation points move around in the code, so there's no good way to specify a particular allocation to fail.

To solve the first, I split failures into "script_fail()" which meant the scripted test failed (failed command, failed expectation) vs. "barf()" which meant some internal problem. And when testing allocation failures, I would ignore any call to "script_fail()".
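The split might look roughly like this. It is a sketch under assumptions, not nfsim's real code: the signatures, the failtest_mode flag and the counters are all made up for illustration, and script_fail() here counts and returns instead of ending the run so that the behaviour is easy to see.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical state: true while allocation failures are being injected. */
bool failtest_mode;
unsigned int script_failures;          /* scripted failures that counted */
unsigned int ignored_script_failures;  /* scripted failures we suppressed */

void set_failtest_mode(bool on)
{
    failtest_mode = on;
}

/* Internal problem: always fatal, whatever mode we are in. */
void barf(const char *fmt, ...)
{
    va_list ap;

    fprintf(stderr, "internal error: ");
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    exit(1);
}

/*
 * The scripted test failed (failed command, failed expectation).
 * Returns true if the failure was reported, false if it was ignored
 * because we are deliberately failing allocations.
 */
bool script_fail(const char *fmt, ...)
{
    va_list ap;

    if (failtest_mode) {
        ignored_script_failures++;
        return false;
    }
    fprintf(stderr, "script failed: ");
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    script_failures++;
    return true;
}
```

The point of the design is that an injected allocation failure is expected to make commands and expectations fail, so those are noise during failure testing, while a barf() still means the simulator itself is broken.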

For the second problem, I came up with the idea of using the snapshotting Jeremy and I had been discussing, to implement an automatic exhaustive failure test: first pass the allocation, then roll back and try failing it. But implementing snapshotting and rollback while keeping all the state of allocation failures and successes already done looked like a nightmare.

Then Jeremy said "why not just fork() the simulator". And voila! We have exhaustive failure path testing.
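Here is a minimal sketch of the fork() trick, assuming a POSIX system. None of it is taken from nfsim: sim_alloc, run_exhaustive, struct fail_report and the two toy tests are hypothetical names. It also assumes each child injects exactly one failure and then allocates normally. At every allocation the process forks: the child is the "snapshot" in which the allocation fails, and the parent waits for it and then carries on with the allocation succeeding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct fail_report {
    int result;            /* return value of the all-succeed pass */
    unsigned int tried;    /* failure paths explored, one per allocation */
    unsigned int crashed;  /* children that died instead of coping */
};

static struct fail_report report;
static bool in_failure_child;  /* true in a child that injected its failure */

/* Allocation wrapper: every call is a point where failure gets tried. */
void *sim_alloc(size_t size)
{
    pid_t pid;
    int status;

    /* Assumption: one injected failure per child, then behave normally. */
    if (in_failure_child)
        return malloc(size);

    fflush(NULL);  /* so the child does not inherit unflushed stdio data */
    pid = fork();
    if (pid < 0) {
        perror("fork");
        abort();
    }
    if (pid == 0) {
        /* Child: same state as the parent, but this allocation fails. */
        in_failure_child = true;
        return NULL;
    }

    /* Parent: let the failure path run to the end, then succeed. */
    if (waitpid(pid, &status, 0) != pid) {
        perror("waitpid");
        abort();
    }
    report.tried++;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        report.crashed++;
    return malloc(size);
}

/* Run one test, exploring the failure of each allocation it makes. */
struct fail_report run_exhaustive(int (*test)(void))
{
    int ret;

    assert(!in_failure_child);
    report.result = 0;
    report.tried = 0;
    report.crashed = 0;

    ret = test();
    if (in_failure_child) {
        /* The test survived its injected failure; an error return is fine. */
        fflush(NULL);
        _exit(0);
    }
    report.result = ret;
    return report;
}

/* Toy test that handles both of its allocation failures. */
int careful_test(void)
{
    char *a, *b;

    a = sim_alloc(16);
    if (!a)
        return -1;
    b = sim_alloc(16);
    if (!b) {
        free(a);
        return -1;
    }
    free(b);
    free(a);
    return 0;
}

/* Toy test that cannot cope with a failed allocation. */
int careless_test(void)
{
    char *a = sim_alloc(16);

    if (!a)
        abort();  /* stands in for a crash such as a NULL dereference */
    a[0] = 'x';
    free(a);
    return 0;
}
```

Calling run_exhaustive(careful_test) explores two failure paths with no crashes, while run_exhaustive(careless_test) explores one and reports it as crashed. Because the kernel copies the whole process, no state about earlier successes and failures has to be saved or rolled back by hand.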

