Free Software programmer
This blog existed before my current employment, and obviously reflects my own opinions and not theirs.
This work is licensed under a Creative Commons Attribution 2.1 Australia License.
Sun, 30 Apr 2006
Found a problem with ccontrol, while building git: it forks 120 asciidoc instances, which then drive my machine into the ground.
This led to experimenting with the ccontrol locking, and reinstating an older version of the code which used to time out on a lock and then retry on another lock. This is useful to avoid getting stuck behind very slow compiles, or even a suspended compile. Of course, the lock is there to avoid us running hundreds of compiles at once: we don't want to ignore it if the machine is actually at capacity. I'd previously discovered that getloadavg() is completely useless for this kind of self-limiting.
So, I did some tests: by doing gettimeofday() before and after the sleep, we can quite accurately detect the interactive response of the machine: if it's not good, we're stressed and shouldn't assume that whoever is holding the lock is sleeping.
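A minimal sketch of that measurement, not the actual ccontrol code: sample gettimeofday() either side of a short usleep() and see how late we woke up. The function names and the threshold parameter are hypothetical, chosen only for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

/* Microseconds elapsed between two gettimeofday() samples. */
static long elapsed_us(const struct timeval *start, const struct timeval *end)
{
	return (end->tv_sec - start->tv_sec) * 1000000L
		+ (end->tv_usec - start->tv_usec);
}

/* Sleep for sleep_us, and return how many microseconds late we woke. */
static long sleep_overshoot_us(long sleep_us)
{
	struct timeval before, after;
	long late;

	/* usleep() is only specified for values under one second. */
	assert(sleep_us >= 0 && sleep_us < 1000000);

	gettimeofday(&before, NULL);
	usleep(sleep_us);
	gettimeofday(&after, NULL);

	late = elapsed_us(&before, &after) - sleep_us;
	/* Clamp: the wall clock can be stepped backwards under us. */
	return late < 0 ? 0 : late;
}

/* Hypothetical policy: a late wakeup means the machine is stressed. */
static bool machine_is_stressed(long sleep_us, long threshold_us)
{
	return sleep_overshoot_us(sleep_us) > threshold_us;
}
```

Something like machine_is_stressed(100000, 20000) would sleep a tenth of a second and call the machine stressed if the wakeup came more than 20ms late; both numbers are guesses that would need tuning on a real machine.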
Which got me to thinking: why don't we do this all the time? If we always started with one lock, we could then decide to completely ignore the lock if we time out and the machine isn't stressed: this would self-adjust to an optimal (high but not thrashing) load. This would remove a configuration option from ccontrol: you currently tell it how many CPUs you have. This method would let us automatically expand to fill any machine!
Well, it works for a while. At some stage a bunch of ccontrols decide to go lockless, and we jump to doing 5 compiles at once. At this point, we become I/O bound, so the CPU is fairly idle: we're not thrashing just yet, and our interactivity is good. Another 10 ccontrols think we're not stressed and go lockless, and suddenly we're thrashing to death...
After several painful experiments, I tried only going lockless if the load average is also low, since the load average does reflect I/O activity. This works better, but is a delicate balance: I think I'll stick to backing off onto a new lock, not abandoning locking altogether when ccontrol thinks we're not stressed...
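The decision on a lock timeout might look roughly like this. It is a sketch under my own assumptions, not ccontrol's real logic: the enum, struct, function names and limits are all hypothetical, and only getloadavg() is a real library call.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stdlib.h>

/* What to do once a lock attempt has timed out (hypothetical names). */
enum lock_action { KEEP_WAITING, TRY_NEXT_LOCK, GO_LOCKLESS };

struct stress_sample {
	long wakeup_late_us;	/* how late we woke from our sleep */
	double load_1min;	/* 1-minute load average, or -1 if unknown */
};

/* Read the 1-minute load average; -1.0 if it can't be obtained. */
static double current_load_1min(void)
{
	double load[1];

	return getloadavg(load, 1) == 1 ? load[0] : -1.0;
}

static enum lock_action on_lock_timeout(const struct stress_sample *s,
					long late_limit_us,
					double load_limit,
					bool allow_lockless)
{
	/* Poor interactive response: the machine really is busy, so
	 * assume the lock holder is working, not sleeping. */
	if (s->wakeup_late_us > late_limit_us)
		return KEEP_WAITING;

	/* The experiment: drop the lock entirely, but only when the
	 * load average is known and low too. */
	if (allow_lockless && s->load_1min >= 0 && s->load_1min < load_limit)
		return GO_LOCKLESS;

	/* The conservative choice: back off onto another lock. */
	return TRY_NEXT_LOCK;
}
```

With allow_lockless set to false this never returns GO_LOCKLESS, which is the behaviour I'm sticking with: an unstressed machine only ever earns a move to a new lock.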