[K42-discussion] race condition in clustered object call on unoptimized build
Raymond Fingas
fingas at eecg.toronto.edu
Fri Feb 10 10:04:55 EST 2006
I finally managed to track down a bug that was affecting Arbiters in
multiprocessor systems on fullDeb builds.
It seems that in an unoptimized build (i.e. fullDeb), a clustered object
call reads the function pointer and the this pointer with separate reads
from memory. If the representative pointer in the local translation table
(LTT) is changed between the first read (for the function pointer) and the
second read (for the this pointer), then the function call can fail
dramatically: in particular, a function for an arbitrary clustered object
can be called with a this pointer for a default object. The LTT entry could
be reset to the default object when an Arbiter is interposed or removed, or
if the program was suspended and the translation table paged out.
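
To make the window concrete, here is a minimal single-threaded sketch of the
two-read sequence. It is not K42 code: Rep, Outcome, callWithTwoLoads and
callWithOneLoad are hypothetical names, the table slot is modelled as a plain
pointer, and the concurrent update is simulated by a hook that runs between
the two reads.

```cpp
#include <cassert>

// Hypothetical model, not K42 code: each representative carries an explicit
// function pointer so the two reads can be written out by hand.
enum { DEFAULT_OBJ = 0, TARGET_OBJ = 1 };

struct Rep;
struct Outcome {
    int codeOwner;  // object whose function was executed
    int thisOwner;  // object that was passed as `this`
};
using Method = Outcome (*)(const Rep* self);

struct Rep {
    int owner;
    Method method;
};

Outcome targetMethod(const Rep* self)  { return {TARGET_OBJ, self->owner}; }
Outcome defaultMethod(const Rep* self) { return {DEFAULT_OBJ, self->owner}; }

Rep targetRep  = {TARGET_OBJ, targetMethod};
Rep defaultRep = {DEFAULT_OBJ, defaultMethod};

// Unoptimized-style call: the slot is read twice. `between` stands in for
// another processor changing the slot inside the window.
template <typename Hook>
Outcome callWithTwoLoads(Rep** slot, Hook between) {
    Method fn = (*slot)->method;  // first read: function pointer
    between();                    // slot may be reset to the default object here
    Rep* self = *slot;            // second read: this pointer
    return fn(self);
}

// Optimized-style call: the slot is read once, so the function pointer and
// the this pointer always come from the same object.
template <typename Hook>
Outcome callWithOneLoad(Rep** slot, Hook between) {
    Rep* self = *slot;            // single read of the slot
    between();
    return self->method(self);
}
```

If the hook resets the slot to &defaultRep, callWithTwoLoads runs
targetMethod with defaultRep as its this pointer, while callWithOneLoad
stays consistent.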
For the case of the Arbiter, I added a check, active when running a debug
build, that calls the default object if the pointers are not consistent.
This isn't a total solution, since it is possible to mix an unoptimized
application binary with an optimized libc binary, but at least it works
for the test cases. I'm not sure how this should be handled in the more
general case, though, where any other object could be called.
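
A rough sketch of what such a check could look like follows. This is an
assumption about the general shape, not the actual Arbiter change: the type
tag, kDebugBuild, and the method names are all hypothetical, and the sketch
assumes the callee can recognise whether the this pointer it received
belongs to its own kind of object.

```cpp
#include <cassert>

// Hypothetical model, not K42 code.
enum { DEFAULT_TAG = 0, ARBITER_TAG = 1 };

struct Rep {
    int tag;    // hypothetical tag identifying the kind of representative
    int calls;  // number of calls this representative has handled
};

// Stand-in for a "debug build" switch; a real build would use its own flag.
constexpr bool kDebugBuild = true;

// Entry point of the default object.
int defaultMethod(Rep* self) {
    self->calls++;
    return DEFAULT_TAG;
}

// Entry point of the Arbiter. In a debug build it first checks that the
// this pointer really is an Arbiter representative; if the two reads were
// inconsistent, it hands the call to the default object instead.
int arbiterMethod(Rep* self) {
    if (kDebugBuild && self->tag != ARBITER_TAG) {
        return defaultMethod(self);  // inconsistent pointers: use default object
    }
    self->calls++;
    return ARBITER_TAG;
}
```

The check only protects entry points that carry it, which is why it does
not cover the general case where any other object could be called.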
Raymond