[K42-discussion] race condition in clustered object call on unoptimized build
Dilma DaSilva
dilma at watson.ibm.com
Fri Feb 10 14:43:35 EST 2006
This sounds like a nasty race condition.
Could we mark the entry in the LTT as invalid when changing it from a
real object to the default one, and only reuse entries after the
generation count indicates that no "old thread" could still be in the
midst of checking?
I'm probably missing something ... I wish I could jump into this now
:-)
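
To make the suggestion above concrete, here is a rough sketch of the bookkeeping I have in mind. All names (LttEntry, Table, retire, advanceGeneration) are made up for illustration and are not the real K42 translation table code. It also assumes that a single generation advance is enough to guarantee that no old thread remains, which may not match the real generation mechanism.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an LTT entry; not K42's actual layout.
enum class EntryState { Default, Live, Invalid };

struct LttEntry {
    EntryState state = EntryState::Default;
    std::uint64_t retiredAt = 0;  // generation in which the entry was invalidated
};

class Table {
public:
    explicit Table(std::size_t n) : entries_(n) {}

    // Install a real object; only legal on a reusable entry.
    void install(std::size_t i) {
        assert(reusable(i));
        entries_[i].state = EntryState::Live;
    }

    // Changing a real object back to the default one: mark the entry
    // invalid and remember the generation instead of reusing it at once.
    void retire(std::size_t i) {
        entries_[i].state = EntryState::Invalid;
        entries_[i].retiredAt = generation_;
    }

    // Assumed to be called once every thread that started in the
    // current generation has finished.
    void advanceGeneration() { ++generation_; }

    // An invalid entry becomes reusable only after the generation has
    // moved past the one in which it was retired.
    bool reusable(std::size_t i) const {
        const LttEntry& e = entries_[i];
        if (e.state == EntryState::Default) return true;
        return e.state == EntryState::Invalid && generation_ > e.retiredAt;
    }

private:
    std::vector<LttEntry> entries_;
    std::uint64_t generation_ = 0;
};
```

With this, an entry that has just been retired stays unavailable until advanceGeneration() has run, so a thread that read the old contents cannot see the slot recycled underneath it.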
Raymond Fingas writes:
> I finally managed to track down a bug that was affecting Arbiters in
> multiprocessor systems on fullDeb builds.
>
> It seems that in an unoptimized build (i.e. fullDeb), reading the function
> pointer and reading the this pointer involve multiple reads from memory.
> If the representative pointer in the local translation table is changed
> between the first read (for the function pointer) and the second read (the
> this pointer), then the function call can fail dramatically (in
> particular, a function for an arbitrary clustered object can be called
> with a this pointer for a default object). The LTT could be reset to the
> default object when an Arbiter is interposed or removed, or if the program
> was suspended and the translation table paged out.
>
> For the case of the Arbiter, I put a check that will call the default
> object if the pointers are not consistent when running a debug build.
> This isn't a total solution, since it is possible to mix an unoptimized
> application binary with an optimized libc binary, but at least it works
> for the test cases. I'm not sure how this should be handled in the more
> general case, though, where any other object could be called.
>
> Raymond
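
For anyone following along, the failure mode and the debug-build check described above can be sketched roughly as follows. This is only an illustration under my own assumptions: Rep, realFn, missFn, callSnapshot and callChecked are hypothetical names, the function pointer member stands in for the virtual function slot, and none of this is the actual clustered object dispatch code.

```cpp
#include <atomic>

// Hypothetical stand-in for a representative; fn plays the role of the
// function pointer fetched during a clustered object call.
struct Rep {
    int (*fn)(Rep*);
    int id;
};

inline int realFn(Rep* self) { return self->id; }  // a real object's method
inline int missFn(Rep*) { return -1; }             // the default object's handling

// One read of the LTT slot: the function and the this pointer always
// come from the same representative, so they cannot disagree.
inline int callSnapshot(const std::atomic<Rep*>& slot) {
    Rep* r = slot.load(std::memory_order_acquire);
    return r->fn(r);
}

// Debug-build style check. The two reads are passed in explicitly so the
// bad interleaving can be reproduced deterministically: if the slot
// changed between them, go through the default object instead of calling
// one object's function with another object's this pointer.
inline int callChecked(Rep* readForFn, Rep* readForThis, Rep* dflt) {
    if (readForFn != readForThis) {
        return dflt->fn(dflt);
    }
    return readForFn->fn(readForThis);
}
```

Calling callChecked with two different pointers models the slot being reset to the default object between the two reads; the check then routes the call to the default object rather than mixing the pointers.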