[K42-discussion] The risky business of existence locks

Dilma DaSilva dilma at watson.ibm.com
Wed Nov 9 13:33:35 EST 2005


The K42 team recently encountered an "existence lock bug" in a very
important code path. We think it's worth it to do a quick review of
the issue here. My description is going to be quick and dirty; I'm assuming
that some description is better than none :-)

Background on existence locks: in an environment with concurrent
execution, we must guarantee that while one thread is
invoking a method on an object X, another thread is not destroying X.
Many systems associate an "existence lock" with the object
and rely on the programming discipline of acquiring the existence
lock before retrieving/using an object reference.
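
To make the discipline concrete, here is a minimal sketch of the
conventional existence-lock approach (not K42 code). Table, Obj, use()
and destroy() are hypothetical names made up for this illustration. The
key point is that the existence lock is acquired while the table lock is
still held, so a destroyer can never free the object between the lookup
and the use.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical object type; its existence lock lives inside the object.
struct Obj {
    std::mutex existence;  // held while the object is in use
    int value = 0;
};

class Table {
    std::mutex tableLock;  // guards the map itself
    std::map<int, std::unique_ptr<Obj>> entries;

public:
    void insert(int key, int value) {
        std::unique_ptr<Obj> obj(new Obj);
        obj->value = value;
        std::lock_guard<std::mutex> g(tableLock);
        assert(entries.find(key) == entries.end());  // sketch: no overwrite
        entries[key] = std::move(obj);
    }

    // Runs fn on the object while holding its existence lock.
    // Returns false if the key is not in the table.
    template <class F>
    bool use(int key, F fn) {
        Obj* obj = nullptr;
        std::unique_lock<std::mutex> hold;
        {
            std::lock_guard<std::mutex> g(tableLock);
            auto it = entries.find(key);
            if (it == entries.end()) return false;
            obj = it->second.get();
            // Take the existence lock BEFORE dropping the table lock.
            hold = std::unique_lock<std::mutex>(obj->existence);
        }
        fn(*obj);  // object cannot be freed while 'hold' is alive
        return true;
    }

    // Unlinks the object, waits for in-flight users, then frees it.
    bool destroy(int key) {
        std::unique_ptr<Obj> obj;
        {
            std::lock_guard<std::mutex> g(tableLock);
            auto it = entries.find(key);
            if (it == entries.end()) return false;
            obj = std::move(it->second);
            entries.erase(it);  // no new user can find the object now
        }
        { std::lock_guard<std::mutex> wait(obj->existence); }  // drain users
        return true;  // obj is freed when the unique_ptr goes out of scope
    }
};
```

The cost of this discipline is visible in use(): every access pays for
two lock acquisitions, and the table lock is held while waiting for the
existence lock.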

K42's Clustered Objects model offers some guarantees about when an
object submitted for destruction is actually deleted. With a
technique similar to Read-Copy Update (which is used in Linux),
K42 guarantees that the actual object destruction will not occur
before all active threads possibly holding a reference to the object
terminate. Until this safe point is reached, invocations on
the clustered object (e.g. DREF(obj)->foo()) remain valid: once the
object has been submitted for destruction, they return a sane error
indicating so.

The model assumes that most threads on the system are short-lived,
so the garbage collection will keep progressing. Whenever we need
to block a thread (so it's not short-lived anymore), we mark
it as "deactivated", to indicate that the thread should not be
"counted" in the scheme (generation count) for detecting a safe point
for object destruction.
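
A rough model of the generation-count idea, as I understand it from the
description above, could look like the sketch below. This is not the
K42 implementation; GenerationTracker and its methods are hypothetical
names, and the real scheme is certainly more elaborate (per-processor
state, for one). The sketch only captures the rule: an object retired in
generation g may be freed once no active thread from generation g or
earlier remains.

```cpp
#include <cassert>
#include <map>
#include <mutex>

// Hypothetical, simplified model of generation-based safe-point detection.
class GenerationTracker {
    std::mutex m;
    unsigned long current = 0;            // current generation number
    std::map<unsigned long, int> active;  // generation -> active thread count

public:
    // A thread starts (or wakes up and reactivates): it is counted in
    // the current generation.
    unsigned long activate() {
        std::lock_guard<std::mutex> g(m);
        ++active[current];
        return current;
    }

    // A thread terminates or is about to block: it stops being counted.
    void deactivate(unsigned long gen) {
        std::lock_guard<std::mutex> g(m);
        auto it = active.find(gen);
        assert(it != active.end());
        if (--it->second == 0) active.erase(it);
    }

    // Called after an object has been unlinked from all tables. Threads
    // activated later land in a newer generation and cannot find it.
    unsigned long retire() {
        std::lock_guard<std::mutex> g(m);
        return current++;
    }

    // Safe point: no active thread from the retiring generation or older.
    bool safeToFree(unsigned long token) {
        std::lock_guard<std::mutex> g(m);
        return active.empty() || active.begin()->first > token;
    }
};
```

Note what deactivate() implies: a thread that deactivates itself no
longer holds back the safe point, so any ref it keeps across the block
is no longer protected.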

So if two threads are running, one using the object and the
other destroying it, we can count on the object model to handle
the existence issue:

       Thread 1                           Thread 2
       ==================================================
       ref = findObjInTable();    |          
                                  |   ref = findObjInTable();
                                  |   removeObjFromTable(ref);
                                  |   returnCode = DREF(ref)->destroy();
       returnCode = DREF(ref)->f()|
       //returnCode will indicate
       //obj has been destroyed
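
The interleaving above can be replayed sequentially in a small sketch.
Everything here is a stand-in: the RetCode values, Obj, ObjTable and
replayInterleaving() are hypothetical names, and std::shared_ptr merely
plays the role of the deferred deletion that the Clustered Objects
infrastructure provides (the memory stays valid while an active holder
remains).

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

enum RetCode { RC_OK = 0, RC_OBJ_DESTROYED = -1 };  // hypothetical codes

class Obj {
    std::atomic<bool> dying{false};

public:
    // Any method first checks whether destruction was already requested.
    RetCode f() { return dying.load() ? RC_OBJ_DESTROYED : RC_OK; }

    // Only the first destroy() succeeds; later calls see the error.
    RetCode destroy() {
        bool expected = false;
        return dying.compare_exchange_strong(expected, true)
                   ? RC_OK
                   : RC_OBJ_DESTROYED;
    }
};

// Stand-in for a clustered object ref: memory stays valid while held.
using Ref = std::shared_ptr<Obj>;

class ObjTable {
    std::mutex m;
    std::map<int, Ref> entries;

public:
    void add(int key) {
        std::lock_guard<std::mutex> g(m);
        entries[key] = std::make_shared<Obj>();
    }
    Ref find(int key) {  // plays the role of findObjInTable()
        std::lock_guard<std::mutex> g(m);
        auto it = entries.find(key);
        if (it == entries.end()) return Ref();
        return it->second;
    }
    void remove(int key) {  // plays the role of removeObjFromTable()
        std::lock_guard<std::mutex> g(m);
        entries.erase(key);
    }
};

// Replays the diagram step by step; returns what Thread 1 sees.
RetCode replayInterleaving() {
    ObjTable table;
    table.add(7);
    Ref ref1 = table.find(7);       // Thread 1
    Ref ref2 = table.find(7);       // Thread 2
    table.remove(7);                // Thread 2
    RetCode rc2 = ref2->destroy();  // Thread 2
    assert(rc2 == RC_OK);
    (void)rc2;
    return ref1->f();               // Thread 1: error code, not a crash
}
```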

Take a look at kitch-linux/lib/emu/read.C and you will see a broken
way of relying on the infrastructure :-) 

(line numbers refer to version 1.31):
- in line 42, given a file descriptor fd, we get the corresponding object 
reference fileRef. (FileLinuxRef is a reference to a clustered object)
- in line 58, we invoke method read on object fileRef. This is being
done inside a loop. 
- the read method may return information specifying that the read has
to block. In line 80 SYSCALL_BLOCK() is invoked, so the system call is
blocked until data availability wakes it up later.

SYSCALL_BLOCK() deactivates the thread, so the underlying
infrastructure no longer guarantees the existence of the object
referred to by fileRef. It's possible that when the thread wakes up from
SYSCALL_BLOCK(), fileRef is totally bogus, or the reference has
already been recycled and now points to a totally new object.
You were reading from one file/socket/pipe and now the read continues
from another one :-)
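
To illustrate the "recycled reference" hazard in isolation, here is a
toy model in which a ref is just a slot index plus a generation number.
This is neither how K42 represents clustered object refs nor the planned
fix for read(); RefPool, Ref and all method names are hypothetical. It
only shows why a ref kept across a block may silently name a different
object, and how a check after waking up could notice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical ref: a slot index plus the slot's generation at creation.
struct Ref {
    std::size_t slot;
    unsigned gen;
};

class RefPool {
    struct Slot {
        unsigned gen = 0;
        bool live = false;
        std::string name;
    };
    std::vector<Slot> slots;

public:
    // Reuses the first free slot, which is exactly how a stale ref ends
    // up pointing at a brand new object.
    Ref create(const std::string& name) {
        std::size_t i = 0;
        while (i < slots.size() && slots[i].live) ++i;
        if (i == slots.size()) slots.push_back(Slot());
        slots[i].live = true;
        slots[i].name = name;
        return Ref{i, slots[i].gen};
    }

    // True only if the slot still holds the object the ref was made for.
    bool valid(Ref r) const {
        return r.slot < slots.size() && slots[r.slot].live &&
               slots[r.slot].gen == r.gen;
    }

    void destroy(Ref r) {
        if (!valid(r)) return;
        slots[r.slot].live = false;
        ++slots[r.slot].gen;  // invalidates every outstanding ref
    }

    // Naive dereference: trusts the slot index and nothing else.
    const std::string& nameUnchecked(Ref r) const {
        assert(r.slot < slots.size());
        return slots[r.slot].name;
    }
};
```

If a thread obtains a ref to a "pipe" object, blocks, and meanwhile the
pipe is destroyed and a "socket" is created, the old ref's slot now
holds the socket: nameUnchecked() happily returns it, while valid()
reports the mismatch.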

The message: be aware of the object existence guarantees as you code.
Remember that refs you hold on your stack are "safe" only as long as the
thread doesn't deactivate itself. Remember that if you leave a stale
reference in a data structure, i.e., you didn't update your data
structure to reflect object destruction, following the ref may
blow up in your face, or worse, it may manipulate an object you didn't
intend to.

If you need more information, speak up on the list :-) I simplified
and distorted the story, but specific questions will probably
be answered by the people who wrote the code.

About the fix to read(): it's on the way. It seems that the solution
will be to use one of the arguments returned by read (one that works as a
form of continuation) to detect that the object is under destruction.

Dilma


