[K42-discussion] Questions and comments about lazyReOpen + a patch for lazyReOpen

Orran Y Krieger okrieg at us.ibm.com
Mon Oct 2 01:43:31 EST 2006


First, I would like to apologize Patrick for not responding to this
previously, and thank you for your email that documents how things
work so well.  Comments/clarifications interspersed.  However, a few
general comments up front. First, fork fundamentally violates the
model of K42.  Fork is efficient for an operating system kernel where
one has centralized simple data structures, obviously this is not the
case with K42.  We first did no special kludges with K42 to handle
fork, i.e., all the work to fork was done in the parent process
itself, and that process did aggressive re-opens of files on behalf of
the new child.  Everything was functionally correct, but we had
performance challenges.  In particular:

  o a very important benchmark had a bug where a file was opened at the 
    start, and they just forgot to close it before forking off hundreds of 

    children.  The repeated calls to re-open that same file, that was 
never 
    accessed, was a performance bottleneck for K42.  The lazy reopen stuff 
put 
    us back in the ballpark of Linux, where forking resulted in an 
increment 
    of a reference count.  This is still a bug in the 
application/benchark, 
    but performance is not visible for moderate sized system.  This is, 
    b.t.w., evidence of how lazy fork lets the programmer be :-(
  o we found that sequential sharing was common, e.g., a script has output 

    re-directed into a file, and one program after another writes into the 

    file.  In this case, the file is never shared, i.e., only one process 
at a 
    time is accessing it, but it appears in the natural implementation in 
k42 
    (and linux) to be shared.  We wanted, with the lazy re-open stuff, to 
    recognize this case so that only if both the parent and child are 
actively 
    sharing the file do we put the file into a shared state.  Recall, in 
k42 
    we have multiple optimizations we apply if a file is not actively 
shared. 

To address these problems, we did the lazy re-open logic.  One
important point to note is that in both this logic and in the more
general fork logic, we have not been worried about multi-threaded
programs.  I believe that fork works now for well behaved
multi-threaded apps, i.e., only one thread is active when the fork is
happening... but more general multi-threaded apps, where any thread
may at any time want to fork the whole application, are not supported.
Essentially, we implemented fork and the lazy re-open stuff for legacy
apps, e.g., shell scripts... and our assumption has been that
multi-threaded apps would not require fork.  We have since found some
counter examples, but in all cases the apps had modes where they could
be configured to not fork....

k42-discussion-bounces+okrieg=us.ibm.com at ozlabs.org wrote on
09/14/2006 03:41:02 PM:

> *******************
> Background:
> *******************
> 
> ------------------
> State:
> ------------------
> 
> The following is some background on K42 client and server objects for 
those
> on the list that are not familiar with it.  I'm including this both to 
help
> other understand some of this and to make sure that I do myself (and 
> hopefully
> someone will correct me if I get it wrong.)
> 
> When a client opens a file object (files, directories, streams, etc) 
state
> is maintained in several places.  The server side object (ServerFile,
> DirLinuxFS, etc) maintains state for managing the server object itself, 
e.g.
> ServerFiles know what DirLinuxFS objects point to them, DirLinuxFS 
objects
> have a list of currently instantiated children, etc.
> 
> In addition, the server object can optionally associate client specific
> data with an open request.  As part of an open, the server object calls
> giveAccessByServer to tell the COBJ layer that it wants to allow a new
> client to access the server object.  giveAccessByServer will in turn 
call
> giveAccessSetClientData, which the server object has the option to 
override.
> The server can then associate an 8byte token (an XHandle) with
> the ObjectHandle returned to the client.  The COBJ layer passes the
> ObjectHandle (and related XHandle) to the ServerObject during all future
> PPC calls made by clients against that ObjectHandle.  Typically, the
> ServerObject allocates a client specific data structure and stores a 
pointer
> to it in the xhandle.  (I assume, but haven't verified, that there are
> mechanisms to keep clients from diddling with the xhandle values.)

Yes, outside of a few magic xhandles that allow anyone to access them 
(with limited permissions, e.g. mount points for lookups), xhandles can 
only be accessed by a single client.  We first check that the xhandle is 
in a particular target range (i.e., in the range of the array of 
xhandles), then we check in the array the client id of who is allowed to 
access (with the caller id provided by the kernel on ppc), and then we 
automatically do some extra checks in the xobject that is automatically 
generated by the stub compiler... This capability scheme has been a really 

nice part of k42... automating much of the standard error checking. 
 
> Finally, state is stored in the client object, including the stub object
> (and ObjectHandle) used to contact the corresponding server object.  The
> client object for regular files, FileLinuxFile, stores its open flags in
> addition to the stub to the ServerFile.
> 
> ------------------
> File Descriptors:
> ------------------
> 
> Every user level process object contains a static FD class that manages 
the
> state of file descriptors.  It maps fds to their associated client side 
file
> objects and manages several bitmaps representing fd state.  It has a 
> bitmap for
> free files, files that need to be re-opened post fork (see below) and 
files
> that need to be closed on exec.

Yup, note... a program written against the native K42 operations wouldn't 
use FD at all, and, of course, would not be able to fork. 
 
> *******************
> Pre-Fork
> *******************
> 
> --------------------------------------------
> Pre-fork Client Side
> --------------------------------------------
> 
> When a process forks, new copies of the client state maintained in the
> server need to be created. K42 does this lazily to avoid the case
> where the new client (i.e. the child) doesn't access the ObjectHandles
> granted to its parent.
> 
> Prior to forking, FD::ForkAll is called to prepare all open files for 
post
> fork lazyReOpening.  FD::ForkAll calls clientFileRef->lazyGiveAccess for 

> every
> open file, disassociates the fd with the current client file reference 
and
> marks the fd as being ready for lazyReOpening.  This FD state is 
propegated
> across the fork since it resides within the process itself.  However, 
the
> state associated with the ObjectHandle and client file reference works
> differently.
> 
> The pattern for propagating the ObjectHandle and client file state seems 
to
> be consistent across all the file objects and works as follows.  The 
> associated
> client and server share a common lazyState transfer data structure 
> defined by
> the client.  When the FD object calls clientFileRef->lazyGiveAccess, the 

> client
> fills in part of the structure and then calls
> serverFileRef->_lazyGiveAccess(fd, partiallyFilledXferData...).
> 
> --------------------------------------------
> Pre-fork Server Side
> --------------------------------------------
> 
> What the server object does is somewhat interesting.  During 
> lazyGiveAccess it
> creates a new ObjectHandle on behalf of the **kernel** via 
> giveAccessByServer.

Yeah, for convenience I shoved this in the kernel, but that was something 
that wasn't intended to last :-)

> The server allocates a new client state structure, populates it, and
> stores a pointer to it in the ObjectHandle created for the kernel. 
> Server side
> client state is then added to the partially filled transfer data buffer 
> passed
> in by the client file object.
> 
> The server then calls
> process->_lazyGiveAcecss(fd, xferData, kernelObjectHandleForTheFile).
> 
> The kernel process server then tucks away the xferData and the 
> ObjectHandle in
> a data structure accessed by the fd, i.e. in an instance of the 
> LazyState class.
> 
> --------------------------------------------
> Post-fork Client and Server
> --------------------------------------------
> 
> When the child (or the parent for that matter) accesses any of the file
> descriptors opened prior to the fork, FD detects the readyForLazy bit 
and
> makes a call to its process to reopen the file.  The resulting call 
graph
> looks as follows.  (Note, the call graph is highly abbreviated and 
doesn't
> contain the cobj layer or many of the intermediate client/stub calls)
> 
>  _FD::locked_lazyGetFD(fd)
>     char data[512];
>     process->_lazyReOpen(__in fd, __out &oh, __out &type,
>                          __outbuf data, __out &DataLen)
>         // This call is routed to the kernel process object e.g.
>         // ProcessReplicated or ProcessShared
> 
>         LazyState.lazyReOpen(pid, fd, &type, &oh, char *data, dataLen);
> 
>             // Lookup stub and marshaled data by fd
> 
>             // copy the data passed prefork via lazyGiveAccess into the
>             // data buf.  XXX: there is a bug here, see the 
> questions/comments
>             memcpy(data, preForkData, preForkDataLen);
> 
>             // Call the server object's lazy reopen
>             stub._lazyReOpen(oh, pid, match, nomatch, data, dataLen);
> 
>                  // The stub is to the server object being reopened,
>                  // e.g. a ServerFile or a DirLinuxFS.
> 
>                  // server file based example...
>                  ServerFile::_lazyReOpen(__out &oh,
>                                          __in toProcID,
>                                          __in AccessRights match,
>                                          __in AccessRights nomatch,
> 
__inoutbuf(datalen:datalen:datalen)
>                                              char *data,
>                                          __inout datalen,
>                                          __XHANDLE xhandle)
> 
>                      // The data passed to this reopen call is the data 
that
>                      // was stored in LazyState.  The server object is
>                      // supposed to create and return a new oh for 
forked
>                      // client.  Note however that the server state was
>                      // passed in two different forms, i.e. it was 
passed
>                      // in the marshaled data buffer, but  it is also
>                      // contained in the Xhandle for the stub's oh.
> 
>                      // Create a new oh for the forked client
>                      giveAccessByServer(oh, toProcID, match, nomatch);
>                           // cobj layer ends up calling....
> 
>                           // (I don't really need to show this part
>                           // for the purpose of this discussion, but it 
may
>                           // be useful for someone to help understand
>                           // the entire process.
>                               ServerFile::giveAccessSetClientData(&oh,
> 
toProcID,
>                                                                   match,
> 
nomatch,
>                                                                   type)
> 
>                                   // Allocate client data state
>                                   ClientData *clientData = new 
ClientData();
> 
>                                   // Tell the cobj layer to grant access
>                                   // to the forked client, i.e. create
>                                   // an ObjectHandle and associate the
>                                   // newly allocated clientData 
structure
>                                   // with its XHandle.
>                                   giveAccessInternal(oh, toProcID,
>                                                      match, nomatch, 
type,
>                                                      (uval)clientData);
> 
>                      // (back in SeverFile::_lazyReopen)
>                      // We now need to set the client state on the oh 
> returned
>                      // from giveAccessByServer
>                      ClientData *clnt = 
> (XHandleTrans::GetClientData(oh.xhandle)
> 
>                      clnt->state = un-marshaled state from *data
> 
> 
>     // Unwind all the way back to FD, returning the new oh to the forked
>     // client and the data buf so that the forked client can recreate
>     // the client side state
> 
>     // Recreate client object and state
>     switch (type) {
>         ...
>     // case statement per object type, e.g.
>     case FileLinux_FILE:
>         rc = FileLinuxFile::LazyReOpen(fileRef, oh, buf, dataLen);
>         break;
>     }
> 
>     // store fileref in fd table and clear its close on exec bit
> 
>     return fileRef;
> 
> 
> ************************
> Questions and comments
> ************************
> 
> 1) Why pass around and store the marshaled data buffer for the lazy 
state?
> In lazyGiveAccess(), the server object creates an object handle to store 
in
> the kernel on behalf of the soon-to-be forked client.  The server then
> associates the client state with this oh in its un-marshaled form.
> Passing the data in a marshaled format on top of this seems redundant.

Hmmm, not positive I undrestand the question, but... I don't think
that the server is required to keep track of any *new* information for
forked file.  So, the lazy re-open logic in the kernel (which could
have been a new server for this purpose) keeps track of the new state,
and the server treats that as a different client.  So, for example, if
we mapped a file that was not shared, the file offset (I think) is
kept only in the opaque lazy state.  Again, I can review this logic,
this is just my memory.  That is, we didn't want to polute every
server for a special kludge for fork, we just wanted to add a new
service, that kept state that would otherwise be kept in the client.

> 2) Is there a race in the placement of the FD::ForkAll call? FD::ForkAll 
is
> called inside kitch-linux/lib/emu/fork.C prior to calling 
> ProgExec::ForkProcess.
> However, what happens if there is a thread switch between the ForkAll 
and
> the spot inside ForkProcess where the process's local scheduler is 
disabled?
> If another thread in the process performs file IO, it seems that the 
> lazy bits
> inside FD will be lost.  It seems that the same thing could occur even 
while
> ForkAll is running.  Should ForkAll be moved between Scheduler::Disable
> and Scheduler::DisabledScheduleFunction(ForkWorker...)?
> (Is this even possible?  ForkAll is going to be making calls
> like DREF(fileRef)->lazyGiveAccess(..).  Would these calls be 
serviceable
> while the local scheduler is disabled?)

As I stated earlier, we don't support fork of multi-threaded app. 
 
> 3) Is the placement of FD::ForkAll really what was intended?  The state 
for
> FD's is lazily reopened in the child, but it also causes the parent to 
go
> through the lazy reopen process as well.  Was this the intent?  Should 
the
> call to FD::ForkAll be even farther along in the process than I was 
> asking in
> question 2?  i.e. should it be after the ForAddressSpace call in the 
> ForkWorker?

Yes, this was crucial, we wanted to be able to optimize sequentially 
shared files, where only one process at a time accesses the file.  Real 
active sharing is increadibly uncommon, and the case of a shell starting 
one process after another accessing a file is an important thing to 
optimize. 
 
> 4) lazyReOpen can lead to memory corruption if the marshaled data passed
> to LazyState during lazyGiveAccess exceeds 512 bytes.  ProcessServer and
> the base kernel Process object both declare their _lazyReOpen calls to 
have
> a databuf of __outbuf(dataLen:512) and they don't pass the size of this
> buffer to LazyState.  LazyState then has no way to know if it is going 
to
> overflow the buffer or not.  I experienced this in one of my tests when
> I forked while holding a directory open.  (I have attached a patch for 
> this.)

Ooops thank you for fixing!  Thank you Dilma for committing the fix. 
 
> 5) Where is the marshaled client data deallocated?
> It looks like it should be deallocated in the destructor for
> StubFileLinuxServerHolder, but it isn't.  Is the deallocation via a 
subtle
> interaction that I don't see?

I need to look at this. We may have a leak here, sigh... the lazy
re-open logic was a quick test that I intended to re-write some day. 
 
> 6) The reference counting of the lazy state objects has a race.  When
> decRefCount returns 0, the caller is supposed to delete it.  Here is the 

> code:
>     uval LazyState::StubFileLinuxServerHolder::decRefCount() {
>         lock.acquire();
>         refcount--;
>         lock.release();
>         return refcount;
>     }
> In practice, I don't think it will be triggered because a higher level 
lock
> is being held on the LazyState object itself, but say the refcount = 2 
and
> two different threads enter this routine at the same time (which I don't 

> think
> is possible just based on the semantics of the objects), then they can
> be scheduled such that they will both return 0 and a double delete will 
be
> performed.  So, if there is indeed a real reason to be performing this 
lock,
> the state of refcount needs to be saved in a local variable prior to
> releasing the lock.  If the lock isn't needed, it should probably just
> be removed.
> 

Again, we are not handling multi-threaded apps for fork, for is
considered a legacy support issue for single-threaded apps. 
 
> 
> 
> 
> Only in kitchsrc: .git
> diff -ru kitchsrc.orig/kitch-linux/lib/emu/FD.C kitchsrc/kitch-
> linux/lib/emu/FD.C
> --- kitchsrc.orig/kitch-linux/lib/emu/FD.C   2006-01-31 15:34:00.
> 000000000 -0800
> +++ kitchsrc/kitch-linux/lib/emu/FD.C   2006-09-14 11:16:56.000000000 
-0700
> @@ -156,8 +156,8 @@
>      uval type;
>      ObjectHandle oh;
>      FileLinuxRef fileRef;
> -    char buf[512];
> -    uval dataLen;
> +    char buf[2048];
> +    uval dataLen = sizeof(buf);
> 
>      // try again
>      if (fdrec[fd].ref) return fdrec[fd].ref;
> diff -ru kitchsrc.orig/os/kernel/bilge/LazyState.C 
> kitchsrc/os/kernel/bilge/LazyState.C
> --- kitchsrc.orig/os/kernel/bilge/LazyState.C   2003-11-09 13:57:30.
> 000000000 -0800
> +++ kitchsrc/os/kernel/bilge/LazyState.C   2006-09-14 11:14:05.000000000 

-0700
> @@ -64,6 +64,15 @@
>     SysStatus rc;
>     AutoLock<LockType> al(&lock); // locks now, unlocks on return
>     tp = type;
> +
> +   if (dl < dataLen) {
> +       tassertWrn(0, "lazyReOpen buffer too small %ld < %ld\n",
> +             dl, dataLen);
> +       // FIXME: a proper code needs to be allocated for this, but
> +       // only someone from IBM can do that.
> +       return _SERROR(9999, 0, ENOMEM);
> +   }
> +
>     memcpy(d, data, dataLen);
>     dl = dataLen;
>     if (type == FileLinux_FILE || type == FileLinux_DIR) {
> diff -ru kitchsrc.orig/os/kernel/proc/Process.C 
> kitchsrc/os/kernel/proc/Process.C
> --- kitchsrc.orig/os/kernel/proc/Process.C   2005-12-30 15:04:30.
> 000000000 -0800
> +++ kitchsrc/os/kernel/proc/Process.C   2006-09-14 11:17:23.000000000 
-0700
> @@ -202,7 +202,8 @@
>  /* virtual */ SysStatus
>  Process::_lazyReOpen(__CALLER_PID caller, __in sval file, __out uval 
&type,
>             __out ObjectHandle &oh,
> -           __outbuf(dataLen:512) char *data, __out uval &dataLen)
> +           __outbuf(dataLen:dataLen) char *data,
> +           __inout uval &dataLen)
>  {
>      tassertMsg((caller==(uval)getPID()), "reopen by other 
process??\n");
>      return lazyReOpen(file, type, oh, data, dataLen);
> diff -ru kitchsrc.orig/os/kernel/proc/Process.H 
> kitchsrc/os/kernel/proc/Process.H
> --- kitchsrc.orig/os/kernel/proc/Process.H   2004-08-16 15:00:29.
> 000000000 -0700
> +++ kitchsrc/os/kernel/proc/Process.H   2006-09-13 14:16:45.000000000 
-0700
> @@ -95,8 +95,8 @@
>      virtual SysStatus lazyCopyState(LazyState *ls)=0; // copy from
>      virtual SysStatus _lazyReOpen(__CALLER_PID caller, __in sval file,
>                __out uval &type, __out ObjectHandle &oh,
> -              __outbuf(dataLen:512) char *data,
> -              __out uval &dataLen);
> +              __outbuf(dataLen:dataLen) char *data,
> +              __inout uval &dataLen);
>      virtual SysStatus _lazyClose(sval file) { return lazyClose(file); }
>      virtual SysStatus _lazyGiveAccess(__in sval file, __in uval type,
>                    __in ObjectHandle oh,
> diff -ru kitchsrc.orig/os/kernel/proc/ProcessServer.H 
> kitchsrc/os/kernel/proc/ProcessServer.H
> --- kitchsrc.orig/os/kernel/proc/ProcessServer.H   2006-02-07 23:14:
> 07.000000000 -0800
> +++ kitchsrc/os/kernel/proc/ProcessServer.H   2006-09-14 11:23:13.
> 000000000 -0700
> @@ -160,8 +160,8 @@
>      // all the lazy stuff
>      virtual SysStatus _lazyReOpen(__CALLER_PID caller, __in sval file, 
>                __out uval &type, __out ObjectHandle &oh, 
> -              __outbuf(dataLen:512) char *data,
> -              __out uval &dataLen)=0;
> +              __outbuf(dataLen:dataLen) char *data,
> +              __inout uval &dataLen)=0;
>      virtual SysStatus _lazyClose(sval file)=0;
>      virtual SysStatus _lazyGiveAccess(__in sval file, __in uval type, 
>                    __in ObjectHandle oh, 
> _______________________________________________
> K42-discussion mailing list
> K42-discussion at ozlabs.org
> https://ozlabs.org/mailman/listinfo/k42-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://ozlabs.org/pipermail/k42-discussion/attachments/20061001/4d9c94b6/attachment.htm 


More information about the K42-discussion mailing list