[K42-discussion] Debugging help

Dilma DaSilva dilma at watson.ibm.com
Thu Nov 10 07:15:24 EST 2005


I think I can help you a bit with this, David.

I guess you are NOT running with KFS, are you? If not, then swap
is going to NFS. The assertion you're hitting is related to the
method convertAddressWriteTo. You're right that
FRPA::convertAddressWriteTo returns 0, but if you're using NFS, the FR
you have is the one for file systems that work on virtual addresses (because
NFS talks through RPC, startFillpage or startWrite have to send a
virtual address). FRVA::convertAddressWriteTo
fails, and so trips the assert, if you're trying to write more than 64 pages,
because RegionFSComm was created with a limit of 64 pages. Livio ran into
this recently.
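
This is roughly why reading FRPA.C alone is misleading. The sketch below is a simplified model with made-up class names (FRPASketch, FRVASketch), a hypothetical error code, and a hypothetical region address. It assumes that FRVA derives from FRPA and overrides the virtual convertAddressWriteTo. That assumption is inferred from the backtrace, which shows the same `this` in the FRPA:: and FRVA:: frames. These are not the real K42 signatures.

```cpp
#include <cstdint>

// Hypothetical, simplified model; not the real K42 classes or signatures.
// Assumption: FRVA derives from FRPA and overrides convertAddressWriteTo.
using uval = std::uint64_t;
using SysStatus = std::int64_t;        // assumption: negative means failure

constexpr uval kRegionPages = 64;      // RegionFSComm limit mentioned above
constexpr uval kPageSize = 4096;
constexpr SysStatus kRegionFull = -1;  // hypothetical error code

class FRPASketch {
public:
    virtual ~FRPASketch() = default;

    // Physical-address file systems: identity mapping, always succeeds.
    virtual SysStatus convertAddressWriteTo(uval physAddr, uval &vaddr) {
        vaddr = physAddr;
        return 0;
    }

    // Like FRPA::startPutPage, this calls through the vtable, so a derived
    // class's override is what actually runs.
    SysStatus startPutPage(uval physAddr) {
        uval addr = 0;
        return convertAddressWriteTo(physAddr, addr);
    }
};

class FRVASketch : public FRPASketch {
public:
    // Virtual-address file systems (NFS via RPC): each page being written is
    // mapped into a fixed-size shared region, which can run out of slots.
    // Slots are never released here, to keep the sketch short.
    SysStatus convertAddressWriteTo(uval /*physAddr*/, uval &vaddr) override {
        if (pagesInUse_ >= kRegionPages) return kRegionFull;
        vaddr = kRegionBase + pagesInUse_ * kPageSize;
        ++pagesInUse_;
        return 0;
    }

private:
    static constexpr uval kRegionBase = 0x10000000;  // hypothetical address
    uval pagesInUse_ = 0;
};
```

Suppose startPutPage is called through an FRPASketch& that is bound to an FRVASketch. The first 64 calls succeed and the 65th returns an error, even though FRPASketch's own convertAddressWriteTo never fails.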

I made some initial effort to limit the transport between the memory
manager and the FS (for NFS or any other FRVA-based file system) so
that we don't hit that value. I've asked Livio/Jonathan to give me the
test where they hit that problem, but I believe they forgot.
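
One way to keep the number of outstanding writes under the region size is to admit at most N writes at a time and defer the rest until a completion frees a slot. The sketch below shows that idea. WriteThrottle is a hypothetical class, not the actual K42 change, and it assumes completions arrive asynchronously rather than from inside the submitted callback.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical throttle: at most `capacity` page writes in flight; extra
// requests wait in FIFO order until ioComplete() frees a slot.
// Assumes ioComplete() is not called re-entrantly from inside io().
class WriteThrottle {
public:
    explicit WriteThrottle(std::size_t capacity) : capacity_(capacity) {}

    // Starts `io` now if a slot is free, otherwise queues it.
    // Returns true if the write was started immediately.
    bool submit(std::function<void()> io) {
        if (inFlight_ < capacity_) {
            ++inFlight_;
            io();
            return true;
        }
        pending_.push_back(std::move(io));
        return false;
    }

    // Called when a write finishes: hand the freed slot to the oldest
    // deferred write, if any.
    void ioComplete() {
        if (inFlight_ == 0) return;
        --inFlight_;
        if (!pending_.empty()) {
            std::function<void()> next = std::move(pending_.front());
            pending_.pop_front();
            ++inFlight_;
            next();
        }
    }

    std::size_t inFlight() const { return inFlight_; }
    std::size_t pending() const { return pending_.size(); }

private:
    std::size_t capacity_;
    std::size_t inFlight_ = 0;
    std::deque<std::function<void()>> pending_;
};
```

With a capacity of 64 and 70 submitted writes, 64 start immediately and 6 wait. Each completion starts exactly one waiting write, so the region never has more than 64 pages in use.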

Now about running with KFS: I've never run with KFS on k0. I'll look
into that, but first I'll make sure that we don't generate more than
64 concurrent NFS accesses... Do you believe your code is paging? Why
is it paging now with 8 processors and not with 4? I mean, how are you
scaling the workload?

I'm writing this during a meeting, sorry if it's not understandable.

dilma


David Tam writes:
 > I need some debugging help and I'm wondering if anyone has any clue as to
 > what my bug might be.
 > 
 > I've been attempting to run SPECjbb2000 + J9 JVM on K42 with my user-level
 > thread migration patch enabled.  My kitchsrc was last updated on Nov 2nd.
 > 
 > Everything runs fine on a 4-CPU system (k10) but I encounter a gdb
 > breakpoint in the kernel when running on an 8-CPU system (k0 with only 8
 > CPUs enabled).
 > 
 > The k42 console reports the following message ~145 times and then
 > hits a gdb breakpoint in the kernel.
 > 
 > 	Giving back 0x10 pages (Y > 0x80000)
 > 
 > where Y is between 0x80002 and 0x0005 inclusive.
 > 
 > gdb tells me that I triggered the assert in FRPA::startPutPage()
 > because rc=0x800000000b2c0110.
 > 
 > FRPA::startPutPage() {
 > ...
 > ..
 > .
 >     // FIXME, pass in blocking info here...
 >     rc = convertAddressWriteTo(physAddr, addr, rr);
 >     tassertMsg(_SUCCESS(rc), "rc 0x%lx\n", rc);
 > ...
 > ..
 > .
 > }
 > 
 > Upon further investigation of the source code of convertAddressWriteTo(),
 > I find that it always returns 0.
 > Therefore, it should be impossible for the tassertMsg() to be triggered.
 > 
 > virtual SysStatus convertAddressWriteTo(uval physAddr, uval &vaddr,
 >                                         IORestartRequests *rr=0) {
 >     vaddr = physAddr;
 >     return 0;
 > }
 > 
 > 
 > 
 > Perhaps there is memory corruption caused by my changes to the
 > user-level scheduler code (kitchsrc/lib/libc/scheduler/*) ?
 > 
 > Any guesses, hints, suggestions are gladly welcomed.
 > Thanks.
 > 
 > 
 > 
 > =========
 > 
 > Here is some more information about that frame.
 > 
 > (gdb) info frame
 > Stack level 3, frame at 0xc000000002ef4860:
 >  pc = 0xc00000000221eafc
 >     in FRPA::startPutPage(unsigned long, unsigned long, IORestartRequests*)
 >     (/homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRPA.C:136); 
 >     saved pc 0xc0000000022c3b58
 >  called by frame at 0xc000000002ef4910, caller of frame at 0xc000000002ef47d0
 >  source language c++.
 >  Arglist at 0xc000000002ef4860, args: this=0x8002000020623600, 
 >     physAddr=877690880, objOffset=5795840, rr=0x80020000208c6d80
 >  Locals at 0xc000000002ef4860, Previous frame's sp in r1
 >  Saved registers:
 >   r30 at 0xc000000002ef4900, r31 at 0xc000000002ef4908,
 >   lr at 0xc000000002ef4920
 > (gdb) 
 > 
 > 
 > Local variables:
 > (gdb) info local
 > size = 4096
 > addr = 11460608
 > rc = -9223372036667342576
 > (gdb) 
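
The rc printed on the console (0x800000000b2c0110) and the rc shown by "info local" (-9223372036667342576) are the same 64-bit value. Because its top bit is set, it is negative when read as a signed SysStatus. The failed expression in the backtrace, "(__builtin_expect(((rc)>=0),1))", is therefore false. A small check follows, assuming SysStatus is a signed 64-bit integer and _SUCCESS means rc >= 0 (both inferred from the gdb output, not from the K42 headers).

```cpp
#include <cstdint>
#include <cstring>

// Bit pattern printed by tassertMsg("rc 0x%lx\n", rc) in FRPA.C:136.
constexpr std::uint64_t kRcBits = 0x800000000b2c0110ULL;

// Reinterpret the raw bits as a signed 64-bit value (what gdb prints).
std::int64_t asSigned(std::uint64_t bits) {
    std::int64_t v;
    std::memcpy(&v, &bits, sizeof v);  // well-defined bit reinterpretation
    return v;
}

// Assumption: mirrors the expanded _SUCCESS check seen in the backtrace.
bool isSuccess(std::int64_t rc) { return rc >= 0; }
```

So a non-zero, negative rc can only have come from a convertAddressWriteTo other than the FRPA one that always returns 0.
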
 > 
 > Doing a gdb "backtrace" reports the following.
 > (gdb) bt
 > #0  breakpoint () at libksup.C:49
 > #1  0xc0000000023b0ae4 in raiseError() ()
 >     at /homes/kix/tamdavid/k42-20050520/kitchsrc/lib/libc/sys/TAssert.C:50
 > #2  0xc0000000023b0c48 in errorWithMsg(char const*, char const*, unsigned long, char const*, ...) (
 >     failedexpr=0xc000000002474b98 "(__builtin_expect(((rc)>=0),1))", 
 >     fname=0xc000000002474bb8 "/homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRPA.C", lineno=136, fmt=0xc000000002474d10 "rc 0x%lx\n")
 >     at /homes/kix/tamdavid/k42-20050520/kitchsrc/lib/libc/sys/TAssert.C:108
 > #3  0xc00000000221eafc in FRPA::startPutPage(unsigned long, unsigned long, IORestartRequests*) (this=0x8002000020623600, physAddr=877690880, 
 >     objOffset=5795840, rr=0x80020000208c6d80)
 >     at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRPA.C:136
 > #4  0xc0000000022c3b58 in FSFRSwap::startPutPage(unsigned long, FRComputation**, unsigned long, unsigned long&, unsigned long volatile*, IORestartRequests*) (
 >     this=0xc00000000549b300, physAddr=877690880, ref=0x8000000010008f20, 
 >     offset=11460608, blockID=@0xc000000002ef4a30, context=0x8002000000328070, 
 >     rr=0x80020000208c6d80)
 >     at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/bilge/FSFRSwap.C:227
 > #5  0xc00000000222414c in FRComputation::putPageInternal(unsigned long, unsigned long, unsigned long, IORestartRequests*) (this=0x8002000000328000, 
 >     physAddr=877690880, offset=11460608, async=1, rr=0x80020000208c6d80)
 >     at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRComputation.C:200
 > #6  0xc00000000222431c in FRComputation::startPutPage(unsigned long, unsigned long, IORestartRequests*) (this=0x8002000000328000, physAddr=877690880, 
 >     offset=11460608, rr=0x80020000208c6d80)
 >     at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRComputation.C:242
 > #7  0xc0000000021b0788 in FCMDefault::resumeIO() (this=0x8002000020811a00)
 >     at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FCMDefault.C:808
 > #8  0xc0000000021b2424 in IORestartRequests::notify() (this=0x80020000208c6d80)
 >     at IORestartRequests.H:103
 > #9  0xc000000002230ee8 in IORestartRequests::NotifyAll(IORestartRequests*) (
 >     qcopy=0x0) at IORestartRequests.H:119
 > #10 0xc000000002240874 in KernelPagingTransport::ioComplete() (
 >     this=0x800200000030a400)
 >     at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/KernelPagingTransport.C:171
 > #11 0xc00000000221cd48 in FRVA::_ioComplete(unsigned long, unsigned long, long)
 >     (this=0x8002000020623600, vaddr=1100586164224, fileOffset=5541888, rc=0)
 >     at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRVA.C:114
 > #12 0xc000000002250148 in XFRVA::__ioCompleteEmm(unsigned long) (
 >     this=0x8002000000406f00, callerID=4294967301) at XFRVA.C:130
 > #13 0xc00000000239a018 in DispatcherDefault_InvokeXObjMethod ()
 >     at CObjRootMediator.H:102
 > #14 0xc000000002399eb0 in DispatcherDefault_PPCServerOnThread ()
 >     at CObjRootMediator.H:102
 > (gdb) 
 > 
 > =============================
 > 
 > k42console output
 > -----------------
 > 	Giving back 0x10 pages (0x80001 > 0x80000)
 > 	Giving back 0x10 pages (0x80002 > 0x80000)
 > 	Giving back 0x10 pages (0x80003 > 0x80000)
 > 	Giving back 0x10 pages (0x80004 > 0x80000)
 > 	Giving back 0x10 pages (0x80004 > 0x80000)
 > ...
 > ..
 > .
 > (~145 times)
 > 	Giving back 0x10 pages (0x80001 > 0x80000)
 > 	Giving back 0x10 pages (0x80001 > 0x80000)
 > 	Giving back 0x10 pages (0x80001 > 0x80000)
 > 	Giving back 0x10 pages (0x80001 > 0x80000)
 > ERROR: file "/homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRPA.C", line 136
 > rc 0x800000000b2c0110
 > GDB got trap: Program Interrupt
 > vector=0x700, sr=0xa00000000002b032, pc=0xc0000000022afb34 lr=0xc0000000023b0ae4
 > Kernel Connecting to GDB via thinwire channel
 > (use kvictim to find gdb target machine and port)
 > 
 > 
 > -- 
 > David Tam <tamda at eecg.toronto.edu>
 > Graduate Student, ECE Dept, University of Toronto
 > http://www.eecg.toronto.edu/~tamda
 > 
 > _______________________________________________
 > K42-discussion mailing list
 > K42-discussion at ozlabs.org
 > https://ozlabs.org/mailman/listinfo/k42-discussion


