[K42-discussion] Debugging help
David Tam
tamda at eecg.toronto.edu
Thu Nov 10 05:40:24 EST 2005
I need some debugging help and I'm wondering if anyone has any clue as to
what my bug might be.
I've been attempting to run SPECjbb2000 + J9 JVM on K42 with my user-level
thread migration patch enabled. My kitchsrc was last updated on Nov 2nd.
Everything runs fine on a 4-CPU system (k10), but I hit a gdb
breakpoint in the kernel when running on an 8-CPU system (k0 with only 8
CPUs enabled).
The k42 console reports the following message ~145 times and then
hits a gdb breakpoint in the kernel.
Giving back 0x10 pages (Y > 0x80000)
where Y is between 0x80002 and 0x0005, inclusive.
gdb tells me that I triggered the assert in FRPA::startPutPage()
because rc=0x800000000b2c0110.
FRPA::startPutPage() {
    ...
    ..
    .
    // FIXME, pass in blocking info here...
    rc = convertAddressWriteTo(physAddr, addr, rr);
    tassertMsg(_SUCCESS(rc), "rc 0x%lx\n", rc);
    ...
    ..
    .
}
Upon further investigation of the source code of convertAddressWriteTo(),
I find that it always returns 0.
Therefore, it should be impossible for the tassertMsg() to be triggered.
virtual SysStatus convertAddressWriteTo(uval physAddr, uval &vaddr,
                                        IORestartRequests *rr=0) {
    vaddr = physAddr;
    return 0;
}
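
One possibility worth ruling out before suspecting memory corruption: the
function above is declared virtual, so the body that runs depends on the
dynamic type of the object. In the backtrace below, "this" in frame #3
(FRPA::startPutPage) has the same value as "this" in frame #11
(FRVA::_ioComplete), and the local "addr" (11460608) differs from physAddr
(877690880), whereas the body quoted above would make them equal. That is
consistent with, though it does not prove, a different override being called.
The following is a minimal sketch of that mechanism only. BaseFR, DerivedFR,
the two-argument signature, and the -1 error value are hypothetical and are not
taken from the K42 source; whether any K42 class actually overrides
convertAddressWriteTo() has not been checked here.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for K42 types (assumptions for this sketch only).
using SysStatus = std::int64_t;
using uval = std::uint64_t;

// Hypothetical base class whose default conversion always succeeds,
// mirroring the shape of the function quoted above.
struct BaseFR {
    virtual ~BaseFR() = default;

    virtual SysStatus convertAddressWriteTo(uval physAddr, uval &vaddr) {
        vaddr = physAddr;
        return 0;
    }

    // Non-virtual caller: the call below is still dispatched virtually,
    // so a derived override runs when "this" is a derived object.
    SysStatus startPutPage(uval physAddr) {
        uval addr = 0;
        return convertAddressWriteTo(physAddr, addr);
    }
};

// Hypothetical derived class whose override can fail.
struct DerivedFR : BaseFR {
    SysStatus convertAddressWriteTo(uval /*physAddr*/, uval & /*vaddr*/) override {
        return -1;  // made-up error value, for illustration only
    }
};
```

Calling startPutPage() on a DerivedFR through a BaseFR reference returns the
negative value, even though the base-class body can only return 0. In gdb,
"set print object on" followed by "print *this" in the FRPA::startPutPage frame
shows the dynamic type, which would settle the question either way.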
Perhaps there is memory corruption caused by my changes to the
user-level scheduler code (kitchsrc/lib/libc/scheduler/*)?
Any guesses, hints, or suggestions are welcome.
Thanks.
=========
Here is some more information about that frame.
(gdb) info frame
Stack level 3, frame at 0xc000000002ef4860:
pc = 0xc00000000221eafc
in FRPA::startPutPage(unsigned long, unsigned long, IORestartRequests*)
(/homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRPA.C:136);
saved pc 0xc0000000022c3b58
called by frame at 0xc000000002ef4910, caller of frame at 0xc000000002ef47d0
source language c++.
Arglist at 0xc000000002ef4860, args: this=0x8002000020623600,
physAddr=877690880, objOffset=5795840, rr=0x80020000208c6d80
Locals at 0xc000000002ef4860, Previous frame's sp in r1
Saved registers:
r30 at 0xc000000002ef4900, r31 at 0xc000000002ef4908,
lr at 0xc000000002ef4920
(gdb)
Local variables:
(gdb) info local
size = 4096
addr = 11460608
rc = -9223372036667342576
(gdb)
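
Note that the negative decimal rc shown by gdb and the hexadecimal rc printed
by tassertMsg() are the same 64-bit pattern, viewed as signed and unsigned
respectively. The failed expression in the backtrace, "(rc)>=0", shows that
_SUCCESS() is a sign test, so any value with the top bit set fails it. A small
sketch that checks the arithmetic; the helper names asSigned and
looksLikeSuccess are made up for this example and are not K42 functions.

```cpp
#include <cassert>
#include <cstdint>

// Bit pattern printed by tassertMsg() with "%lx".
constexpr std::uint64_t kRcBits = 0x800000000b2c0110ULL;

// Reinterpret a 64-bit pattern as a two's-complement signed value
// without relying on implementation-defined narrowing.
constexpr std::int64_t asSigned(std::uint64_t bits) {
    return (bits >> 63) ? -static_cast<std::int64_t>(~bits) - 1
                        : static_cast<std::int64_t>(bits);
}

// Same test as the failed expression "(rc)>=0" in the backtrace.
constexpr bool looksLikeSuccess(std::int64_t rc) {
    return rc >= 0;
}

// The value gdb reports for the local variable rc.
static_assert(asSigned(kRcBits) == -9223372036667342576LL,
              "hex and decimal views of rc should agree");
static_assert(!looksLikeSuccess(asSigned(kRcBits)),
              "top bit set means the sign test fails");
```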
Doing a gdb "backtrace" reports the following.
(gdb) bt
#0 breakpoint () at libksup.C:49
#1 0xc0000000023b0ae4 in raiseError() ()
at /homes/kix/tamdavid/k42-20050520/kitchsrc/lib/libc/sys/TAssert.C:50
#2 0xc0000000023b0c48 in errorWithMsg(char const*, char const*, unsigned long, char const*, ...) (
failedexpr=0xc000000002474b98 "(__builtin_expect(((rc)>=0),1))",
fname=0xc000000002474bb8 "/homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRPA.C", lineno=136, fmt=0xc000000002474d10 "rc 0x%lx\n")
at /homes/kix/tamdavid/k42-20050520/kitchsrc/lib/libc/sys/TAssert.C:108
#3 0xc00000000221eafc in FRPA::startPutPage(unsigned long, unsigned long, IORestartRequests*) (this=0x8002000020623600, physAddr=877690880,
objOffset=5795840, rr=0x80020000208c6d80)
at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRPA.C:136
#4 0xc0000000022c3b58 in FSFRSwap::startPutPage(unsigned long, FRComputation**, unsigned long, unsigned long&, unsigned long volatile*, IORestartRequests*) (
this=0xc00000000549b300, physAddr=877690880, ref=0x8000000010008f20,
offset=11460608, blockID=@0xc000000002ef4a30, context=0x8002000000328070,
rr=0x80020000208c6d80)
at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/bilge/FSFRSwap.C:227
#5 0xc00000000222414c in FRComputation::putPageInternal(unsigned long, unsigned long, unsigned long, IORestartRequests*) (this=0x8002000000328000,
physAddr=877690880, offset=11460608, async=1, rr=0x80020000208c6d80)
at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRComputation.C:200
#6 0xc00000000222431c in FRComputation::startPutPage(unsigned long, unsigned long, IORestartRequests*) (this=0x8002000000328000, physAddr=877690880,
offset=11460608, rr=0x80020000208c6d80)
at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRComputation.C:242
#7 0xc0000000021b0788 in FCMDefault::resumeIO() (this=0x8002000020811a00)
at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FCMDefault.C:808
#8 0xc0000000021b2424 in IORestartRequests::notify() (this=0x80020000208c6d80)
at IORestartRequests.H:103
#9 0xc000000002230ee8 in IORestartRequests::NotifyAll(IORestartRequests*) (
qcopy=0x0) at IORestartRequests.H:119
#10 0xc000000002240874 in KernelPagingTransport::ioComplete() (
this=0x800200000030a400)
at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/KernelPagingTransport.C:171
#11 0xc00000000221cd48 in FRVA::_ioComplete(unsigned long, unsigned long, long)
(this=0x8002000020623600, vaddr=1100586164224, fileOffset=5541888, rc=0)
at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRVA.C:114
#12 0xc000000002250148 in XFRVA::__ioCompleteEmm(unsigned long) (
this=0x8002000000406f00, callerID=4294967301) at XFRVA.C:130
#13 0xc00000000239a018 in DispatcherDefault_InvokeXObjMethod ()
at CObjRootMediator.H:102
#14 0xc000000002399eb0 in DispatcherDefault_PPCServerOnThread ()
at CObjRootMediator.H:102
(gdb)
=============================
k42console output
-----------------
Giving back 0x10 pages (0x80001 > 0x80000)
Giving back 0x10 pages (0x80002 > 0x80000)
Giving back 0x10 pages (0x80003 > 0x80000)
Giving back 0x10 pages (0x80004 > 0x80000)
Giving back 0x10 pages (0x80004 > 0x80000)
...
..
.
(~145 times)
Giving back 0x10 pages (0x80001 > 0x80000)
Giving back 0x10 pages (0x80001 > 0x80000)
Giving back 0x10 pages (0x80001 > 0x80000)
Giving back 0x10 pages (0x80001 > 0x80000)
ERROR: file "/homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRPA.C", line 136
rc 0x800000000b2c0110
GDB got trap: Program Interrupt
vector=0x700, sr=0xa00000000002b032, pc=0xc0000000022afb34 lr=0xc0000000023b0ae4
Kernel Connecting to GDB via thinwire channel
(use kvictim to find gdb target machine and port)
--
David Tam <tamda at eecg.toronto.edu>
Graduate Student, ECE Dept, University of Toronto
http://www.eecg.toronto.edu/~tamda