[K42-discussion] Debugging help

Maria Butrico butrico at us.ibm.com
Thu Nov 10 08:52:38 EST 2005


Here is a debugging hint, purely on debugging mechanics. Look up the
(hopefully unique) error number in the source, in this case b2c (note that
the source spells error numbers in decimal), and trace your way backwards
from there. This usually yields a lot of information. If the number is not
unique, please complain.

Maria Butrico    <internet or sametime: butrico at us.ibm.com;     Notes: Maria Butrico/Watson/IBM>



From:     David Tam <tamda at eecg.toronto.edu>
Sent by:  k42-discussion-bounces@ozlabs.org
Date:     11/09/2005 01:40 PM
To:       k42-discussion at ozlabs.org
Subject:  [K42-discussion] Debugging help




I need some debugging help and I'm wondering if anyone has any clue as to
what my bug might be.

I've been attempting to run SPECjbb2000 + J9 JVM on K42 with my user-level
thread migration patch enabled.  My kitchsrc was last updated on Nov 2nd.

Everything runs fine on a 4-CPU system (k10), but I hit a gdb breakpoint in
the kernel when running on an 8-CPU system (k0 with only 8 CPUs enabled).

The K42 console prints the following message about 145 times, after which
the kernel hits a gdb breakpoint:

             Giving back 0x10 pages (Y > 0x80000)

where Y is always just above 0x80000 (the console output at the end of this
message shows values from 0x80001 to 0x80004).

gdb tells me that I triggered the assert in FRPA::startPutPage()
because rc=0x800000000b2c0110.

FRPA::startPutPage() {
    // ...
    // FIXME, pass in blocking info here...
    rc = convertAddressWriteTo(physAddr, addr, rr);
    tassertMsg(_SUCCESS(rc), "rc 0x%lx\n", rc);
    // ...
}

Looking at the source of convertAddressWriteTo(), I find that it always
returns 0. Therefore, it should be impossible for the tassertMsg() to fire.

virtual SysStatus convertAddressWriteTo(uval physAddr, uval &vaddr,
                                        IORestartRequests *rr=0) {
    vaddr = physAddr;
    return 0;
}



Could there be memory corruption caused by my changes to the user-level
scheduler code (kitchsrc/lib/libc/scheduler/*)?

Any guesses, hints, or suggestions are welcome.
Thanks.



=========

Here is some more information about that frame.

(gdb) info frame
Stack level 3, frame at 0xc000000002ef4860:
 pc = 0xc00000000221eafc
    in FRPA::startPutPage(unsigned long, unsigned long, IORestartRequests*)
    (/homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRPA.C:136);
    saved pc 0xc0000000022c3b58
 called by frame at 0xc000000002ef4910, caller of frame at 0xc000000002ef47d0
 source language c++.
 Arglist at 0xc000000002ef4860, args: this=0x8002000020623600,
    physAddr=877690880, objOffset=5795840, rr=0x80020000208c6d80
 Locals at 0xc000000002ef4860, Previous frame's sp in r1
 Saved registers:
  r30 at 0xc000000002ef4900, r31 at 0xc000000002ef4908,
  lr at 0xc000000002ef4920
(gdb)


Local variables:
(gdb) info local
size = 4096
addr = 11460608
rc = -9223372036667342576
(gdb)

Running gdb's "backtrace" command gives the following:
(gdb) bt
#0  breakpoint () at libksup.C:49
#1  0xc0000000023b0ae4 in raiseError() ()
    at /homes/kix/tamdavid/k42-20050520/kitchsrc/lib/libc/sys/TAssert.C:50
#2  0xc0000000023b0c48 in errorWithMsg(char const*, char const*, unsigned long, char const*, ...) (
    failedexpr=0xc000000002474b98 "(__builtin_expect(((rc)>=0),1))",
    fname=0xc000000002474bb8 "/homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRPA.C", lineno=136, fmt=0xc000000002474d10 "rc 0x%lx\n")
    at /homes/kix/tamdavid/k42-20050520/kitchsrc/lib/libc/sys/TAssert.C:108
#3  0xc00000000221eafc in FRPA::startPutPage(unsigned long, unsigned long, IORestartRequests*) (this=0x8002000020623600, physAddr=877690880,
    objOffset=5795840, rr=0x80020000208c6d80)
    at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRPA.C:136
#4  0xc0000000022c3b58 in FSFRSwap::startPutPage(unsigned long, FRComputation**, unsigned long, unsigned long&, unsigned long volatile*, IORestartRequests*) (
    this=0xc00000000549b300, physAddr=877690880, ref=0x8000000010008f20,
    offset=11460608, blockID=@0xc000000002ef4a30, context=0x8002000000328070,
    rr=0x80020000208c6d80)
    at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/bilge/FSFRSwap.C:227
#5  0xc00000000222414c in FRComputation::putPageInternal(unsigned long, unsigned long, unsigned long, IORestartRequests*) (this=0x8002000000328000,
    physAddr=877690880, offset=11460608, async=1, rr=0x80020000208c6d80)
    at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRComputation.C:200
#6  0xc00000000222431c in FRComputation::startPutPage(unsigned long, unsigned long, IORestartRequests*) (this=0x8002000000328000, physAddr=877690880,
    offset=11460608, rr=0x80020000208c6d80)
    at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRComputation.C:242
#7  0xc0000000021b0788 in FCMDefault::resumeIO() (this=0x8002000020811a00)
    at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FCMDefault.C:808
#8  0xc0000000021b2424 in IORestartRequests::notify() (this=0x80020000208c6d80)
    at IORestartRequests.H:103
#9  0xc000000002230ee8 in IORestartRequests::NotifyAll(IORestartRequests*) (
    qcopy=0x0) at IORestartRequests.H:119
#10 0xc000000002240874 in KernelPagingTransport::ioComplete() (
    this=0x800200000030a400)
    at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/KernelPagingTransport.C:171
#11 0xc00000000221cd48 in FRVA::_ioComplete(unsigned long, unsigned long, long)
    (this=0x8002000020623600, vaddr=1100586164224, fileOffset=5541888, rc=0)
    at /homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRVA.C:114
#12 0xc000000002250148 in XFRVA::__ioCompleteEmm(unsigned long) (
    this=0x8002000000406f00, callerID=4294967301) at XFRVA.C:130
#13 0xc00000000239a018 in DispatcherDefault_InvokeXObjMethod ()
    at CObjRootMediator.H:102
#14 0xc000000002399eb0 in DispatcherDefault_PPCServerOnThread ()
    at CObjRootMediator.H:102
(gdb)

=============================

k42console output
-----------------
             Giving back 0x10 pages (0x80001 > 0x80000)
             Giving back 0x10 pages (0x80002 > 0x80000)
             Giving back 0x10 pages (0x80003 > 0x80000)
             Giving back 0x10 pages (0x80004 > 0x80000)
             Giving back 0x10 pages (0x80004 > 0x80000)
             ...
(~145 times)
             Giving back 0x10 pages (0x80001 > 0x80000)
             Giving back 0x10 pages (0x80001 > 0x80000)
             Giving back 0x10 pages (0x80001 > 0x80000)
             Giving back 0x10 pages (0x80001 > 0x80000)
ERROR: file "/homes/kix/tamdavid/k42-20050520/kitchsrc/os/kernel/mem/FRPA.C", line 136
rc 0x800000000b2c0110
GDB got trap: Program Interrupt
vector=0x700, sr=0xa00000000002b032, pc=0xc0000000022afb34
lr=0xc0000000023b0ae4
Kernel Connecting to GDB via thinwire channel
(use kvictim to find gdb target machine and port)


--
David Tam <tamda at eecg.toronto.edu>
Graduate Student, ECE Dept, University of Toronto
http://www.eecg.toronto.edu/~tamda

_______________________________________________
K42-discussion mailing list
K42-discussion at ozlabs.org
https://ozlabs.org/mailman/listinfo/k42-discussion




