<div dir="ltr">Dear All,<br><br>It seems the Xilinx lltemac driver (or IP core) is corrupting the memory.<br><br>We are using Xilinx Virtex 5 FX30T Eval board + AES-EXP-INTF-MOD-G(Avnet EXP Interface Module providing another RJ45). We are using the latest kernel tree from <a href="http://git.xilinx.com">git.xilinx.com</a> with LL_TEMAC. The eval board is located between two host PCs as following.<br>
<br>Host_PC_1 --> FX30T_Eval_Board -> Host_PC_2<br><br>The bug is relatively easy to reproduce by transmit a few gigabytes of data from one host pc to another host pc via the FX30T Eval board.<br><br>It seems the Xilinx lltemac driver (or IP core) is corrupting the memory.<br>
<br>This can be detected easily by using the SLUB memory allocator with<br>full debugging. With SLAB, or SLUB without debugging, crashes are<br>frequent.<br><br>It seems the problems happens more often when our interfaces are<br>
configured in bridging mode, but it happens too when configured in<br>routing (although it requires a few gigabytes of data to be<br>transferred to trigger the problem).<br><br>Common warning messages are like these:<br>[ 702.477923] =============================================================================<br>
[ 702.481375] BUG kmalloc-2048: Poison overwritten<br>[ 702.483664] -----------------------------------------------------------------------------<br>[ 702.483676]<br>[ 702.488463] INFO: 0xc39e0060-0xc39e0061. First byte 0x0 instead of 0x6b<br>
[ 702.491742] INFO: Allocated in dev_alloc_skb+0x20/0x50 age=20 cpu=0 pid=0<br>[ 702.495109] INFO: Freed in skb_release_data+0xb0/0xc4 age=1 cpu=0 pid=0<br>[ 702.498399] INFO: Slab 0xc02fac00 objects=15 used=13 fp=0xc39e0000 flags=0x20c2<br>
[ 702.502025] INFO: Object 0xc39e0000 @offset=0 fp=0xc39e0840<br>[ 702.502034]<br>[ 702.505523] Object 0xc39e0000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk<br>[ 702.509971] Object 0xc39e0010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk<br>
[ 702.514420] Object 0xc39e0020: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk<br>[ 702.518870] Object 0xc39e0030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk<br>[ 702.523320] Object 0xc39e0040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk<br>
[ 702.527769] Object 0xc39e0050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk<br>[ 702.532219] Object 0xc39e0060: 00 2f 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ./kkkkkkkkkkkkkk<br>[ 702.536669] Object 0xc39e0070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk<br>
[ 702.541119] Redzone 0xc39e0800: bb bb bb bb ����<br>[ 702.545568] Padding 0xc39e0828: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ<br>[ 702.550017] Padding 0xc39e0838: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ<br>
[ 702.554466] Call Trace:<br>[ 702.555678] [c023dcd0] [c00056c8] show_stack+0x4c/0x174 (unreliable)<br>[ 702.558831] [c023dd00] [c005cfa8] print_trailer+0x10c/0x120<br>[ 702.561596] [c023dd20] [c005d084] check_bytes_and_report+0xc8/0xfc<br>
[ 702.564663] [c023dd50] [c005d1d8] check_object+0xd4/0x20c<br>[ 702.567341] [c023dd70] [c005ea1c] __slab_alloc+0x3f8/0x4d4<br>[ 702.570063] [c023dda0] [c005f644] __kmalloc_track_caller+0x84/0xdc<br>[ 702.573130] [c023ddc0] [c010edcc] __alloc_skb+0x64/0x114<br>
[ 702.575765] [c023dde0] [c0110600] dev_alloc_skb+0x20/0x50<br>[ 702.578443] [c023ddf0] [c00f0e6c] _xenet_DmaSetupRecvBuffers+0x44/0x238<br>[ 702.581727] [c023de30] [c00f1254] DmaRecvHandlerBH+0x1f4/0x238<br>[ 702.584621] [c023de70] [c001e7dc] tasklet_action+0x7c/0xd4<br>
[ 702.587343] [c023de80] [c001e8a4] __do_softirq+0x5c/0xc4<br>[ 702.589978] [c023dea0] [c0002db8] do_softirq+0x3c/0x54<br>[ 702.592527] [c023deb0] [c001e6ec] irq_exit+0x38/0x48<br>[ 702.594989] [c023dec0] [c00035fc] do_IRQ+0xb0/0xcc<br>
[ 702.597365] [c023ded0] [c000c010] ret_from_except+0x0/0x18<br>[ 702.600087] [c023df90] [c000616c] cpu_idle+0xc4/0xc8<br>[ 702.602549] [c023dfb0] [c018a058] etext+0x58/0x68<br>[ 702.604882] [c023dfc0] [c020b704] start_kernel+0x254/0x268<br>
[ 702.607604] [c023dff0] [c0000240] skpinv+0x1e8/0x224<br>[ 702.610071] FIX kmalloc-2048: Restoring 0xc39e0060-0xc39e0061=0x6b<br>[ 702.610081]<br>[ 702.613869] FIX kmalloc-2048: Marking all objects used<br>[ 865.306194] =============================================================================<br>
[ 865.309648] BUG kmalloc-2048: Poison overwritten<br>[ 865.311938] -----------------------------------------------------------------------------<br>[ 865.311950]<br>[ 865.316737] INFO: 0xc3a70020-0xc3a7037f. First byte 0x0 instead of 0x6b<br>
[ 865.320015] INFO: Allocated in dev_alloc_skb+0x20/0x50 age=12 cpu=0 pid=0<br>[ 865.323383] INFO: Freed in skb_release_data+0xb0/0xc4 age=2 cpu=0 pid=0<br>[ 865.326672] INFO: Slab 0xc02fbe00 objects=15 used=12 fp=0xc3a70000 flags=0x20c2<br>
[ 865.330299] INFO: Object 0xc3a70000 @offset=0 fp=0xc3a71080<br>[ 865.330308]<br>[ 865.333796] Object 0xc3a70000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk<br>[ 865.338244] Object 0xc3a70010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk<br>
[ 865.342694] Object 0xc3a70020: 00 a1 b0 01 0e 9a 00 01 e1 08 93 7a 08 00 45 10 .��.....�..z..E.<br>[ 865.347144] Object 0xc3a70030: 04 64 46 61 40 00 40 06 da 10 0a 01 01 01 0a 01 .dFa@.@.�.......<br>[ 865.351593] Object 0xc3a70040: 01 10 00 16 d3 ce c6 24 33 fe 04 ad 65 06 80 18 ....���$3�.�e...<br>
[ 865.356043] Object 0xc3a70050: 03 59 3a cf 00 00 01 01 08 0a 03 ca 9f 31 00 10 .Y:�.......�.1..<br>[ 865.360493] Object 0xc3a70060: 8f 36 6e 44 0f e7 88 42 48 9c b8 98 78 e9 06 29 .6nD.�.BH.�.x�.)<br>[ 865.364942] Object 0xc3a70070: d4 78 ad 03 34 b4 06 6c cc 97 39 af 73 ac 99 52 �x�.4�.l�.9�s�.R<br>
[ 865.369392] Redzone 0xc3a70800: bb bb bb bb ����<br>[ 865.373842] Padding 0xc3a70828: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ<br>[ 865.378291] Padding 0xc3a70838: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ<br>
[ 865.382740] Call Trace:<br>[ 865.383952] [c023dcd0] [c00056c8] show_stack+0x4c/0x174 (unreliable)<br>[ 865.387105] [c023dd00] [c005cfa8] print_trailer+0x10c/0x120<br>[ 865.389869] [c023dd20] [c005d084] check_bytes_and_report+0xc8/0xfc<br>
[ 865.392936] [c023dd50] [c005d1d8] check_object+0xd4/0x20c<br>[ 865.395615] [c023dd70] [c005ea1c] __slab_alloc+0x3f8/0x4d4<br>[ 865.398336] [c023dda0] [c005f644] __kmalloc_track_caller+0x84/0xdc<br>[ 865.401403] [c023ddc0] [c010edcc] __alloc_skb+0x64/0x114<br>
[ 865.404039] [c023dde0] [c0110600] dev_alloc_skb+0x20/0x50<br>[ 865.406717] [c023ddf0] [c00f0e6c] _xenet_DmaSetupRecvBuffers+0x44/0x238<br>[ 865.410000] [c023de30] [c00f1254] DmaRecvHandlerBH+0x1f4/0x238<br>[ 865.412895] [c023de70] [c001e7dc] tasklet_action+0x7c/0xd4<br>
[ 865.415616] [c023de80] [c001e8a4] __do_softirq+0x5c/0xc4<br>[ 865.418252] [c023dea0] [c0002db8] do_softirq+0x3c/0x54<br>[ 865.420800] [c023deb0] [c001e6ec] irq_exit+0x38/0x48<br>[ 865.423263] [c023dec0] [c00035fc] do_IRQ+0xb0/0xcc<br>
[ 865.425639] [c023ded0] [c000c010] ret_from_except+0x0/0x18<br>[ 865.428360] [c023df90] [c000616c] cpu_idle+0xc4/0xc8<br>[ 865.430823] [c023dfb0] [c018a058] etext+0x58/0x68<br>[ 865.433156] [c023dfc0] [c020b704] start_kernel+0x254/0x268<br>
[ 865.435877] [c023dff0] [c0000240] skpinv+0x1e8/0x224<br>[ 865.438345] FIX kmalloc-2048: Restoring 0xc3a70020-0xc3a7037f=0x6b<br>[ 865.438354]<br>[ 865.442145] FIX kmalloc-2048: Marking all objects used<br><br>As you can seem, the SLUB poisoning has been overwritten.<br>
The system is still alive and the networking is still working fine<br>after these warnings though.<br><br>Sometimes, the corruption happens at the wrong moment (or the driver<br>uses a packet it should be using anymore), and causes a kernel crash,<br>
like here:<br>[ 516.247913] Unable to handle kernel paging request for data at address 0x6b6b6b73<br>[ 516.251015] Faulting instruction address: 0xc014edf8<br>[ 516.253482] Oops: Kernel access of bad area, sig: 11 [#1]<br>
[ 516.256152] DEBUG_PAGEALLOC Xilinx Virtex<br>[ 516.258139] Modules linked in:<br>[ 516.259655] NIP: c014edf8 LR: c014f074 CTR: 00000000<br>[ 516.262121] REGS: c023daa0 TRAP: 0300 Not tainted (2.6.26-jitcomm)<br>[ 516.265310] MSR: 00029000 <EE,ME> CR: 44000084 XER: 00000000<br>
[ 516.268206] DEAR: 6b6b6b73, ESR: 00000000<br>[ 516.270196] TASK = c0225430[0] 'swapper' THREAD: c023c000<br>[ 516.272784] GPR00: 00000000 c023db50 c0225430 c39e02d4 bbbbbbbb 00000004 c39e02d4 c39e02fc<br>[ 516.276931] GPR08: 6dfc74ed d967e058 6b6b6b6b 6b6b6b6b 0a010101 00004c80 00000004 00000001<br>
[ 516.281079] GPR16: fdffffff f9fb5ff3 d967dab0 00000502 00000000 d967e058 00000001 00000000<br>[ 516.285226] GPR24: 00000000 c39e01a4 00000004 0000006b 00000001 00000000 c39e0180 c3b7da40<br>[ 516.289459] NIP [c014edf8] tcp_ack+0x3a0/0x1940<br>
[ 516.291706] LR [c014f074] tcp_ack+0x61c/0x1940<br>[ 516.293909] Call Trace:<br>[ 516.295120] [c023db50] [c014f074] tcp_ack+0x61c/0x1940 (unreliable)<br>[ 516.298231] [c023dbc0] [c0150ab0] tcp_rcv_state_process+0x718/0xb5c<br>
[ 516.301341] [c023dbe0] [c0158a34] tcp_child_process+0x3c/0xd4<br>[ 516.304192] [c023dc00] [c0157fa8] tcp_v4_do_rcv+0x110/0x194<br>[ 516.306957] [c023dc30] [c01585e4] tcp_v4_rcv+0x5b8/0x614<br>[ 516.309592] [c023dc60] [c013d480] ip_local_deliver_finish+0x10c/0x198<br>
[ 516.312788] [c023dc80] [c013d0d4] ip_rcv_finish+0x2d0/0x304<br>[ 516.315553] [c023dcb0] [c0119950] netif_receive_skb+0x2f0/0x31c<br>[ 516.318491] [c023dce0] [c017e8e8] br_handle_frame_finish+0x14c/0x19c<br>[ 516.321645] [c023dd00] [c01835a8] br_nf_pre_routing_finish+0x2ec/0x304<br>
[ 516.324885] [c023dd60] [c0183fe4] br_nf_pre_routing+0x684/0x6b0<br>[ 516.327822] [c023dd80] [c012c070] nf_iterate+0x54/0xc4<br>[ 516.330371] [c023ddb0] [c012c14c] nf_hook_slow+0x6c/0xf4<br>[ 516.333006] [c023ddf0] [c017eb4c] br_handle_frame+0x214/0x24c<br>
[ 516.335858] [c023de10] [c0119880] netif_receive_skb+0x220/0x31c<br>[ 516.338795] [c023de40] [c0119a04] process_backlog+0x88/0x12c<br>[ 516.341603] [c023de60] [c0115ec8] net_rx_action+0x98/0x190<br>[ 516.344325] [c023de80] [c001e8a4] __do_softirq+0x5c/0xc4<br>
[ 516.346960] [c023dea0] [c0002db8] do_softirq+0x3c/0x54<br>[ 516.349509] [c023deb0] [c001e6ec] irq_exit+0x38/0x48<br>[ 516.351971] [c023dec0] [c00035fc] do_IRQ+0xb0/0xcc<br>[ 516.354347] [c023ded0] [c000c010] ret_from_except+0x0/0x18<br>
[ 516.357069] [c023df90] [c000616c] cpu_idle+0xc4/0xc8<br>[ 516.359531] [c023dfb0] [c018a058] etext+0x58/0x68<br>[ 516.361864] [c023dfc0] [c020b704] start_kernel+0x254/0x268<br>[ 516.364586] [c023dff0] [c0000240] skpinv+0x1e8/0x224<br>
[ 516.367046] Instruction dump:<br>[ 516.368516] 0f000000 393e0024 81690014 80090010 7c0b0050 54000ffe 68000001 0f000000<br>[ 516.372360] 825f02a0 39e00000 480000e4 2f8f0000 <a38a0008> 7d725b78 419e00d4 801f0244<br>
[ 516.376385] Kernel panic - not syncing: Fatal exception in interrupt<br><br>Or here:<br>[ 450.520795]<br>=============================================================================<br>[ 450.524247] BUG kmalloc-2048: Poison overwritten<br>
[ 450.526537]<br>-----------------------------------------------------------------------------<br>[ 450.526548]<br>[ 450.531336] INFO: 0xc3a68020-0xc3a68061. First byte 0x2 instead of 0x6b<br>[ 450.534614] INFO: Allocated in dev_alloc_skb+0x20/0x50 age=24 cpu=0 pid=2115<br>
[ 450.538112] INFO: Freed in skb_release_data+0xb0/0xc4 age=2 cpu=0 pid=2115<br>[ 450.541530] INFO: Slab 0xc02fbd00 objects=15 used=14 fp=0xc3a68000 flags=0x20c2<br>[ 450.545156] INFO: Object 0xc3a68000 @offset=0 fp=0x00000000 [ 450.545165]<br>
[ 450.548654] Object 0xc3a68000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk<br>[ 450.553103] Object 0xc3a68010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk<br>[ 450.557552] Object 0xc3a68020: 02 00 00 00 00 01 00 01 e1 08 93 7a 08 00 45 00 ........�..z..E.<br>
[ 450.562002] Object 0xc3a68030: 00 34 1d de 40 00 40 06 06 d4 0a 01 01 01 0a 01 .4.�@.@..�......<br>[ 450.566451] Object 0xc3a68040: 01 10 0d 05 bd 4f ee 9b cd 34 3e 23 ff 36 80 10 ....�O�.�4>#�6..<br>[ 450.570901] Object 0xc3a68050: 46 2e 32 9a 00 00 01 01 08 0a 08 af e0 ee 00 00 F.2........���..<br>
[ 450.575351] Object 0xc3a68060: 3a c5 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b :�kkkkkkkkkkkkkk<br>[ 450.579800] Object 0xc3a68070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk<br>[ 450.584250] Redzone 0xc3a68800: bb bb bb bb ����<br>
[ 450.588700] Padding 0xc3a68828: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ<br>[ 450.593149] Padding 0xc3a68838: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ<br>[ 450.597598] Call Trace:<br>
[ 450.598809] [c38cbc60] [c00056c8] show_stack+0x4c/0x174 (unreliable)<br>[ 450.601963] [c38cbc90] [c005cfb8] print_trailer+0x10c/0x120<br>[ 450.604727] [c38cbcb0] [c005d094] check_bytes_and_report+0xc8/0xfc<br>[ 450.607794] [c38cbce0] [c005d1e8] check_object+0xd4/0x20c<br>
[ 450.610473] [c38cbd00] [c005ea2c] __slab_alloc+0x3f8/0x4d4<br>[ 450.613194] [c38cbd30] [c005f654] __kmalloc_track_caller+0x84/0xdc<br>[ 450.616261] [c38cbd50] [c010edfc] __alloc_skb+0x64/0x114<br>[ 450.618897] [c38cbd70] [c0146758] sk_stream_alloc_skb+0x38/0x118<br>
[ 450.621878] [c38cbd90] [c0149244] tcp_sendmsg+0x178/0xb74<br>[ 450.624556] [c38cbdf0] [c0109a10] sock_aio_write+0xe0/0xfc<br>[ 450.627278] [c38cbe50] [c006256c] do_sync_write+0xb8/0x10c<br>[ 450.629999] [c38cbf10] [c00633b4] sys_write+0x4c/0x8c<br>
[ 450.632505] [c38cbf40] [c000b9b0] ret_from_syscall+0x0/0x3c<br>[ 450.635275] FIX kmalloc-2048: Restoring 0xc3a68020-0xc3a68061=0x6b<br>[ 450.635285]<br>[ 450.639074] FIX kmalloc-2048: Marking all objects used<br>[ 510.208831] BUG: Not enough tailroom in the skb to add this data (tailroom=0 len=1514 3308860 6b6b6b6b 1802201963 framesize=1518)<br>
<br>The last error message comes from the lltemac driver. we added a check<br>before skb_put is run, and as you can see, the skb structure has been<br>corrupted.<br><br>Soory for the long email. Any help will be much appreciated. Thanks for your patience.<br>
<br>Steve<br></div>