KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)

Erhard Furtner erhard_f at mailbox.org
Mon Aug 28 09:17:58 AEST 2023


On Thu, 24 Aug 2023 21:36:26 +1000
Michael Ellerman <mpe at ellerman.id.au> wrote:

> > printk: bootconsole [udbg0] enabled
> > Total memory = 2048MB; using 4096kB for hash table
> > mapin_ram:125
> > mmu_mapin_ram:169 0 30000000 1400000 2000000
> > __mmu_mapin_ram:146 0 1400000
> > __mmu_mapin_ram:155 1400000
> > __mmu_mapin_ram:146 1400000 30000000
> > __mmu_mapin_ram:155 20000000
> > __mapin_ram_chunk:107 20000000 30000000
> > __mapin_ram_chunk:117
> > mapin_ram:134
> > kasan_mmu_init:129
> > kasan_mmu_init:132 0
> > kasan_mmu_init:137
> > ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
> > Linux version 6.5.0-rc7-PMacG4-dirty (root@T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #4 SMP Wed Aug 23 12:59:11 CEST 2023
> >
> > which shows one line (Linux version...) more than before. Most of the time I get this more interesting output however:
> >
> > kasan_mmu_init:129
> > kasan_mmu_init:132 0
> > kasan_mmu_init:137
> > Linux version 6.5.0-rc7-PMacG4-dirty (root@T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #4 SMP Wed Aug 23 12:59:11 CEST 2023
> > KASAN init done
> > list_add corruption. prev->next should be next (c17100c0), but was 2c030000. (prev=c036ac7c).
> > ------------[ cut here ]------------
> > kernel BUG at lib/list_debug.c:30!
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/machdep.h:227 die+0xd8/0x39c  
> 
> This is a WARN hit while handling the original bug.
> 
> Can you apply this patch to avoid that happening, so we can see the
> original bug better?
> 
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index eeff136b83d9..341a0635e131 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -198,8 +198,6 @@ static unsigned long oops_begin(struct pt_regs *regs)
>  	die_owner = cpu;
>  	console_verbose();
>  	bust_spinlocks(1);
> -	if (machine_is(powermac))
> -		pmac_backlight_unblank();
>  	return flags;
>  }
>  NOKPROBE_SYMBOL(oops_begin);
> 
> 
> cheers

OK, I have now tested the following combination:
   btext_unmap() replaced with btext_map() at the end of MMU_init(), plus Michael's patch.

With the patch I get interesting output less often, but when I do it's:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:169 0 30000000 1400000 2000000
__mmu_mapin_ram:146 0 1400000
__mmu_mapin_ram:155 1400000
__mmu_mapin_ram:146 1400000 30000000
__mmu_mapin_ram:155 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:129
kasan_mmu_init:132 0
kasan_mmu_init:137
Linux version 6.5.0-rc7-PMacG4-dirty (root@T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #4 SMP Wed Aug 23 12:59:11 CEST 2023
KASAN init done
BUG: spinlock bad magic on CPU#0, swapper/0
 lock: 0xc16cbc60, .magic: c036ab84, .owner: <none>/-1, .owner_cpu: -1
CPU: 0 PID: 0 Comm: swapper Tainted: G                T xxxxxxxxxxx
Call Trace:
[c1717c20] [c0f4e288] dump_stack_lvl+0x60/0xa4 (unreliable)
[c1717c40] [c01065e8] do_raw_spin_lock+0x15c/0x1a8
[c1717c70] [c0fa3890] _raw_spin_lock_irqsave+0x20/0x40
[c1717c90] [c0c140ec] of_find_property+0x3c/0x140
[c1717cc0] [c0c14204] of_get_property+0x14/0x4c
[c1717ce0] [c0c22c6c] unflatten_dt_nodes+0x76c/0x894
[c1717f10] [c0c22e88] __unflatten_device_tree+0xf4/0x244
[c1717f50] [c1458050] unflatten_device_tree+0x48/0x84
[c1717f70] [c140b100] setup_arch+0x78/0x44c
[c1717fc0] [c14045b8] start_kernel+0x78/0x2d8
[c1717ff0] [000035d0] 0x35d0


followed by the freeze. Less often I get:

[...]
Modules linked in: _various ASCII chars_ |(EK) _various ASCII chars_ §=(EKTN)
BUG: Unable to handle kernel data access on read at 0x813f0200
Faulting instruction address: 0xc014e444
Thread overran stack, or stack corrupted
Oops: Kernel access of bad area, sig: 11 [#3544]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2
Modules linked in: _various ASCII chars_ §=(EKTN)
BUG: Unable to handle kernel data access on read at 0x813f0200
Faulting instruction address: 0xc014e444
Thread overran stack, or stack corrupted
Oops: Kernel access of bad area, sig: 11 [#3545]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2


The number after "sig: 11" counts up rapidly to #3545, so I can't follow the output on the OF console. The output remaining on screen before the freeze is [#3535] to [#3545], but apart from those numbers the addresses in it do not change. The _various ASCII chars_ in the "Modules linked in:" line stay the same, but they are special characters and therefore hard to transcribe.
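
For readers following the thread: the "spinlock bad magic" report above comes from the kernel's spinlock debugging code, which compares the lock's .magic field against the constant SPINLOCK_MAGIC (0xdead4ead) before taking the lock. The following is only a minimal user-space sketch of that comparison, not the kernel implementation. struct sketch_spinlock and the sketch_* helpers are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value the kernel stores in .magic of an initialised debug spinlock. */
#define SPINLOCK_MAGIC 0xdead4eadu

/*
 * Hypothetical, heavily simplified stand-in for the debug fields of a
 * kernel spinlock. The real structure has more members.
 */
struct sketch_spinlock {
	uint32_t magic;
	int owner_cpu;
};

/* Initialise the lock the way a debug build would: set the magic value. */
static void sketch_spin_lock_init(struct sketch_spinlock *lock)
{
	lock->magic = SPINLOCK_MAGIC;
	lock->owner_cpu = -1;
}

/*
 * Return true if the magic value is intact. A false result corresponds
 * to the condition reported as "bad magic".
 */
static bool sketch_magic_ok(const struct sketch_spinlock *lock)
{
	return lock->magic == SPINLOCK_MAGIC;
}
```

With the .magic value from the log (c036ab84), sketch_magic_ok() returns false. That value looks like a kernel address rather than the expected constant, which would be consistent with the lock memory having been overwritten or never initialised, though the log alone does not show which.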

Hope that helps.

Regards,
Erhard

