KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)

Erhard Furtner erhard_f at mailbox.org
Thu Sep 14 22:33:48 AEST 2023


On Thu, 14 Sep 2023 04:54:17 +0000
Christophe Leroy <christophe.leroy at csgroup.eu> wrote:

> On 12/09/2023 at 19:39, Christophe Leroy wrote:
> > 
> > 
> > On 12/09/2023 at 17:59, Erhard Furtner wrote:
> >>
> >> printk: bootconsole [udbg0] enabled
> >> Total memory = 2048MB; using 4096kB for hash table
> >> mapin_ram:125
> >> mmu_mapin_ram:169 0 30000000 1400000 2000000
> >> __mmu_mapin_ram:146 0 1400000
> >> __mmu_mapin_ram:155 1400000
> >> __mmu_mapin_ram:146 1400000 30000000
> >> __mmu_mapin_ram:155 20000000
> >> __mapin_ram_chunk:107 20000000 30000000
> >> __mapin_ram_chunk:117
> >> mapin_ram:134
> >> kasan_mmu_init:129
> >> kasan_mmu_init:132 0
> >> kasan_mmu_init:137
> >> ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
> >> Linux version 6.6.0-rc1-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #5 SMP Tue Sep 12 16:50:47 CEST 2023
> >> kasan_init_region: c0000000 30000000 f8000000 fe000000
> >> kasan_init_region: loop f8000000 fe000000
> >>
> >>
> >> So I get no "kasan_init_region: setbat" line and don't reach "KASAN init done".  
> > 
> > Ah ok, maybe your CPU only has 4 BATs and they are all used; the
> > following change would tell us.
> > 
> > diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> > index 850783cfa9c7..bd26767edce7 100644
> > --- a/arch/powerpc/mm/book3s32/mmu.c
> > +++ b/arch/powerpc/mm/book3s32/mmu.c
> > @@ -86,6 +86,7 @@ int __init find_free_bat(void)
> >    		if (!(bat[1].batu & 3))
> >    			return b;
> >    	}
> > +	pr_err("NO FREE BAT (%d)\n", n);
> >    	return -1;
> >    }
> > 
> > 
> > Or you have 8 BATs in which case it's an alignment problem, you need to
> > increase CONFIG_DATA_SHIFT to 23, for that you need CONFIG_ADVANCED and
> > CONFIG_DATA_SHIFT_BOOL
> > 
> > But regardless of that there is a problem we need to find out, because
> > it should work without BATs.
> > 
> > As the BAT allocation fails, it falls back to:
> > 
> > 	phys = memblock_phys_alloc_range(k_end - k_start, PAGE_SIZE, 0,
> > 						 MEMBLOCK_ALLOC_ANYWHERE);
> > 		if (!phys)
> > 			return -ENOMEM;
> > 	}
> > 
> > 	ret = kasan_init_shadow_page_tables(k_start, k_end);
> > 	if (ret)
> > 		return ret;
> > 
> > 	for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE) {
> > 		pmd_t *pmd = pmd_off_k(k_cur);
> > 		pte_t pte = pfn_pte(PHYS_PFN(phys + k_cur - k_start), PAGE_KERNEL);
> > 
> > 		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
> > 	}
> > 	flush_tlb_kernel_range(k_start, k_end);
> > 	memset(kasan_mem_to_shadow(start), 0, k_end - k_start);
> > 
> > 
> > While the __weak function that you confirmed working is:
> > 
> > 	ret = kasan_init_shadow_page_tables(k_start, k_end);
> > 	if (ret)
> > 		return ret;
> > 
> > 	block = memblock_alloc(k_end - k_start, PAGE_SIZE);
> > 	if (!block)
> > 		return -ENOMEM;
> > 
> > 	for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
> > 		pmd_t *pmd = pmd_off_k(k_cur);
> > 		void *va = block + k_cur - k_start;
> > 		pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
> > 
> > 		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
> > 	}
> > 	flush_tlb_kernel_range(k_start, k_end);
> > 
> > 
> > I'm having a hard time understanding what could be wrong in the first place.
> > 
> > Could you try the following change:
> > 
> > diff --git a/arch/powerpc/mm/kasan/book3s_32.c b/arch/powerpc/mm/kasan/book3s_32.c
> > index 9954b7a3b7ae..e04f21908c6a 100644
> > --- a/arch/powerpc/mm/kasan/book3s_32.c
> > +++ b/arch/powerpc/mm/kasan/book3s_32.c
> > @@ -38,7 +38,7 @@ int __init kasan_init_region(void *start, size_t size)
> > 
> >    	if (k_nobat < k_end) {
> >    		phys = memblock_phys_alloc_range(k_end - k_nobat, PAGE_SIZE, 0,
> > -						 MEMBLOCK_ALLOC_ANYWHERE);
> > +						 MEMBLOCK_ALLOC_ACCESSIBLE);
> >    		if (!phys)
> >    			return -ENOMEM;
> >    	}
> > 
> > And also that one:
> > 
> > 
> > diff --git a/arch/powerpc/mm/kasan/init_32.c b/arch/powerpc/mm/kasan/init_32.c
> > index a70828a6d935..bc1c075489f4 100644
> > --- a/arch/powerpc/mm/kasan/init_32.c
> > +++ b/arch/powerpc/mm/kasan/init_32.c
> > @@ -84,6 +84,9 @@ kasan_update_early_region(unsigned long k_start, unsigned long k_end, pte_t pte)
> >    {
> >    	unsigned long k_cur;
> > 
> > +	if (k_start == k_end)
> > +		return;
> > +
> >    	for (k_cur = k_start; k_cur != k_end; k_cur += PAGE_SIZE) {
> >    		pmd_t *pmd = pmd_off_k(k_cur);
> >    		pte_t *ptep = pte_offset_kernel(pmd, k_cur);
> > 
> > 
> >   
> 
> I tested the two vmlinux you sent me offlist, they both start without 
> problem on QEMU.

No problems show up on QEMU for me either. But QEMU does not seem able to mimic my G4 DP's configuration, that is, a dual-CPU G4 with an SMP config.

> So let's forget that for the moment, although you may try with 
> CONFIG_STRICT_KERNEL_RWX; in that case you should have enough BATs.

CONFIG_STRICT_KERNEL_RWX=y was enabled all along in my kernel .config, but for comparison I disabled it. With STRICT_KERNEL_RWX disabled I get no output about BATs whatsoever. Details below.

> In your last mail you say you tried with all patches. Did it include the 
> two above changes?
> 
> If not, can you perform the tests with those two changes in addition, 
> first one by one, then both together depending on the result?

I think I did apply both, but I redid the checks just to be sure. For my 'all patches applied' config, please check the attached git diff.

dmesg with patch 1 "MEMBLOCK_ALLOC_ACCESSIBLE);" applied:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:170 0 30000000 1400000 2000000
__mmu_mapin_ram:147 0 1400000
__mmu_mapin_ram:156 1400000
__mmu_mapin_ram:147 1400000 30000000
NO FREE BAT (8)
__mmu_mapin_ram:156 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:129
kasan_mmu_init:132 0
kasan_mmu_init:137
ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
Linux version 6.6.0-rc1-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #23 SMP Thu Sep 14 13:05:23 CEST 2023
kasan_init_region: c0000000 30000000 f8000000 fe000000
NO FREE BAT (8)
kasan_init_region: loop f8000000 fe000000

dmesg with patch 2 "if (k_start == k_end) return;" applied:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:170 0 30000000 1400000 2000000
__mmu_mapin_ram:147 0 1400000
__mmu_mapin_ram:156 1400000
__mmu_mapin_ram:147 1400000 30000000
NO FREE BAT (8)
__mmu_mapin_ram:156 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:132
kasan_mmu_init:135 0
kasan_mmu_init:140
ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
Linux version 6.6.0-rc1-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #23 SMP Thu Sep 14 13:05:23 CEST 2023
kasan_init_region: c0000000 30000000 f8000000 fe000000
NO FREE BAT (8)
kasan_init_region: loop f8000000 fe000000

dmesg with both KASAN patches applied:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:170 0 30000000 1400000 2000000
__mmu_mapin_ram:147 0 1400000
__mmu_mapin_ram:156 1400000
__mmu_mapin_ram:147 1400000 30000000
NO FREE BAT (8)
__mmu_mapin_ram:156 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:132
kasan_mmu_init:135 0
kasan_mmu_init:140
ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
Linux version 6.6.0-rc1-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #23 SMP Thu Sep 14 13:05:23 CEST 2023
kasan_init_region: c0000000 30000000 f8000000 fe000000
NO FREE BAT (8)
kasan_init_region: loop f8000000 fe000000

dmesg with both KASAN patches applied and STRICT_KERNEL_RWX=n:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:170 0 30000000 1400000 2000000
__mmu_mapin_ram:147 0 1400000
__mmu_mapin_ram:156 1400000
__mmu_mapin_ram:147 1400000 30000000
__mmu_mapin_ram:156 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:132
kasan_mmu_init:135 0
kasan_mmu_init:140

> Many thanks for your help and perseverance
> Christophe

You're welcome! Same to you! :)

Regards,
Erhard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: all_patches.patch
Type: text/x-patch
Size: 7187 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20230914/a06ba92d/attachment-0001.bin>

