From mpjohn at us.ibm.com Wed Dec 1 08:34:29 2004 From: mpjohn at us.ibm.com (Maynard Johnson) Date: Tue, 30 Nov 2004 15:34:29 -0600 Subject: power4 performance counters Message-ID: > > Is there a publicly available document that specifies the events > countable on power4 performance counters? I found POWER4.evs on an AIX > box (part of pmapi library), but I would like to write my own code that > accesses those counters on linux, and incorporating that information > probably violates IBM copyright. > > Thanks, > Igor I've been told by those responsible for the PMU documentation for PowerPC that they are keenly aware of the need and that they will be working to fix this problem with publicly available doc, but I wouldn't expect that it will be very soon. In the meantime, it may interest you to know that my team has just completed a port of OProfile to ppc64, supporting Power4, 5, and 970. It's currently available at the OProfile website (http://oprofile.sourceforge.net/news/) in their CVS (not included in their latest release). This will be included in RHEL 4 and (probably) SLES 9 SP1. Additionally, we're starting work on porting PAPI (http://icl.cs.utk.edu/papi/index.html) to ppc64. If you still would like access to the currently confidential PMU doc, get with me offline from this mailing list and I can probably hook you up with the right resources. Maynard Johnson LTC Power Linux Toolchain 507-253-2650 From igor at cs.wisc.edu Fri Dec 3 03:59:49 2004 From: igor at cs.wisc.edu (Igor Grobman) Date: Thu, 2 Dec 2004 10:59:49 -0600 (CST) Subject: power4 performance counters In-Reply-To: <20041129214914.GD17540@krispykreme.ozlabs.ibm.com> References: <20041129214914.GD17540@krispykreme.ozlabs.ibm.com> Message-ID: Anton, Thanks for the oprofile and 970 Book IV pointers. They proved very useful. On Tue, 30 Nov 2004, Anton Blanchard wrote: > > > Let me know if you get stuck. 
Just out of interest, are you planning to
> work on one of the performance counter packages (eg pmapi, perfctr)

I am working on a kernel instrumentation/profiling tool called kerninst. I
mentioned it on this list before. Basically, we provide a mechanism to place
code snippets at any arbitrary point while the kernel is running (no
recompile/reboot is needed). I am looking to provide primitives for using
performance counters inside the instrumentation. Hence my question.

See http://www.paradyn.org/html/kerninst.html. The ppc64 release is coming
real soon now.

-Igor

From miltonm at bga.com Fri Dec 3 16:46:24 2004
From: miltonm at bga.com (Milton Miller)
Date: Thu, 2 Dec 2004 23:46:24 -0600
Subject: PPC64: EEH Recovery (Revised)
Message-ID:

Well, I was going to suggest you allocate the bars in the cnode that you use
to save the device tree with, in the function eeh_save_bars. Then I realized
that eeh_save_bars doesn't do anything except walk the tree allocating nodes,
finding matching pci devices to read and copy pci_dev->is_bridge. The actual
save was done at eeh_late_init. Which means eeh_save_bars is misnamed.

You walk the list in recursive descent, creating a node each time you go. You
restore the bars in the same recursive descent order, only instead of walking
device_node you walk the tree that you saved.

Why can you not just allocate a struct list and an "is_bridge" field, do the
same walk adding to the end of the list in save, then restore by walking the
list forwards with list_for_each_safe? See, the restore code just became
linear.

milton

From schwab at suse.de Sun Dec 5 03:34:45 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sat, 04 Dec 2004 17:34:45 +0100
Subject: 2.6.10-rc3 does not boot on PowerMac
Message-ID:

The 2.6.10-rc3 kernel does not boot on PowerMac G5, the last thing I see is
"smp_core99_kick_cpu done". Any idea where I should start looking?

Andreas.
--
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From anton at samba.org Sun Dec 5 05:20:43 2004
From: anton at samba.org (Anton Blanchard)
Date: Sun, 5 Dec 2004 05:20:43 +1100
Subject: 2.6.10-rc3 does not boot on PowerMac
In-Reply-To:
References:
Message-ID: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com>

> The 2.6.10-rc3 kernel does not boot on PowerMac G5, the last thing I see
> is "smp_core99_kick_cpu done". Any idea where I should start looking?

At a guess, does backing out this patch help?

Anton

--

From: Olof Johansson

Below patch changes the early CPU spinup code to be based on physical CPU ID
instead of logical. This will make it possible to kexec off of a different cpu
than 0, for example after it's been hot-unplugged. The booted cpu will still
be mapped as logical cpu 0, since there's various stuff in the early boot that
assumes logical boot cpuid is 0.

Also, it expands the kexec boot param structure to allow the booted physical
cpuid to be passed in. This includes bumping the version number to 2 for
backwards compat.
Signed-off-by: Olof Johansson Signed-off-by: Anton Blanchard diff -puN arch/ppc64/kernel/asm-offsets.c~boot-cpuid arch/ppc64/kernel/asm-offsets.c --- linux-2.5/arch/ppc64/kernel/asm-offsets.c~boot-cpuid 2004-11-16 12:41:26.546908234 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/asm-offsets.c 2004-11-16 13:24:49.372405523 -0600 @@ -103,6 +103,7 @@ int main(void) DEFINE(PACA_EXDSI, offsetof(struct paca_struct, exdsi)); DEFINE(PACAEMERGSP, offsetof(struct paca_struct, emergency_sp)); DEFINE(PACALPPACA, offsetof(struct paca_struct, lppaca)); + DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id)); DEFINE(LPPACASRR0, offsetof(struct ItLpPaca, xSavedSrr0)); DEFINE(LPPACASRR1, offsetof(struct ItLpPaca, xSavedSrr1)); DEFINE(LPPACAANYINT, offsetof(struct ItLpPaca, xIntDword.xAnyInt)); diff -puN arch/ppc64/kernel/head.S~boot-cpuid arch/ppc64/kernel/head.S --- linux-2.5/arch/ppc64/kernel/head.S~boot-cpuid 2004-11-16 12:41:26.548908679 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/head.S 2004-11-16 13:20:02.404741718 -0600 @@ -26,6 +26,7 @@ #define SECONDARY_PROCESSORS #include +#include #include #include #include @@ -1192,7 +1193,7 @@ unrecov_slb: /* * On pSeries, secondary processors spin in the following code. - * At entry, r3 = this processor's number (in Linux terms, not hardware). + * At entry, r3 = this processor's number (physical cpu id) */ _GLOBAL(pseries_secondary_smp_init) mr r24,r3 @@ -1204,13 +1205,27 @@ _GLOBAL(pseries_secondary_smp_init) /* Copy some CPU settings from CPU 0 */ bl .__restore_cpu_setup - /* Set up a paca value for this processor. */ - LOADADDR(r5, paca) /* Get base vaddr of paca array */ - mulli r13,r24,PACA_SIZE /* Calculate vaddr of right paca */ - add r13,r13,r5 /* for this processor. */ - mtspr SPRG3,r13 /* Save vaddr of paca in SPRG3 */ -1: - HMT_LOW + /* Set up a paca value for this processor. Since we have the + * physical cpu id in r3, we need to search the pacas to find + * which logical id maps to our physical one. 
+ */ + LOADADDR(r13, paca) /* Get base vaddr of paca array */ + li r5,0 /* logical cpu id */ +1: lhz r6,PACAHWCPUID(r13) /* Load HW procid from paca */ + cmpw r6,r24 /* Compare to our id */ + beq 2f + addi r13,r13,PACA_SIZE /* Loop to next PACA on miss */ + addi r5,r5,1 + cmpwi r5,NR_CPUS + blt 1b + +99: HMT_LOW /* Couldn't find our CPU id */ + b 99b + +2: mtspr SPRG3,r13 /* Save vaddr of paca in SPRG3 */ + /* From now on, r24 is expected to be logica cpuid */ + mr r24,r5 +3: HMT_LOW lbz r23,PACAPROCSTART(r13) /* Test if this processor should */ /* start. */ sync @@ -1225,7 +1240,7 @@ _GLOBAL(pseries_secondary_smp_init) bne .__secondary_start #endif #endif - b 1b /* Loop until told to go */ + b 3b /* Loop until told to go */ #ifdef CONFIG_PPC_ISERIES _STATIC(__start_initialization_iSeries) /* Clear out the BSS */ @@ -1921,19 +1936,6 @@ _STATIC(start_here_multiplatform) bl .__save_cpu_setup sync -#ifdef CONFIG_SMP - /* All secondary cpus are now spinning on a common - * spinloop, release them all now so they can start - * to spin on their individual paca spinloops. - * For non SMP kernels, the secondary cpus never - * get out of the common spinloop. - */ - li r3,1 - LOADADDR(r5,__secondary_hold_spinloop) - tophys(r4,r5) - std r3,0(r4) -#endif - /* Setup a valid physical PACA pointer in SPRG3 for early_setup * note that boot_cpuid can always be 0 nowadays since there is * nowhere it can be initialized differently before we reach this @@ -2131,6 +2133,22 @@ _GLOBAL(hmt_start_secondary) blr #endif +#if defined(CONFIG_SMP) && !defined(CONFIG_PPC_ISERIES) +_GLOBAL(smp_release_cpus) + /* All secondary cpus are spinning on a common + * spinloop, release them all now so they can start + * to spin on their individual paca spinloops. + * For non SMP kernels, the secondary cpus never + * get out of the common spinloop. 
+ */ + li r3,1 + LOADADDR(r5,__secondary_hold_spinloop) + std r3,0(r5) + sync + blr +#endif /* CONFIG_SMP && !CONFIG_PPC_ISERIES */ + + /* * We put a few things here that have to be page-aligned. * This stuff goes at the beginning of the data segment, diff -puN arch/ppc64/kernel/pacaData.c~boot-cpuid arch/ppc64/kernel/pacaData.c --- linux-2.5/arch/ppc64/kernel/pacaData.c~boot-cpuid 2004-11-16 12:41:26.551909346 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pacaData.c 2004-11-16 12:41:26.572914016 -0600 @@ -58,6 +58,7 @@ extern unsigned long __toc_start; .stab_real = (asrr), /* Real pointer to segment table */ \ .stab_addr = (asrv), /* Virt pointer to segment table */ \ .cpu_start = (start), /* Processor start */ \ + .hw_cpu_id = 0xffff, \ .lppaca = { \ .xDesc = 0xd397d781, /* "LpPa" */ \ .xSize = sizeof(struct ItLpPaca), \ diff -puN arch/ppc64/kernel/prom.c~boot-cpuid arch/ppc64/kernel/prom.c --- linux-2.5/arch/ppc64/kernel/prom.c~boot-cpuid 2004-11-16 12:41:26.554910013 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/prom.c 2004-11-16 12:41:26.573914239 -0600 @@ -853,10 +853,19 @@ static int __init early_init_dt_scan_cpu } } - /* Check if it's the boot-cpu, set it's hw index in paca now */ - if (get_flat_dt_prop(node, "linux,boot-cpu", NULL) != NULL) { - u32 *prop = get_flat_dt_prop(node, "reg", NULL); - paca[0].hw_cpu_id = prop == NULL ? 0 : *prop; + if (initial_boot_params && initial_boot_params->version >= 2) { + /* version 2 of the kexec param format adds the phys cpuid + * of booted proc. + */ + boot_cpuid_phys = initial_boot_params->boot_cpuid_phys; + boot_cpuid = 0; + } else { + /* Check if it's the boot-cpu, set it's hw index in paca now */ + if (get_flat_dt_prop(node, "linux,boot-cpu", NULL) != NULL) { + u32 *prop = get_flat_dt_prop(node, "reg", NULL); + set_hard_smp_processor_id(0, prop == NULL ? 
0 : *prop); + boot_cpuid_phys = get_hard_smp_processor_id(0); + } } return 0; diff -puN arch/ppc64/kernel/prom_init.c~boot-cpuid arch/ppc64/kernel/prom_init.c --- linux-2.5/arch/ppc64/kernel/prom_init.c~boot-cpuid 2004-11-16 12:41:26.556910458 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/prom_init.c 2004-11-16 12:41:26.575914683 -0600 @@ -992,13 +992,13 @@ static void __init prom_hold_cpus(void) /* Primary Thread of non-boot cpu */ prom_printf("%x : starting cpu hw idx %x... ", cpuid, reg); call_prom("start-cpu", 3, 0, node, - secondary_hold, cpuid); + secondary_hold, reg); for ( i = 0 ; (i < 100000000) && (*acknowledge == ((unsigned long)-1)); i++ ) mb(); - if (*acknowledge == cpuid) { + if (*acknowledge == reg) { prom_printf("done\n"); /* We have to get every CPU out of OF, * even if we never start it. */ diff -puN arch/ppc64/kernel/setup.c~boot-cpuid arch/ppc64/kernel/setup.c --- linux-2.5/arch/ppc64/kernel/setup.c~boot-cpuid 2004-11-16 12:41:26.559911125 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/setup.c 2004-11-16 13:22:53.060669846 -0600 @@ -99,6 +99,8 @@ extern void htab_initialize(void); extern void early_init_devtree(void *flat_dt); extern void unflatten_device_tree(void); +extern void smp_release_cpus(void); + unsigned long decr_overclock = 1; unsigned long decr_overclock_proc0 = 1; unsigned long decr_overclock_set = 0; @@ -106,6 +108,7 @@ unsigned long decr_overclock_proc0_set = int have_of = 1; int boot_cpuid = 0; +int boot_cpuid_phys = 0; dev_t boot_dev; /* @@ -242,6 +245,7 @@ static void __init setup_cpu_maps(void) { struct device_node *dn = NULL; int cpu = 0; + int swap_cpuid = 0; check_smt_enabled(); @@ -266,11 +270,23 @@ static void __init setup_cpu_maps(void) cpu_set(cpu, cpu_present_map); set_hard_smp_processor_id(cpu, intserv[j]); } + if (intserv[j] == boot_cpuid_phys) + swap_cpuid = cpu; cpu_set(cpu, cpu_possible_map); cpu++; } } + /* Swap CPU id 0 with boot_cpuid_phys, so we can always assume that + * boot cpu is logical 0. 
+ */ + if (boot_cpuid_phys != get_hard_smp_processor_id(0)) { + u32 tmp; + tmp = get_hard_smp_processor_id(0); + set_hard_smp_processor_id(0, boot_cpuid_phys); + set_hard_smp_processor_id(swap_cpuid, tmp); + } + /* * On pSeries LPAR, we need to know how many cpus * could possibly be added to this partition. @@ -630,6 +646,11 @@ void __init setup_system(void) * iSeries has already initialized the cpu maps at this point. */ setup_cpu_maps(); + + /* Release secondary cpus out of their spinloops at 0x60 now that + * we can map physical -> logical CPU ids + */ + smp_release_cpus(); #endif /* defined(CONFIG_SMP) && !defined(CONFIG_PPC_ISERIES) */ printk("Starting Linux PPC64 %s\n", UTS_RELEASE); diff -puN include/asm-ppc64/prom.h~boot-cpuid include/asm-ppc64/prom.h --- linux-2.5/include/asm-ppc64/prom.h~boot-cpuid 2004-11-16 12:41:26.561911570 -0600 +++ linux-2.5-olof/include/asm-ppc64/prom.h 2004-11-16 12:41:26.577915128 -0600 @@ -56,6 +56,8 @@ struct boot_param_header u32 off_mem_rsvmap; /* offset to memory reserve map */ u32 version; /* format version */ u32 last_comp_version; /* last compatible version */ + /* version 2 fields below */ + u32 boot_cpuid_phys; /* Which physical CPU id we're booting on */ }; diff -puN include/asm-ppc64/smp.h~boot-cpuid include/asm-ppc64/smp.h --- linux-2.5/include/asm-ppc64/smp.h~boot-cpuid 2004-11-16 12:41:26.564912237 -0600 +++ linux-2.5-olof/include/asm-ppc64/smp.h 2004-11-16 12:41:26.577915128 -0600 @@ -27,6 +27,7 @@ #include extern int boot_cpuid; +extern int boot_cpuid_phys; extern void cpu_die(void) __attribute__((noreturn)); _ From schwab at suse.de Sun Dec 5 08:24:48 2004 From: schwab at suse.de (Andreas Schwab) Date: Sat, 04 Dec 2004 22:24:48 +0100 Subject: 2.6.10-rc3 does not boot on PowerMac References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> Message-ID: Anton Blanchard writes: >> The 2.6.10-rc3 kernel does not boot on PowerMac G5, the last thing I see >> is "smp_core99_kick_cpu done". 
Any idea where I should start looking?
>
> At a guess, does backing out this patch help?

It's actually something completely different. The system boots all right,
but there is no framebuffer console at all (X is working fine). The
problem seems to be the recent pci cleanups. The offb driver wants to
reserve 98004000-98183fff, but there is no matching pci resource.

2.6.9:
90000000-9fffffff : /pci@0,f0000000
  90000000-9001ffff : 0000:f0:10.0
  91000000-91ffffff : 0000:f0:10.0
  98000000-9fffffff : 0000:f0:10.0
    98004000-98183fff : offb

2.6.10-rc3:
90000000-9fffffff : /pci@0,f0000000
  90000000-97ffffff : 0000:f0:10.0
  98000000-9801ffff : 0000:f0:10.0

Andreas.

--
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From benh at kernel.crashing.org Sun Dec 5 15:23:27 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 05 Dec 2004 15:23:27 +1100
Subject: 2.6.10-rc3 does not boot on PowerMac
In-Reply-To:
References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com>
Message-ID: <1102220607.18808.7.camel@gaston>

On Sat, 2004-12-04 at 22:24 +0100, Andreas Schwab wrote:
> Anton Blanchard writes:
>
> >> The 2.6.10-rc3 kernel does not boot on PowerMac G5, the last thing I see
> >> is "smp_core99_kick_cpu done". Any idea where I should start looking?
> >
> > At a guess, does backing out this patch help?
>
> It's actually something completely different. The system boots all right,
> but there is no framebuffer console at all (X is working fine). The
> problem seems to be the recent pci cleanups. The offb driver wants to
> reserve 98004000-98183fff, but there is no matching pci resource.
>
> 2.6.9:
> 90000000-9fffffff : /pci@0,f0000000
>   90000000-9001ffff : 0000:f0:10.0
>   91000000-91ffffff : 0000:f0:10.0
>   98000000-9fffffff : 0000:f0:10.0
>     98004000-98183fff : offb
>
> 2.6.10-rc3:
> 90000000-9fffffff : /pci@0,f0000000
>   90000000-97ffffff : 0000:f0:10.0
>   98000000-9801ffff : 0000:f0:10.0

That is very strange ... it looks like the video card is being remapped
to a different location than where it was initially. Can you send me the
dmesg log? Is this an nvidia or an ATI card? Does either rivafb or
radeonfb work?

Ben.

From schwab at suse.de Mon Dec 6 02:56:59 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sun, 05 Dec 2004 16:56:59 +0100
Subject: 2.6.10-rc3 does not boot on PowerMac
In-Reply-To: <1102220607.18808.7.camel@gaston> (Benjamin Herrenschmidt's message of "Sun, 05 Dec 2004 15:23:27 +1100")
References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102220607.18808.7.camel@gaston>
Message-ID:

Benjamin Herrenschmidt writes:

> That is very strange ... it looks like the video card is being remapped
> to a different location than where it was initially. Can you send me the
> dmesg log? Is this an nvidia or an ATI card? Does either rivafb or
> radeonfb work?

It's an nvidia card, but it is not supported by rivafb (pci id 10de:0321).
I have attached /proc/iomem and dmesg output for both 2.6.9 and 2.6.10-rc3.

Andreas.

--
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dmesg-2.6.9
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041205/767be477/attachment.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dmesg-2.6.10-rc3
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041205/767be477/attachment-0001.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: iomem-2.6.9
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041205/767be477/attachment-0002.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: iomem-2.6.10-rc3
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041205/767be477/attachment-0003.txt

From schwab at suse.de Mon Dec 6 03:36:52 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sun, 05 Dec 2004 17:36:52 +0100
Subject: 2.6.10-rc3 does not boot on PowerMac
In-Reply-To: (Andreas Schwab's message of "Sun, 05 Dec 2004 16:56:59 +0100")
References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102220607.18808.7.camel@gaston>
Message-ID:

Andreas Schwab writes:

> Benjamin Herrenschmidt writes:
>
>> That is very strange ... it looks like the video card is being remapped
>> to a different location than where it was initially. Can you send me the
>> dmesg log? Is this an nvidia or an ATI card? Does either rivafb or
>> radeonfb work?
>
> It's an nvidia card, but it is not supported by rivafb (pci id 10de:0321).

When I add this id to the list in riva/fbdev.c I get a mangled display, but
at least the driver finds the framebuffer memory.

90000000-97ffffff : rivafb
f1000000-f1ffffff : rivafb

rivafb: nVidia device/chipset 10DE0321
rivafb: Detected CRTC controller 0 being used
rivafb: disabling acceleration
rivafb: setting virtual Y resolution to 52428
Console: switching to colour frame buffer device 160x64
rivafb: PCI nVidia NV32 framebuffer ver 0.9.5b (64MB @ 0x90000000)

Andreas.

--
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
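
The resource mismatch reported in this thread can be checked mechanically.
What follows is a minimal userspace sketch in C, not kernel code: the type
and function names (struct range, range_contains, find_parent) are
hypothetical, and the address ranges are copied from the /proc/iomem
listings quoted earlier in the thread. It tests whether the region that offb
wants to reserve (98004000-98183fff) lies entirely inside any one resource of
device 0000:f0:10.0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An inclusive address range, as printed in /proc/iomem. (Hypothetical type.) */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Return 1 if child lies entirely inside parent, else 0. */
int range_contains(struct range parent, struct range child)
{
    assert(parent.start <= parent.end && child.start <= child.end);
    return parent.start <= child.start && child.end <= parent.end;
}

/* Return the index of the first resource that contains want, or -1. */
int find_parent(const struct range *res, size_t n, struct range want)
{
    for (size_t i = 0; i < n; i++)
        if (range_contains(res[i], want))
            return (int)i;
    return -1;
}

/* Region offb tries to reserve, taken from the report. */
const struct range offb_region = { 0x98004000, 0x98183fff };

/* Resources of 0000:f0:10.0 as listed for 2.6.9. */
const struct range res_2_6_9[] = {
    { 0x90000000, 0x9001ffff },
    { 0x91000000, 0x91ffffff },
    { 0x98000000, 0x9fffffff },
};

/* Resources of 0000:f0:10.0 as listed for 2.6.10-rc3. */
const struct range res_2_6_10_rc3[] = {
    { 0x90000000, 0x97ffffff },
    { 0x98000000, 0x9801ffff },
};
```

With these tables, find_parent() finds a containing resource in the 2.6.9
listing (the 98000000-9fffffff entry) and none in the 2.6.10-rc3 listing,
which is consistent with offb failing to reserve its region there.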
From tmbinc at elitedvb.net Mon Dec 6 05:29:21 2004
From: tmbinc at elitedvb.net (Felix Domke)
Date: Sun, 05 Dec 2004 19:29:21 +0100
Subject: G5 iMac .config
Message-ID: <41B35381.3080008@elitedvb.net>

I tried to compile a kernel with the iMac patches posted by J. Mayer, but my
kernel doesn't boot. My kernel is 2.6.10-rc2, and the patches applied cleanly.

The kernel from the Gentoo livecd ("imacg5") works well, but I can't find the
corresponding .config file anywhere.

The last message is "DO-QUIESCE finishedreturning from prom_init".

I guess my kernel is just not configured in the right way. Can anybody send
me a working .config to work from, or tell me what important things there are
to check for?

thanks,
Felix

From benh at kernel.crashing.org Mon Dec 6 07:44:07 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 06 Dec 2004 07:44:07 +1100
Subject: 2.6.10-rc3 does not boot on PowerMac
In-Reply-To:
References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102220607.18808.7.camel@gaston>
Message-ID: <1102279447.11851.73.camel@gaston>

On Sun, 2004-12-05 at 16:56 +0100, Andreas Schwab wrote:
> Benjamin Herrenschmidt writes:
>
> > That is very strange ... it looks like the video card is being remapped
> > to a different location than where it was initially. Can you send me the
> > dmesg log? Is this an nvidia or an ATI card? Does either rivafb or
> > radeonfb work?
>
> It's an nvidia card, but it is not supported by rivafb (pci id 10de:0321).
> I have attached /proc/iomem and dmesg output for both 2.6.9 and 2.6.10-rc3.

The dmesg doesn't contain anything useful apparently. For some reason, the
nVidia card is getting moved around by the PCI code it seems (lspci -vv would
probably show you that it gets re-assigned a different location). It's
absolutely bad if the kernel starts doing that to cards that are already
located at valid addresses.
The MacIO (K2) chip itself for example is a PCI device and must _never_ be
moved around or the kernel will die right away.

It would be useful to enable some PCI debugging to figure out what's going
on there.

Ben.

From tiwari.amit at gmail.com Mon Dec 6 23:59:04 2004
From: tiwari.amit at gmail.com (Amit K Tiwari)
Date: Mon, 6 Dec 2004 18:29:04 +0530
Subject: User allocated memory for DMA
Message-ID:

I am writing a driver for a PCIX board to run on Y-HPC (64-bit Yellow
Dog Linux pre-installed on Apple XServe). The application is to
process the data acquired from the board and send the results out. The
data needs to be acquired by DMA into a user allocated buffer. The
amount of data is huge and I need about 1.5 GB per DMA operation.

Here is what I am doing:
1. malloc and mlock the memory in user mode.
2. get the user pages (get_user_pages) in the driver.
3. Find out how much physically contiguous memory the application
got (pass through all the pages and see if the physical addresses
obtained from page_to_phys are contiguous).
4. Prepare a scatter-gather list of contiguous regions.
5. pci_map_sg the sg list to get the DMA addresses for each of the entries.

My problem is that at step 3 I do not get any contiguous region.
Surprisingly, some of the physical addresses are such that the low
order page (say page 151) has a higher physical address (say,
1ef234000) while the next high order page (say page 152) has the next
lower physical address (say 1ef233000), and this trend continues for
some 10 pages, after which the addresses are totally far apart.

As a result of this, my scatter gather list has as many entries
as there are pages allocated.

Am I doing something bad here?
I know there is a limit on memory that can be allocated through
pci_alloc_*. Is there any other way in which I can allocate memory for
DMA?

Thanks,
Amit K T

From grave at ipno.in2p3.fr Tue Dec 7 00:13:42 2004
From: grave at ipno.in2p3.fr (grave)
Date: Mon, 06 Dec 2004 13:13:42 +0000
Subject: Re: User allocated memory for DMA
In-Reply-To: (from tiwari.amit@gmail.com on Mon Dec 6 13:59:04 2004)
References:
Message-ID: <1102338822l.12013l.2l@ipnnarval>

Did you check the bigphysmem area in the kernel? As I remember, it allows
you to reserve a certain amount of memory outside of Linux...

xavier

On 06.12.2004 13:59:04, Amit K Tiwari wrote:
> I am writing a driver for a PCIX board to run on Y-HPC (64-bit Yellow
> Dog Linux pre-installed on Apple XServe). The application is to
> process the data acquired from the board and send the results out.
> The
> data needs to be acquired by DMA into a user allocated buffer. The
> amount of data is huge and I need about 1.5 GB per DMA operation.
>
> Here is what I am doing:
> 1. malloc and mlock the memory in user mode.
> 2. get the user pages (get_user_pages) in the driver.
> 3. Find out how much physically contiguous memory did the application
> get (Pass through all the pages and see if the physical addresses got
> from page_to_phys are contiguous).
> 4. Prepare a scatter-gather list of contiguous regions.
> 5. pci_map_sg the sg list to get the DMA addresses for each of the
> entries.
>
> My problem is that at step 3 I do not get any contiguous region.
> Surprisingly, some of the physical addresses are such that the low
> order page (say page 151) has high order physical address (say,
> 1ef234000) while the next high order page (say page 152) has the next
> lower physical address(say 1ef233000) and this trend continues for
> some 10 pages after which the addresses totally far apart.
>
> As a result of this, my scatter gather list has as many entries
> as there are pages allocated.
>
> Am I doing something bad here?
> I know there is limit on memory that can be allocated through
> pci_alloc_*.
Is there any other way in which I can allocate memory > for > DMA? > > Thanks, > Amit K T > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/cgi-bin/mailman/listinfo/linuxppc64-dev > > From jdubois at mc.com Tue Dec 7 00:40:25 2004 From: jdubois at mc.com (Jean-Christophe Dubois) Date: Mon, 06 Dec 2004 14:40:25 +0100 Subject: User allocated memory for DMA In-Reply-To: References: Message-ID: <1102340425.27838.32.camel@fr-jdubois1.ad.mc.com> On Mon, 2004-12-06 at 18:29 +0530, Amit K Tiwari wrote: > I am writing a driver for a PCIX board to run on Y-HPC (64-bit Yellow > Dog Linux pre-installed on Apple XServe). The application is to > process the data acquired from the board and send the results out. The > data needs to be acquired by DMA into a user allocated buffer. The > amount of data is huge and I need about 1.5 GB per DMA operation. > > Here is what I am doing: > 1. malloc and mlock the memory in user mode. Just curious, but is all the G5 memory DMAable? And is your PCIX card DMA engine 64 bits capable? If yes maybe you should consider an approach similar to the bigphysarea patch. This way you could be sure to get one single contiguous block the size you want. As a drawback it may mean that a lot of memory will not be available to Linux but that may be a price you are ready to pay for your application as it will consume this much memory anyway. If required it is trivial to implement some kind of filesystem on top of it to manage this memory region from user space (we have done something like this here even if we didn't need such big chunk of memory). It is a design rather different from the scatter/gather list but it might be more efficient for you overall. > 2. get the user pages (get_user_pages) in the driver. > 3. 
Find out how much physically contiguous memory did the application > get (Pass through all the pages and see if the physical addresses got > from page_to_phys are contiguous). > 4. Prepare a scatter-gather list of contiguous regions. > 5. pci_map_sg the sg list to get the DMA addresses for each of the entries. > > My problem is that at step 3 I do not get any contiguous region. > Surprisingly, some of the physical addresses are such that the low > order page (say page 151) has high order physical address (say, > 1ef234000) while the next high order page (say page 152) has the next > lower physical address(say 1ef233000) and this trend continues for > some 10 pages after which the addresses totally far apart. > > As a result of this, my scatter gather list has as many entries > as there are pages allocated. > > Am I doing something bad here? > I know there is limit on memory that can be allocated through > pci_alloc_*. Is there any other way in which I can allocate memory for > DMA? > > Thanks, > Amit K T > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/cgi-bin/mailman/listinfo/linuxppc64-dev From olof at austin.ibm.com Tue Dec 7 01:21:59 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Mon, 06 Dec 2004 08:21:59 -0600 Subject: User allocated memory for DMA In-Reply-To: References: Message-ID: <41B46B07.4060504@austin.ibm.com> Amit K Tiwari wrote: >I am writing a driver for a PCIX board to run on Y-HPC (64-bit Yellow >Dog Linux pre-installed on Apple XServe). The application is to >process the data acquired from the board and send the results out. The >data needs to be acquired by DMA into a user allocated buffer. The >amount of data is huge and I need about 1.5 GB per DMA operation. > > The current IOMMU isn't quite written with the expectation that the user will keep a 1.5GB mapping of DMA memory active for an extensive period of time. 
It should work, but let me know if you see any strangeness.

>Here is what I am doing:
>1. malloc and mlock the memory in user mode.
>2. get the user pages (get_user_pages) in the driver.
>3. Find out how much physically contiguous memory did the application
>get (Pass through all the pages and see if the physical addresses got
>from page_to_phys are contiguous).
>4. Prepare a scatter-gather list of contiguous regions.
>5. pci_map_sg the sg list to get the DMA addresses for each of the entries.
>
>My problem is that at step 3 I do not get any contiguous region.
>Surprisingly, some of the physical addresses are such that the low
>order page (say page 151) has high order physical address (say,
>1ef234000) while the next high order page (say page 152) has the next
>lower physical address(say 1ef233000) and this trend continues for
>some 10 pages after which the addresses totally far apart.
>
>

This doesn't really surprise me. You have no guarantee where in physical
memory a process will get its pages allocated; they can be anywhere.

The nice thing is that with the IOMMU, you can get a mostly contiguous
address range as seen by the PCI adapter. The translation between the two is
transparent to the DMA operation.

>As a result of this, my scatter gather list has as many entries
>as there are pages allocated.
>
>Am I doing something bad here?
>I know there is limit on memory that can be allocated through
>pci_alloc_*. Is there any other way in which I can allocate memory for
>DMA?
>
>

Have you considered making your application use large pages? If you did,
you'd get at least 16MB contiguous at a time.
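
Steps 3 and 4 of the quoted procedure (finding physically contiguous runs
and building one scatter-gather entry per run) can be sketched as follows.
This is a plain userspace illustration under stated assumptions, not driver
code: struct segment, coalesce_pages and the 4096-byte SKETCH_PAGE_SIZE are
hypothetical names and values chosen for the example, and a real driver would
use the kernel's own scatterlist structures rather than this array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* One scatter-gather segment: a physically contiguous run. (Hypothetical type.) */
struct segment {
    uint64_t phys;  /* physical address of the first byte */
    uint64_t len;   /* length in bytes */
};

/*
 * Merge per-page physical addresses (given in buffer order) into
 * contiguous segments. A page extends the current segment only if it
 * starts exactly where that segment ends. Returns the number of
 * segments written to out, which must have room for npages entries.
 */
size_t coalesce_pages(const uint64_t *page_phys, size_t npages,
                      struct segment *out)
{
    size_t nseg = 0;

    assert(page_phys != NULL && out != NULL);
    for (size_t i = 0; i < npages; i++) {
        if (nseg > 0 &&
            out[nseg - 1].phys + out[nseg - 1].len == page_phys[i]) {
            out[nseg - 1].len += SKETCH_PAGE_SIZE;  /* extend current run */
        } else {
            out[nseg].phys = page_phys[i];          /* start a new run */
            out[nseg].len = SKETCH_PAGE_SIZE;
            nseg++;
        }
    }
    return nseg;
}
```

Under this forward-merging rule, pages whose physical addresses descend
(as in the report, where page 151 is at 1ef234000 and page 152 at 1ef233000)
never merge, so each page ends up in its own entry. That matches the
observation that the list has as many entries as there are pages.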
-Olof From olof at austin.ibm.com Tue Dec 7 01:25:46 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Mon, 06 Dec 2004 08:25:46 -0600 Subject: User allocated memory for DMA In-Reply-To: <1102340425.27838.32.camel@fr-jdubois1.ad.mc.com> References: <1102340425.27838.32.camel@fr-jdubois1.ad.mc.com> Message-ID: <41B46BEA.5080805@austin.ibm.com> Jean-Christophe Dubois wrote: >Just curious, but is all the G5 memory DMAable? And is your PCIX card >DMA engine 64 bits capable? > All memory on the G5 can be used for DMA, but only via the IOMMU. For G5s with less than 2GB of memory, the IOMMU is per default disabled since it's not needed. There's no hardware support for 64-bit DMA on G5s, so it unfortunately doesn't matter what the card supports. -Olof From will_schmidt at vnet.ibm.com Tue Dec 7 04:16:25 2004 From: will_schmidt at vnet.ibm.com (will schmidt) Date: Mon, 06 Dec 2004 11:16:25 -0600 Subject: [PPC64] Patch to correct alignment for lppaca in paca_struct Message-ID: <41B493E9.7060004@vnet.ibm.com> Hi, We found that we were failing register_vpa calls in cases where the lppaca structure (part of the PACA) crosses a page boundary. (LTC bug 12689). This was causing us (lparcfg specifically) some grief as the xSharedProc bit was not being set. The attached patch changes the alignment of the lppaca structure, and a few comments so we understand why. -Will Signed-off-by: Will Schmidt --- a/include/asm-ppc64/paca.h 2004-12-03 13:03:09.048520608 -0600 +++ b/include/asm-ppc64/paca.h 2004-12-03 13:18:17.433655752 -0600 @@ -99,11 +99,13 @@ u64 exdsi[8]; /* used for linear mapping hash table misses */ /* - * iSeries structues which the hypervisor knows about - Not - * sure if these particularly need to be cacheline aligned. + * iSeries structues which the hypervisor knows about - + * This structure should not cross a page boundary. + * The vpa_init/register_vpa call is now known to fail if the lppaca + * structure crosses a page boundary. 
* The lppaca is also used on POWER5 pSeries boxes. */ - struct ItLpPaca lppaca __attribute__((aligned(0x80))); + struct ItLpPaca lppaca __attribute__((aligned(0x400))); #ifdef CONFIG_PPC_ISERIES struct ItLpRegSave reg_save; #endif From haveblue at us.ibm.com Tue Dec 7 05:28:06 2004 From: haveblue at us.ibm.com (Dave Hansen) Date: Mon, 06 Dec 2004 10:28:06 -0800 Subject: [PPC64] Patch to correct alignment for lppaca in paca_struct In-Reply-To: <41B493E9.7060004@vnet.ibm.com> References: <41B493E9.7060004@vnet.ibm.com> Message-ID: <1102357686.5656.39.camel@localhost> On Mon, 2004-12-06 at 09:16, will schmidt wrote: > - struct ItLpPaca lppaca __attribute__((aligned(0x80))); > + struct ItLpPaca lppaca __attribute__((aligned(0x400))); Do you have guarantees about how large the lppaca can get? If it's larger than 0x400 bytes, you can still cross a page boundary, right? Maybe you should just align it to PAGE_SIZE and be done with it. -- Dave From david at gibson.dropbear.id.au Tue Dec 7 11:27:06 2004 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 7 Dec 2004 11:27:06 +1100 Subject: [PPC64] Patch to correct alignment for lppaca in paca_struct In-Reply-To: <1102357686.5656.39.camel@localhost> References: <41B493E9.7060004@vnet.ibm.com> <1102357686.5656.39.camel@localhost> Message-ID: <20041207002706.GB9205@zax> On Mon, Dec 06, 2004 at 10:28:06AM -0800, Dave Hansen wrote: > On Mon, 2004-12-06 at 09:16, will schmidt wrote: > > - struct ItLpPaca lppaca __attribute__((aligned(0x80))); > > + struct ItLpPaca lppaca __attribute__((aligned(0x400))); > > Do you have guarantees about how large the lppaca can get? If it's > larger than 0x400 bytes, you can still cross a page boundary, right? > Maybe you should just align it to PAGE_SIZE and be done with it. The lppaca is part of the firmware ABI, so it can't just go randomly changing or growing. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. 
NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson From benh at kernel.crashing.org Tue Dec 7 18:09:48 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 07 Dec 2004 18:09:48 +1100 Subject: User allocated memory for DMA In-Reply-To: References: Message-ID: <1102403388.11502.8.camel@gaston> On Mon, 2004-12-06 at 18:29 +0530, Amit K Tiwari wrote: > My problem is that at step 3 I do not get any contiguous region. > Surprisingly, some of the physical addresses are such that the low > order page (say page 151) has high order physical address (say, > 1ef234000) while the next high order page (say page 152) has the next > lower physical address(say 1ef233000) and this trend continues for > some 10 pages after which the addresses totally far apart. What kernel version is this ? I remember something about the default allocation "direction" of pages beeing changed a while ago... Ben. From schwab at suse.de Wed Dec 8 11:49:11 2004 From: schwab at suse.de (Andreas Schwab) Date: Wed, 08 Dec 2004 01:49:11 +0100 Subject: 2.6.10-rc3 does not boot on PowerMac In-Reply-To: <1102464972.11516.46.camel@gaston> (Benjamin Herrenschmidt's message of "Wed, 08 Dec 2004 11:16:12 +1100") References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102456836.11502.33.camel@gaston> <1102463266.11516.39.camel@gaston> <1102464972.11516.46.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > pci_probe_only should be set to 0 by the pmac_pci.c code, thus causing > pci_assign_unassigned_resources() to actually be called... Not really. arch/ppc64/kernel/pci.c: unsigned long pci_probe_only = 1; arch/ppc64/kernel/pmac_pci.c: extern int pci_probe_only; This has been fixed in rc3 .... > - pci_assign_unassigned_resources() seem to not work properly, that is, > it moves around things that don't need to be moved thus causing your > problem when it's actually called. Probably has always been broken. Andreas. 
-- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From benh at kernel.crashing.org Wed Dec 8 10:47:46 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 08 Dec 2004 10:47:46 +1100 Subject: 2.6.10-rc3 does not boot on PowerMac In-Reply-To: References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102456836.11502.33.camel@gaston> Message-ID: <1102463266.11516.39.camel@gaston> On Wed, 2004-12-08 at 00:26 +0100, Andreas Schwab wrote: > Benjamin Herrenschmidt writes: > > > Have you found out what causes the kernel PCI layer to re-allocate the > > video card resources ? I should definitely not do that ... > > Sorry, but I have no idea about all that stuff. Please advise! > > > Can you send me the lspci -vv (as root) output with both kernels ? > > Attached. > > > Also compile the PCI layer with DEBUG enabled ... > > I've enabled DEBUG in all files in drivers/pci and in > arch/ppc64/kernel/pci.c. If you need more please tell. I don't understand ... the 2 lspci's were taken with the same machine ? The PCI setup is totally different ... pretty much everything got moved around ! That is really bad. You don't seem to have debug enabled in drivers/pci/setup-bus.c tho, can you check if pci_assign_unassigned_resources() is called at all in either kernel ? Ben. From schwab at suse.de Wed Dec 8 11:07:01 2004 From: schwab at suse.de (Andreas Schwab) Date: Wed, 08 Dec 2004 01:07:01 +0100 Subject: 2.6.10-rc3 does not boot on PowerMac In-Reply-To: <1102463266.11516.39.camel@gaston> (Benjamin Herrenschmidt's message of "Wed, 08 Dec 2004 10:47:46 +1100") References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102456836.11502.33.camel@gaston> <1102463266.11516.39.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > I don't understand ...
the 2 lspci's were taken with the same machine ? Sure. :-) > You don't seem to have debug enabled in drivers/pci/setup-bus.c tho, I have, it's the default. > can you check if pci_assign_unassigned_resources() is called at all in > either kernel ? It appears to be called in 2.6.10-rc3, but not in 2.6.9. PCI: Probing PCI hardware PCI: Bus 1, bridge: 0001:00:01.0 IO window: disabled. MEM window: 80000000-800fffff PREFETCH window: disabled. PCI: Bus 5, bridge: 0001:00:02.0 IO window: disabled. MEM window: 80100000-801fffff PREFETCH window: disabled. PCI: Bus 2, bridge: 0001:00:03.0 IO window: disabled. MEM window: 80200000-802fffff PREFETCH window: disabled. PCI: Bus 3, bridge: 0001:00:04.0 IO window: disabled. MEM window: 80300000-805fffff PREFETCH window: disabled. PCI: Bus 4, bridge: 0001:00:05.0 IO window: disabled. MEM window: 80600000-806fffff PREFETCH window: disabled. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From benh at kernel.crashing.org Wed Dec 8 09:00:36 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 08 Dec 2004 09:00:36 +1100 Subject: 2.6.10-rc3 does not boot on PowerMac In-Reply-To: References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> Message-ID: <1102456836.11502.33.camel@gaston> On Sat, 2004-12-04 at 22:24 +0100, Andreas Schwab wrote: > $ce. > > 2.6.9: > 90000000-9fffffff : /pci at 0,f0000000 > 90000000-9001ffff : 0000:f0:10.0 > 91000000-91ffffff : 0000:f0:10.0 > 98000000-9fffffff : 0000:f0:10.0 > 98004000-98183fff : offb > > 2.6.10-rc3: > 90000000-9fffffff : /pci at 0,f0000000 > 90000000-97ffffff : 0000:f0:10.0 > 98000000-9801ffff : 0000:f0:10.0 Have you found out what causes the kernel PCI layer to re-allocate the video card resources ? I should definitely not do that ...
Can you send me the lspci -vv (as root) output with both kernels ? Also compile the PCI layer with DEBUG enabled ... Ben. From benh at kernel.crashing.org Wed Dec 8 11:54:43 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 08 Dec 2004 11:54:43 +1100 Subject: 2.6.10-rc3 does not boot on PowerMac In-Reply-To: References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102456836.11502.33.camel@gaston> <1102463266.11516.39.camel@gaston> <1102464972.11516.46.camel@gaston> Message-ID: <1102467283.11502.49.camel@gaston> On Wed, 2004-12-08 at 01:49 +0100, Andreas Schwab wrote: > Benjamin Herrenschmidt writes: > > > pci_probe_only should be set to 0 by the pmac_pci.c code, thus causing > > pci_assign_unassigned_resources() to actually be called... > > Not really. > > arch/ppc64/kernel/pci.c: > unsigned long pci_probe_only = 1; > > arch/ppc64/kernel/pmac_pci.c: > extern int pci_probe_only; In my 2.6.9 snapshot here, in pmac_pci.c : /* Tell pci.c to use the common resource allocation mecanism */ pci_probe_only = 0; > This has been fixed in rc3 .... Hrm... what -rc3 ? > > - pci_assign_unassigned_resources() seem to not work properly, that is, > > it moves around things that don't need to be moved thus causing your > > problem when it's actually called. > > Probably has always been broken. Yes, but still needs to be fixed :) Ben. From benh at kernel.crashing.org Wed Dec 8 11:16:12 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 08 Dec 2004 11:16:12 +1100 Subject: 2.6.10-rc3 does not boot on PowerMac In-Reply-To: References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102456836.11502.33.camel@gaston> <1102463266.11516.39.camel@gaston> Message-ID: <1102464972.11516.46.camel@gaston> On Wed, 2004-12-08 at 01:07 +0100, Andreas Schwab wrote: > Benjamin Herrenschmidt writes: > > > I don't understand ... the 2 lspci's were taken with the same machine ? > > Sure. 
:-) > > > You don't seem to have debug enabled in drivers/pci/setup-bus.c tho, > > I have, it's the default. > > > can you check if pci_assign_unassigned_resources() is called at all in > > either kernel ? > > It appears to be called in 2.6.10-rc3, but not in 2.6.9. I have no explanation for why it's not called in 2.6.9 ... Can you have a look at what's going on in 2.6.9 arch/ppc64/kernel/pci.c ? pci_probe_only should be set to 0 by the pmac_pci.c code, thus causing pci_assign_unassigned_resources() to actually be called... There are 2 different issues here: - pci_assign_unassigned_resources() should be called by both kernels, I want to understand why it's not by 2.6.9... - pci_assign_unassigned_resources() seem to not work properly, that is, it moves around things that don't need to be moved thus causing your problem when it's actually called. For the latter, I'll do a workaround by leaving pci_probe_only at 1 instead of 0 for 2.6.10, but we need to get that fixed in the long term... Ben. From schwab at suse.de Wed Dec 8 10:26:19 2004 From: schwab at suse.de (Andreas Schwab) Date: Wed, 08 Dec 2004 00:26:19 +0100 Subject: 2.6.10-rc3 does not boot on PowerMac References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102456836.11502.33.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > Have you found out what causes the kernel PCI layer to re-allocate the > video card resources ? I should definitely not do that ... Sorry, but I have no idea about all that stuff. Please advise! > Can you send me the lspci -vv (as root) output with both kernels ? Attached. > Also compile the PCI layer with DEBUG enabled ... I've enabled DEBUG in all files in drivers/pci and in arch/ppc64/kernel/pci.c. If you need more please tell. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
-------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: lspci-2.6.9 Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041208/8ef8a47d/attachment.txt -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: lspci-2.6.10-rc3 Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041208/8ef8a47d/attachment-0001.txt -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pci-debug Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041208/8ef8a47d/attachment-0002.txt From schwab at suse.de Wed Dec 8 12:04:03 2004 From: schwab at suse.de (Andreas Schwab) Date: Wed, 08 Dec 2004 02:04:03 +0100 Subject: 2.6.10-rc3 does not boot on PowerMac In-Reply-To: <1102467283.11502.49.camel@gaston> (Benjamin Herrenschmidt's message of "Wed, 08 Dec 2004 11:54:43 +1100") References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102456836.11502.33.camel@gaston> <1102463266.11516.39.camel@gaston> <1102464972.11516.46.camel@gaston> <1102467283.11502.49.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > On Wed, 2004-12-08 at 01:49 +0100, Andreas Schwab wrote: >> Benjamin Herrenschmidt writes: >> >> > pci_probe_only should be set to 0 by the pmac_pci.c code, thus causing >> > pci_assign_unassigned_resources() to actually be called... >> >> Not really. >> >> arch/ppc64/kernel/pci.c: >> unsigned long pci_probe_only = 1; >> >> arch/ppc64/kernel/pmac_pci.c: >> extern int pci_probe_only; > > In my 2.6.9 snapshot here, in pmac_pci.c : > > /* Tell pci.c to use the common resource allocation mecanism */ > pci_probe_only = 0; Which is a no-op in 2.6.9. Remember, we are 64 bit. >> This has been fixed in rc3 .... > > Hrm... what -rc3 ? See the subject. Andreas. 
-- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From benh at kernel.crashing.org Wed Dec 8 12:40:24 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 08 Dec 2004 12:40:24 +1100 Subject: 2.6.10-rc3 does not boot on PowerMac In-Reply-To: References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102456836.11502.33.camel@gaston> <1102463266.11516.39.camel@gaston> <1102464972.11516.46.camel@gaston> <1102467283.11502.49.camel@gaston> Message-ID: <1102470024.11516.54.camel@gaston> On Wed, 2004-12-08 at 02:04 +0100, Andreas Schwab wrote: > > > > In my 2.6.9 snapshot here, in pmac_pci.c : > > > > /* Tell pci.c to use the common resource allocation mecanism */ > > pci_probe_only = 0; > > Which is a no-op in 2.6.9. Remember, we are 64 bit. #ifndef CONFIG_PPC_ISERIES if (pci_probe_only) pcibios_claim_of_setup(); else /* FIXME: `else' will be removed when pci_assign_unassigned_resources() is able to work correctly with [partially] allocated PCI tree. */ pci_assign_unassigned_resources(); #endif /* !CONFIG_PPC_ISERIES */ (in pcibios_init() in 2.6.9) Doesn't look like a no-op :) Ben.
From anton at samba.org Wed Dec 8 13:02:22 2004 From: anton at samba.org (Anton Blanchard) Date: Wed, 8 Dec 2004 13:02:22 +1100 Subject: 2.6.10-rc3 does not boot on PowerMac In-Reply-To: <1102470024.11516.54.camel@gaston> References: <1102456836.11502.33.camel@gaston> <1102463266.11516.39.camel@gaston> <1102464972.11516.46.camel@gaston> <1102467283.11502.49.camel@gaston> <1102470024.11516.54.camel@gaston> Message-ID: <20041208020222.GB31411@krispykreme.ozlabs.ibm.com> > Doesn't look like a no-op :) Look closer, accessing something as an int when its a long :) Anton From benh at kernel.crashing.org Wed Dec 8 13:03:28 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 08 Dec 2004 13:03:28 +1100 Subject: 2.6.10-rc3 does not boot on PowerMac In-Reply-To: <20041208020222.GB31411@krispykreme.ozlabs.ibm.com> References: <1102456836.11502.33.camel@gaston> <1102463266.11516.39.camel@gaston> <1102464972.11516.46.camel@gaston> <1102467283.11502.49.camel@gaston> <1102470024.11516.54.camel@gaston> <20041208020222.GB31411@krispykreme.ozlabs.ibm.com> Message-ID: <1102471408.11517.56.camel@gaston> On Wed, 2004-12-08 at 13:02 +1100, Anton Blanchard wrote: > > Doesn't look like a no-op :) > > Look closer, accessing something as an int when its a long :) Ahaha :) Ok, I'll fix pmac for now. Ben. 
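The int-versus-long mismatch Anton points at can be illustrated outside the kernel. The following is only a sketch: it models the big-endian memory layout of a 64-bit variable with a byte array, and all helper names (store_be64, load_be64, store_be32_at_start, probe_only_after_int_store) are made up for illustration. The real effect in pmac_pci.c is formally undefined behaviour in C, so this shows what typically happens on ppc64 rather than anything the language guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Lay out a 64-bit value in big-endian byte order, the way ppc64
 * stores an unsigned long in memory (most significant byte first). */
static void store_be64(unsigned char buf[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		buf[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Read the 8 bytes back as one 64-bit big-endian value. */
static uint64_t load_be64(const unsigned char buf[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | buf[i];
	return v;
}

/* Model "extern int x; x = val;" when x is really an 8-byte long:
 * only the first four bytes of the object are written. */
static void store_be32_at_start(unsigned char buf[8], uint32_t val)
{
	for (int i = 0; i < 4; i++)
		buf[i] = (unsigned char)(val >> (24 - 8 * i));
}

/* What the file defining the variable as unsigned long would read
 * back after another file stores val through an int declaration. */
uint64_t probe_only_after_int_store(uint64_t initial, uint32_t val)
{
	unsigned char mem[8];

	store_be64(mem, initial);
	store_be32_at_start(mem, val);
	return load_be64(mem);
}
```

With initial = 1 and val = 0 the function still returns 1: the 32-bit store lands on the high-order half, so the low-order word holding the 1 is untouched. That matches Andreas's remark that the assignment was a no-op in 2.6.9.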
From benh at kernel.crashing.org Wed Dec 8 13:05:15 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 08 Dec 2004 13:05:15 +1100 Subject: 2.6.10-rc3 does not boot on PowerMac In-Reply-To: References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102456836.11502.33.camel@gaston> <1102463266.11516.39.camel@gaston> <1102464972.11516.46.camel@gaston> Message-ID: <1102471515.25180.58.camel@gaston> Ok, let me know if that works: Index: linux-work/arch/ppc64/kernel/pmac_pci.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pmac_pci.c 2004-11-22 11:49:24.000000000 +1100 +++ linux-work/arch/ppc64/kernel/pmac_pci.c 2004-12-08 13:04:42.607006832 +1100 @@ -739,8 +739,8 @@ pmac_check_ht_link(); - /* Tell pci.c to use the common resource allocation mecanism */ - pci_probe_only = 0; + /* Tell pci.c to not use the common resource allocation mecanism */ + pci_probe_only = 1; /* Allow all IO */ io_page_mask = -1; From tiwari.amit at gmail.com Wed Dec 8 14:42:04 2004 From: tiwari.amit at gmail.com (Amit K Tiwari) Date: Wed, 8 Dec 2004 09:12:04 +0530 Subject: User allocated memory for DMA In-Reply-To: <1102403388.11502.8.camel@gaston> References: <1102403388.11502.8.camel@gaston> Message-ID: This is 2.6.6 (64 bit Yellow Dog Linux from Terrasoft for Apple XServe) Amit K T On Tue, 07 Dec 2004 18:09:48 +1100, Benjamin Herrenschmidt wrote: > On Mon, 2004-12-06 at 18:29 +0530, Amit K Tiwari wrote: > > > My problem is that at step 3 I do not get any contiguous region. > > Surprisingly, some of the physical addresses are such that the low > > order page (say page 151) has high order physical address (say, > > 1ef234000) while the next high order page (say page 152) has the next > > lower physical address(say 1ef233000) and this trend continues for > > some 10 pages after which the addresses totally far apart. > > What kernel version is this ? 
I remember something about the default > allocation "direction" of pages beeing changed a while ago... > > Ben. > > From tiwari.amit at gmail.com Wed Dec 8 14:49:01 2004 From: tiwari.amit at gmail.com (Amit K Tiwari) Date: Wed, 8 Dec 2004 09:19:01 +0530 Subject: User allocated memory for DMA In-Reply-To: <41B46B07.4060504@austin.ibm.com> References: <41B46B07.4060504@austin.ibm.com> Message-ID: Thanks, it helped a lot. Here is what works - 1. Increase the shared memory limit (sysctl kernel.shmmax) 2. Configure the large pages (echo 256 > /proc/sys/vm/nr_hugepages) 3. Use shmget to get a handle to huge pages - Set the SHM_HUGETLB bits in flags. On Mon, 06 Dec 2004 08:21:59 -0600, Olof Johansson wrote: > Amit K Tiwari wrote: > > >I am writing a driver for a PCIX board to run on Y-HPC (64-bit Yellow > >Dog Linux pre-installed on Apple XServe). The application is to > >process the data acquired from the board and send the results out. The > >data needs to be acquired by DMA into a user allocated buffer. The > >amount of data is huge and I need about 1.5 GB per DMA operation. > > > > > The current IOMMU isn't quite written with the expectation that the user > will keep a 1.5GB mapping of DMA memory active for an extensive period > of time. It should work, but let me know if you see any strangeness. > > > > >Here is what I am doing: > >1. malloc and mlock the memory in user mode. > >2. get the user pages (get_user_pages) in the driver. > >3. Find out how much physically contiguous memory did the application > >get (Pass through all the pages and see if the physical addresses got > >from page_to_phys are contiguous). > >4. Prepare a scatter-gather list of contiguous regions. > >5. pci_map_sg the sg list to get the DMA addresses for each of the entries. > > > >My problem is that at step 3 I do not get any contiguous region. 
> >Surprisingly, some of the physical addresses are such that the low > >order page (say page 151) has high order physical address (say, > >1ef234000) while the next high order page (say page 152) has the next > >lower physical address(say 1ef233000) and this trend continues for > >some 10 pages after which the addresses totally far apart. > > > > > This doesn't really surprise me. You have no guarantee where in > physical memory a process will get its pages allocated; they can be > anywhere. The nice thing is that with the IOMMU, you can get a mostly > contiguous address range as seen by the PCI adapter. The translation > between the two is transparent to the DMA operation. > > >As a result of this, my scatter gather list has as many entries > >as there are pages allocated. > > > >Am I doing something bad here? > >I know there is limit on memory that can be allocated through > >pci_alloc_*. Is there any other way in which I can allocate memory for > >DMA? > > > > > Have you considered making your application use large pages? If so, > you'd at least get 16MB contiguous at a time. > > > -Olof > From benh at kernel.crashing.org Wed Dec 8 15:45:17 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 08 Dec 2004 15:45:17 +1100 Subject: User allocated memory for DMA In-Reply-To: References: <41B46B07.4060504@austin.ibm.com> Message-ID: <1102481117.11503.62.camel@gaston> On Wed, 2004-12-08 at 09:19 +0530, Amit K Tiwari wrote: > Thanks, it helped a lot. > Here is what works - > 1. Increase the shared memory limit (sysctl kernel.shmmax) > 2. Configure the large pages (echo 256 > /proc/sys/vm/nr_hugepages) > 3. Use shmget to get a handle to huge pages - Set the SHM_HUGETLB > bits in flags. Still, user pages being mapped in wrong order is a bit worrying... It may be fixed in more recent kernels, or not, I'd rather do some investigation... Ben.
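For readers following Amit's three steps, the user-space side of step 3 might look roughly like the sketch below. It is not taken from his application. The helper names round_up_to_huge and alloc_huge_shm are hypothetical, the 16MB huge page size is an assumption based on Olof's remark, and the call only succeeds once kernel.shmmax and nr_hugepages have been raised as in steps 1 and 2.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000	/* Linux value; fallback if the headers hide it */
#endif

/* Assumption: 16MB huge pages, as mentioned earlier in the thread. */
#define HUGE_PAGE_SIZE (16UL * 1024 * 1024)

/* Round a byte count up to a whole number of huge pages. */
size_t round_up_to_huge(size_t len)
{
	return (len + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}

/* Create and attach a huge-page backed shared memory segment.
 * Returns the mapped address, or NULL if the segment could not be
 * created or attached (for example when no huge pages are reserved). */
void *alloc_huge_shm(size_t len, int *shmid_out)
{
	size_t size = round_up_to_huge(len);
	int id = shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
	void *p;

	if (id < 0)
		return NULL;

	p = shmat(id, NULL, 0);
	if (p == (void *)-1) {
		shmctl(id, IPC_RMID, NULL);	/* do not leak the segment */
		return NULL;
	}

	*shmid_out = id;
	return p;
}
```

A caller would typically pass the returned buffer to the driver for get_user_pages, and later release it with shmdt followed by shmctl(id, IPC_RMID, NULL). Each 16MB huge page is physically contiguous, so the scatter-gather list should shrink to roughly one entry per huge page instead of one per 4KB page.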
From schwab at suse.de Wed Dec 8 21:21:04 2004 From: schwab at suse.de (Andreas Schwab) Date: Wed, 08 Dec 2004 11:21:04 +0100 Subject: 2.6.10-rc3 does not boot on PowerMac In-Reply-To: <1102471515.25180.58.camel@gaston> (Benjamin Herrenschmidt's message of "Wed, 08 Dec 2004 13:05:15 +1100") References: <20041204182043.GG7714@krispykreme.ozlabs.ibm.com> <1102456836.11502.33.camel@gaston> <1102463266.11516.39.camel@gaston> <1102464972.11516.46.camel@gaston> <1102471515.25180.58.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > Ok, let me know if that works: > > Index: linux-work/arch/ppc64/kernel/pmac_pci.c > =================================================================== > --- linux-work.orig/arch/ppc64/kernel/pmac_pci.c 2004-11-22 11:49:24.000000000 +1100 > +++ linux-work/arch/ppc64/kernel/pmac_pci.c 2004-12-08 13:04:42.607006832 +1100 > @@ -739,8 +739,8 @@ > > pmac_check_ht_link(); > > - /* Tell pci.c to use the common resource allocation mecanism */ > - pci_probe_only = 0; > + /* Tell pci.c to not use the common resource allocation mecanism */ > + pci_probe_only = 1; > > /* Allow all IO */ > io_page_mask = -1; Yes, that works (just as well as commenting it out since it's default anyway). Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
From ebenoit at hopevale.com Thu Dec 9 02:02:57 2004 From: ebenoit at hopevale.com (Eric) Date: Wed, 08 Dec 2004 10:02:57 -0500 Subject: unsubscribe Message-ID: <41B717A1.50901@hopevale.com> From sfr at canb.auug.org.au Thu Dec 9 00:38:00 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Thu, 9 Dec 2004 00:38:00 +1100 Subject: naca cleanups Message-ID: <20041209003800.7e38a32c.sfr@canb.auug.org.au> Hi all, I have been "encouraged" to clean up the naca and as a first pass, I have been looking at moving some of the fields out of the naca. Of course, my understanding of the naca is "incomplete" so first question: Does anyone know if the [di]CacheL1LogLineSize and [di]LinesPerPage fields need to be in the naca and would there be any problem in moving them to the systemcfg structure? -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041209/04ad8af9/attachment.pgp From anton at samba.org Thu Dec 9 01:08:59 2004 From: anton at samba.org (Anton Blanchard) Date: Thu, 9 Dec 2004 01:08:59 +1100 Subject: naca cleanups In-Reply-To: <20041209003800.7e38a32c.sfr@canb.auug.org.au> References: <20041209003800.7e38a32c.sfr@canb.auug.org.au> Message-ID: <20041208140859.GB32138@krispykreme.ozlabs.ibm.com> Hi Stephen, > I have been "encouraged" to clean up the naca and as a first pass, I > have been looking at moving some of the fields out of the naca. Of course, > my understanding of the naca is "incomplete" so first question: > > Does anyone know if the [di]CacheL1LogLineSize and [di]LinesPerPage fields > need to be in the naca and would there be any problem in moving them to > the systemcfg structure? I cant see why they need to be in there. 
I always wondered if we would be able to handle a change in cacheline size, if the *CacheL1* stuff doesnt cover it all then we may as well use #defines instead. I _think_ only the first 4 fields in the NACA are architected by the OS/400 hypervisor, and none of them by phyp. Anton From sfr at canb.auug.org.au Thu Dec 9 01:28:27 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Thu, 9 Dec 2004 01:28:27 +1100 Subject: naca cleanups In-Reply-To: <20041208140859.GB32138@krispykreme.ozlabs.ibm.com> References: <20041209003800.7e38a32c.sfr@canb.auug.org.au> <20041208140859.GB32138@krispykreme.ozlabs.ibm.com> Message-ID: <20041209012827.2f5be3a4.sfr@canb.auug.org.au> On Thu, 9 Dec 2004 01:08:59 +1100 Anton Blanchard wrote: > > > I have been "encouraged" to clean up the naca and as a first pass, I > > have been looking at moving some of the fields out of the naca. Of course, > > my understanding of the naca is "incomplete" so first question: > > > > Does anyone know if the [di]CacheL1LogLineSize and [di]LinesPerPage fields > > need to be in the naca and would there be any problem in moving them to > > the systemcfg structure? > > I cant see why they need to be in there. I always wondered if we would > be able to handle a change in cacheline size, if the *CacheL1* stuff > doesnt cover it all then we may as well use #defines instead. > > I _think_ only the first 4 fields in the NACA are architected by the > OS/400 hypervisor, and none of them by phyp. OK, so how about this as a first pass. I have left the fields in the naca just in case something reads them out of /proc/ppc64/naca. Untested, not even compiled, just for comment (especially the asm bits). 
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.1/arch/ppc64/kernel/asm-offsets.c --- linus-bk/arch/ppc64/kernel/asm-offsets.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/asm-offsets.c 2004-12-09 00:59:10.000000000 +1100 @@ -70,11 +70,11 @@ /* naca */ DEFINE(PACA, offsetof(struct naca_struct, paca)); DEFINE(DCACHEL1LINESIZE, offsetof(struct systemcfg, dCacheL1LineSize)); - DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct naca_struct, dCacheL1LogLineSize)); - DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct naca_struct, dCacheL1LinesPerPage)); + DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct systemcfg, dCacheL1LogLineSize)); + DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct systemcfg, dCacheL1LinesPerPage)); DEFINE(ICACHEL1LINESIZE, offsetof(struct systemcfg, iCacheL1LineSize)); - DEFINE(ICACHEL1LOGLINESIZE, offsetof(struct naca_struct, iCacheL1LogLineSize)); - DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct naca_struct, iCacheL1LinesPerPage)); + DEFINE(ICACHEL1LOGLINESIZE, offsetof(struct systemcfg, iCacheL1LogLineSize)); + DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct systemcfg, iCacheL1LinesPerPage)); DEFINE(PLATFORM, offsetof(struct systemcfg, platform)); /* paca */ diff -ruN linus-bk/arch/ppc64/kernel/iSeries_setup.c linus-bk-naca.1/arch/ppc64/kernel/iSeries_setup.c --- linus-bk/arch/ppc64/kernel/iSeries_setup.c 2004-11-12 09:09:48.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/iSeries_setup.c 2004-12-09 01:01:44.000000000 +1100 @@ -568,20 +568,26 @@ xIoHriProcessorVpd[procIx].xDataL1CacheSizeKB * 1024; systemcfg->dCacheL1LineSize = xIoHriProcessorVpd[procIx].xDataCacheOperandSize; - naca->iCacheL1LinesPerPage = PAGE_SIZE / systemcfg->iCacheL1LineSize; - naca->dCacheL1LinesPerPage = PAGE_SIZE / systemcfg->dCacheL1LineSize; + systemcfg->iCacheL1LinesPerPage = + naca->iCacheL1LinesPerPage = + PAGE_SIZE / systemcfg->iCacheL1LineSize; + 
systemcfg->dCacheL1LinesPerPage = + naca->dCacheL1LinesPerPage = + PAGE_SIZE / systemcfg->dCacheL1LineSize; i = systemcfg->iCacheL1LineSize; n = 0; while ((i = (i / 2))) ++n; - naca->iCacheL1LogLineSize = n; + systemcfg->iCacheL1LogLineSize = + naca->iCacheL1LogLineSize = n; i = systemcfg->dCacheL1LineSize; n = 0; while ((i = (i / 2))) ++n; - naca->dCacheL1LogLineSize = n; + systemcfg->dCacheL1LogLineSize = + naca->dCacheL1LogLineSize = n; printk("D-cache line size = %d\n", (unsigned int)systemcfg->dCacheL1LineSize); diff -ruN linus-bk/arch/ppc64/kernel/misc.S linus-bk-naca.1/arch/ppc64/kernel/misc.S --- linus-bk/arch/ppc64/kernel/misc.S 2004-11-12 09:09:48.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/misc.S 2004-12-09 00:51:37.000000000 +1100 @@ -207,8 +207,6 @@ * and in some cases i-cache and d-cache line sizes differ from * each other. */ - LOADADDR(r10,naca) /* Get Naca address */ - ld r10,0(r10) LOADADDR(r11,systemcfg) /* Get systemcfg address */ ld r11,0(r11) lwz r7,DCACHEL1LINESIZE(r11)/* Get cache line size */ @@ -216,7 +214,7 @@ andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ add r8,r8,r5 /* ensure we get enough */ - lwz r9,DCACHEL1LOGLINESIZE(r10) /* Get log-2 of cache line size */ + lwz r9,DCACHEL1LOGLINESIZE(r11) /* Get log-2 of cache line size */ srw. r8,r8,r9 /* compute line count */ beqlr /* nothing to do? */ mtctr r8 @@ -232,7 +230,7 @@ andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ add r8,r8,r5 - lwz r9,ICACHEL1LOGLINESIZE(r10) /* Get log-2 of Icache line size */ + lwz r9,ICACHEL1LOGLINESIZE(r11) /* Get log-2 of Icache line size */ srw. r8,r8,r9 /* compute line count */ beqlr /* nothing to do? 
*/ mtctr r8 @@ -256,8 +254,6 @@ * * Different systems have different cache line sizes */ - LOADADDR(r10,naca) /* Get Naca address */ - ld r10,0(r10) LOADADDR(r11,systemcfg) /* Get systemcfg address */ ld r11,0(r11) lwz r7,DCACHEL1LINESIZE(r11) /* Get dcache line size */ @@ -265,7 +261,7 @@ andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ add r8,r8,r5 /* ensure we get enough */ - lwz r9,DCACHEL1LOGLINESIZE(r10) /* Get log-2 of dcache line size */ + lwz r9,DCACHEL1LOGLINESIZE(r11) /* Get log-2 of dcache line size */ srw. r8,r8,r9 /* compute line count */ beqlr /* nothing to do? */ mtctr r8 @@ -286,8 +282,6 @@ * flush all bytes from start to stop-1 inclusive */ _GLOBAL(flush_dcache_phys_range) - LOADADDR(r10,naca) /* Get Naca address */ - ld r10,0(r10) LOADADDR(r11,systemcfg) /* Get systemcfg address */ ld r11,0(r11) lwz r7,DCACHEL1LINESIZE(r11) /* Get dcache line size */ @@ -295,7 +289,7 @@ andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ add r8,r8,r5 /* ensure we get enough */ - lwz r9,DCACHEL1LOGLINESIZE(r10) /* Get log-2 of dcache line size */ + lwz r9,DCACHEL1LOGLINESIZE(r11) /* Get log-2 of dcache line size */ srw. r8,r8,r9 /* compute line count */ beqlr /* nothing to do? 
*/ mfmsr r5 /* Disable MMU Data Relocation */ @@ -332,12 +326,10 @@ */ /* Flush the dcache */ - LOADADDR(r7,naca) - ld r7,0(r7) LOADADDR(r8,systemcfg) /* Get systemcfg address */ ld r8,0(r8) clrrdi r3,r3,12 /* Page align */ - lwz r4,DCACHEL1LINESPERPAGE(r7) /* Get # dcache lines per page */ + lwz r4,DCACHEL1LINESPERPAGE(r8) /* Get # dcache lines per page */ lwz r5,DCACHEL1LINESIZE(r8) /* Get dcache line size */ mr r6,r3 mtctr r4 @@ -348,7 +340,7 @@ /* Now invalidate the icache */ - lwz r4,ICACHEL1LINESPERPAGE(r7) /* Get # icache lines per page */ + lwz r4,ICACHEL1LINESPERPAGE(r8) /* Get # icache lines per page */ lwz r5,ICACHEL1LINESIZE(r8) /* Get icache line size */ mtctr r4 1: icbi 0,r3 diff -ruN linus-bk/arch/ppc64/kernel/setup.c linus-bk-naca.1/arch/ppc64/kernel/setup.c --- linus-bk/arch/ppc64/kernel/setup.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1/arch/ppc64/kernel/setup.c 2004-12-09 00:57:58.000000000 +1100 @@ -496,8 +496,10 @@ systemcfg->dCacheL1Size = size; systemcfg->dCacheL1LineSize = lsize; - naca->dCacheL1LogLineSize = __ilog2(lsize); - naca->dCacheL1LinesPerPage = PAGE_SIZE/(lsize); + systemcfg->dCacheL1LogLineSize = + naca->dCacheL1LogLineSize = __ilog2(lsize); + systemcfg->dCacheL1LinesPerPage = + naca->dCacheL1LinesPerPage = PAGE_SIZE/(lsize); size = 0; lsize = cur_cpu_spec->icache_bsize; @@ -513,8 +515,10 @@ systemcfg->iCacheL1Size = size; systemcfg->iCacheL1LineSize = lsize; - naca->iCacheL1LogLineSize = __ilog2(lsize); - naca->iCacheL1LinesPerPage = PAGE_SIZE/(lsize); + systemcfg->iCacheL1LogLineSize = + naca->iCacheL1LogLineSize = __ilog2(lsize); + systemcfg->iCacheL1LinesPerPage = + naca->iCacheL1LinesPerPage = PAGE_SIZE/(lsize); } } diff -ruN linus-bk/include/asm-ppc64/page.h linus-bk-naca.1/include/asm-ppc64/page.h --- linus-bk/include/asm-ppc64/page.h 2004-10-29 07:03:22.000000000 +1000 +++ linus-bk-naca.1/include/asm-ppc64/page.h 2004-12-09 01:03:57.000000000 +1100 @@ -107,7 +107,7 @@ unsigned long lines, line_size; line_size = 
systemcfg->dCacheL1LineSize; - lines = naca->dCacheL1LinesPerPage; + lines = systemcfg->dCacheL1LinesPerPage; __asm__ __volatile__( "mtctr %1 # clear_page\n\ diff -ruN linus-bk/include/asm-ppc64/systemcfg.h linus-bk-naca.1/include/asm-ppc64/systemcfg.h --- linus-bk/include/asm-ppc64/systemcfg.h 2004-09-29 08:25:16.000000000 +1000 +++ linus-bk-naca.1/include/asm-ppc64/systemcfg.h 2004-12-09 01:10:58.000000000 +1100 @@ -28,7 +28,7 @@ * Minor version changes are a hint. */ #define SYSTEMCFG_MAJOR 1 -#define SYSTEMCFG_MINOR 0 +#define SYSTEMCFG_MINOR 1 #ifndef __ASSEMBLY__ @@ -54,7 +54,11 @@ __u32 dCacheL1LineSize; /* L1 d-cache line size 0x64 */ __u32 iCacheL1Size; /* L1 i-cache size 0x68 */ __u32 iCacheL1LineSize; /* L1 i-cache line size 0x6C */ - __u8 reserved0[3984]; /* Reserve rest of page 0x70 */ + __u32 dCacheL1LogLineSize; /* L1 d-cache line size Log2 0x70 */ + __u32 dCacheL1LinesPerPage; /* L1 d-cache lines / page 0x74 */ + __u32 iCacheL1LogLineSize; /* L1 i-cache line size Log2 0x78 */ + __u32 iCacheL1LinesPerPage; /* L1 i-cache lines / page 0x7c */ + __u8 reserved0[3968]; /* Reserve rest of page 0x80 */ }; #ifdef __KERNEL__ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041209/33faebe8/attachment.pgp From sfr at canb.auug.org.au Thu Dec 9 02:38:08 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Thu, 9 Dec 2004 02:38:08 +1100 Subject: naca cleanups In-Reply-To: <20041209012827.2f5be3a4.sfr@canb.auug.org.au> References: <20041209003800.7e38a32c.sfr@canb.auug.org.au> <20041208140859.GB32138@krispykreme.ozlabs.ibm.com> <20041209012827.2f5be3a4.sfr@canb.auug.org.au> Message-ID: <20041209023808.3ab0a448.sfr@canb.auug.org.au> Another one for comment. This one takes usage of pftSize out of the naca. I couldn't see any reason it should not become a global vaiable. 
Again, the field is left there in case someone uses it from /proc/ppc64/naca. This applies on top of the previous patch. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-naca.1/arch/ppc64/kernel/pSeries_lpar.c linus-bk-naca.2/arch/ppc64/kernel/pSeries_lpar.c --- linus-bk-naca.1/arch/ppc64/kernel/pSeries_lpar.c 2004-10-28 16:57:54.000000000 +1000 +++ linus-bk-naca.2/arch/ppc64/kernel/pSeries_lpar.c 2004-12-09 02:19:32.000000000 +1100 @@ -352,7 +352,7 @@ static void pSeries_lpar_hptab_clear(void) { - unsigned long size_bytes = 1UL << naca->pftSize; + unsigned long size_bytes = 1UL << ppc64_pft_size; unsigned long hpte_count = size_bytes >> 4; unsigned long dummy1, dummy2; int i; diff -ruN linus-bk-naca.1/arch/ppc64/kernel/prom.c linus-bk-naca.2/arch/ppc64/kernel/prom.c --- linus-bk-naca.1/arch/ppc64/kernel/prom.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.2/arch/ppc64/kernel/prom.c 2004-12-09 02:28:29.000000000 +1100 @@ -844,12 +844,12 @@ /* On LPAR, look for the first ibm,pft-size property for the hash table size */ - if (systemcfg->platform == PLATFORM_PSERIES_LPAR && naca->pftSize == 0) { + if (systemcfg->platform == PLATFORM_PSERIES_LPAR && ppc64_pft_size == 0) { u32 *pft_size; pft_size = (u32 *)get_flat_dt_prop(node, "ibm,pft-size", NULL); if (pft_size != NULL) { /* pft_size[0] is the NUMA CEC cookie */ - naca->pftSize = pft_size[1]; + naca->pftSize = ppc64_pft_size = pft_size[1]; } } @@ -1018,7 +1018,7 @@ initial_boot_params = params; /* By default, hash size is not set */ - naca->pftSize = 0; + naca->pftSize = ppc64_pft_size = 0; /* Retreive various informations from the /chosen node of the * device-tree, including the platform type, initrd location and @@ -1047,7 +1047,7 @@ /* If hash size wasn't obtained above, we calculate it now based on * the total RAM size */ - if (naca->pftSize == 0) { + if (ppc64_pft_size == 0) { unsigned long rnd_mem_size, pteg_count; /* round mem_size up to next power 
of 2 */ @@ -1058,10 +1058,10 @@ /* # pages / 2 */ pteg_count = (rnd_mem_size >> (12 + 1)); - naca->pftSize = __ilog2(pteg_count << 7); + naca->pftSize = ppc64_pft_size = __ilog2(pteg_count << 7); } - DBG("Hash pftSize: %x\n", (int)naca->pftSize); + DBG("Hash pftSize: %x\n", (int)ppc64_pft_size); DBG(" <- early_init_devtree()\n"); } diff -ruN linus-bk-naca.1/arch/ppc64/kernel/setup.c linus-bk-naca.2/arch/ppc64/kernel/setup.c --- linus-bk-naca.1/arch/ppc64/kernel/setup.c 2004-12-09 00:57:58.000000000 +1100 +++ linus-bk-naca.2/arch/ppc64/kernel/setup.c 2004-12-09 02:26:30.000000000 +1100 @@ -110,6 +110,7 @@ int boot_cpuid = 0; int boot_cpuid_phys = 0; dev_t boot_dev; +u64 ppc64_pft_size; /* * These are used in binfmt_elf.c to put aux entries on the stack @@ -661,7 +662,7 @@ printk("-----------------------------------------------------\n"); printk("naca = 0x%p\n", naca); - printk("naca->pftSize = 0x%lx\n", naca->pftSize); + printk("ppc64_pft_size = 0x%lx\n", ppc64_pft_size); printk("naca->debug_switch = 0x%lx\n", naca->debug_switch); printk("naca->interrupt_controller = 0x%ld\n", naca->interrupt_controller); printk("systemcfg = 0x%p\n", systemcfg); diff -ruN linus-bk-naca.1/arch/ppc64/mm/hash_utils.c linus-bk-naca.2/arch/ppc64/mm/hash_utils.c --- linus-bk-naca.1/arch/ppc64/mm/hash_utils.c 2004-10-29 07:03:21.000000000 +1000 +++ linus-bk-naca.2/arch/ppc64/mm/hash_utils.c 2004-12-09 02:16:21.000000000 +1100 @@ -147,7 +147,7 @@ * Calculate the required size of the htab. We want the number of * PTEGs to equal one half the number of real pages. */ - htab_size_bytes = 1UL << naca->pftSize; + htab_size_bytes = 1UL << ppc64_pft_size; pteg_count = htab_size_bytes >> 7; /* For debug, make the HTAB 1/8 as big as it normally would be. 
*/ diff -ruN linus-bk-naca.1/include/asm-ppc64/naca.h linus-bk-naca.2/include/asm-ppc64/naca.h --- linus-bk-naca.1/include/asm-ppc64/naca.h 2004-09-16 21:51:58.000000000 +1000 +++ linus-bk-naca.2/include/asm-ppc64/naca.h 2004-12-09 02:14:59.000000000 +1100 @@ -11,7 +11,8 @@ */ #include -#include +#include /* for PAGE_SHIFT and KERNELBASE */ +#include #ifndef __ASSEMBLY__ @@ -31,16 +32,17 @@ u64 serialPortAddr; /* Phy addr of serial port 0x38 */ u64 interrupt_controller; /* Type of int controller 0x40 */ u64 unused1; /* was SLB size in entries 0x48 */ +/* The fields below here are unused */ u64 pftSize; /* Log 2 of page table size 0x50 */ void *systemcfg; /* Pointer to systemcfg data 0x58 */ u32 dCacheL1LogLineSize; /* L1 d-cache line size Log2 0x60 */ u32 dCacheL1LinesPerPage; /* L1 d-cache lines / page 0x64 */ u32 iCacheL1LogLineSize; /* L1 i-cache line size Log2 0x68 */ u32 iCacheL1LinesPerPage; /* L1 i-cache lines / page 0x6c */ - u8 resv0[15]; /* Reserved 0x71 - 0x7F */ }; extern struct naca_struct *naca; +extern u64 ppc64_pft_size; #endif /* __ASSEMBLY__ */ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041209/41ad074d/attachment.pgp From anton at samba.org Thu Dec 9 02:43:55 2004 From: anton at samba.org (Anton Blanchard) Date: Thu, 9 Dec 2004 02:43:55 +1100 Subject: naca cleanups In-Reply-To: <20041209012827.2f5be3a4.sfr@canb.auug.org.au> References: <20041209003800.7e38a32c.sfr@canb.auug.org.au> <20041208140859.GB32138@krispykreme.ozlabs.ibm.com> <20041209012827.2f5be3a4.sfr@canb.auug.org.au> Message-ID: <20041208154355.GC32138@krispykreme.ozlabs.ibm.com> Hi Stephen, > OK, so how about this as a first pass. I have left the fields in > the naca just in case something reads them out of /proc/ppc64/naca. 
Since we export systemcfg to userspace it might be worth keeping it out
of there too (just in case someone decides to use it and we have to
support it).

Anton

From olof at austin.ibm.com Thu Dec 9 02:51:44 2004
From: olof at austin.ibm.com (Olof Johansson)
Date: Wed, 08 Dec 2004 09:51:44 -0600
Subject: naca cleanups
In-Reply-To: <20041209023808.3ab0a448.sfr@canb.auug.org.au>
References: <20041209003800.7e38a32c.sfr@canb.auug.org.au>
	<20041208140859.GB32138@krispykreme.ozlabs.ibm.com>
	<20041209012827.2f5be3a4.sfr@canb.auug.org.au>
	<20041209023808.3ab0a448.sfr@canb.auug.org.au>
Message-ID: <41B72310.1040407@austin.ibm.com>

Stephen Rothwell wrote:

>Another one for comment. This one takes usage of pftSize out of the naca.
>I couldn't see any reason it should not become a global variable. Again,
>the field is left there in case someone uses it from /proc/ppc64/naca.
>

This applies to the previous patch too: How about killing the SiLlyCApS
while you're at it? :)

-Olof

From benh at kernel.crashing.org Thu Dec 9 08:22:26 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 09 Dec 2004 08:22:26 +1100
Subject: [PATCH] ppc64: Workaround PCI issue on g5
Message-ID: <1102540946.17299.12.camel@gaston>

Hi!

ppc64 has a global called "pci_probe_only" which, when set, prevents the
arch PCI code from calling pci_assign_unassigned_resources().

This was cleared by pmac so far, but a bug in the definition of the
variable made that ineffective until 2.6.10, and so we never called
pci_assign_unassigned_resources(). With 2.6.10, that bug was fixed and
so we now call it, which results in some problems.

Some devices that have perfectly valid addresses assigned by firmware
end up being moved around anyway, which is a BAD thing and can break
boot on some machines, since it breaks the relationship between the
addresses in the Open Firmware device-tree and the actual location of
PCI devices. (Some low level things like the PIC are ioremap'ed based on
their OF address, way before the PCI based ASIC hosting them has been
found). This also breaks the "offb" default framebuffer driver since the
video card ends up being moved around as well.

For now, the fix is to set pci_probe_only on pmac, thus reverting to the
old behaviour. In the long run, it would be interesting to "fix"
pci_assign_unassigned_resources() so that it does what its name claims,
and only assigns things that have been left unassigned instead of moving
things around gratuitously...

Signed-off-by: Benjamin Herrenschmidt

Index: linux-work/arch/ppc64/kernel/pmac_pci.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/pmac_pci.c	2004-11-22 11:49:24.000000000 +1100
+++ linux-work/arch/ppc64/kernel/pmac_pci.c	2004-12-08 13:04:42.607006832 +1100
@@ -739,8 +739,8 @@
 
 	pmac_check_ht_link();
 
-	/* Tell pci.c to use the common resource allocation mecanism */
-	pci_probe_only = 0;
+	/* Tell pci.c to not use the common resource allocation mecanism */
+	pci_probe_only = 1;
 
 	/* Allow all IO */
 	io_page_mask = -1;

From sfr at canb.auug.org.au Thu Dec 9 15:50:37 2004
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Thu, 9 Dec 2004 15:50:37 +1100
Subject: [PATCH][RFC] consolidate cache sizing variables
In-Reply-To: <20041209003800.7e38a32c.sfr@canb.auug.org.au>
References: <20041209003800.7e38a32c.sfr@canb.auug.org.au>
Message-ID: <20041209155037.666233c8.sfr@canb.auug.org.au>

Hi all,

This is a different approach to the naca cleanups. This patch puts all
the variables that relate to the cache sizes into a single structure
(and removes them from the naca and doesn't use them from the systemcfg
any more).

Please review (especially the assembler part).

This builds on iSeries, pSeries and pmac, but has not been booted anywhere.
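
For readers skimming the patch, the two derived values kept per cache (log2 of the line size, and lines per page) can be sketched in plain userspace C. This is only an illustration: `cache_info`, `ilog2_u32`, `cache_info_from_line_size` and the 4 KiB `SKETCH_PAGE_SIZE` are names and values assumed here, not identifiers from the kernel tree.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for one half (d-cache or i-cache) of the new struct. */
struct cache_info {
	uint32_t line_size;      /* L1 line size in bytes */
	uint32_t log_line_size;  /* log2 of the line size */
	uint32_t lines_per_page; /* cache lines in one page */
};

/* Integer log2 using the same shift-and-count loop as the iSeries code. */
static uint32_t ilog2_u32(uint32_t v)
{
	uint32_t n = 0;

	assert(v != 0);
	while (v >>= 1)
		++n;
	return n;
}

/* Derive the dependent fields from the line size alone. */
static struct cache_info cache_info_from_line_size(uint32_t line_size)
{
	struct cache_info ci;

	assert(line_size != 0);
	ci.line_size = line_size;
	ci.log_line_size = ilog2_u32(line_size);
	ci.lines_per_page = SKETCH_PAGE_SIZE / line_size;
	return ci;
}
```

With a 128-byte line this gives a log2 of 7 and 32 lines per page, which is what the assembler routines load as the shift count and the loop count.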
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk/arch/ppc64/kernel/asm-offsets.c linus-bk-naca.1a/arch/ppc64/kernel/asm-offsets.c --- linus-bk/arch/ppc64/kernel/asm-offsets.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1a/arch/ppc64/kernel/asm-offsets.c 2004-12-09 14:47:31.000000000 +1100 @@ -35,6 +35,7 @@ #include #include #include +#include #define DEFINE(sym, val) \ asm volatile("\n->" #sym " %0 " #val : : "i" (val)) @@ -69,12 +70,12 @@ /* naca */ DEFINE(PACA, offsetof(struct naca_struct, paca)); - DEFINE(DCACHEL1LINESIZE, offsetof(struct systemcfg, dCacheL1LineSize)); - DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct naca_struct, dCacheL1LogLineSize)); - DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct naca_struct, dCacheL1LinesPerPage)); - DEFINE(ICACHEL1LINESIZE, offsetof(struct systemcfg, iCacheL1LineSize)); - DEFINE(ICACHEL1LOGLINESIZE, offsetof(struct naca_struct, iCacheL1LogLineSize)); - DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct naca_struct, iCacheL1LinesPerPage)); + DEFINE(DCACHEL1LINESIZE, offsetof(struct ppc64_caches, dline_size)); + DEFINE(DCACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_dline_size)); + DEFINE(DCACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, dlines_per_page)); + DEFINE(ICACHEL1LINESIZE, offsetof(struct ppc64_caches, iline_size)); + DEFINE(ICACHEL1LOGLINESIZE, offsetof(struct ppc64_caches, log_iline_size)); + DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, ilines_per_page)); DEFINE(PLATFORM, offsetof(struct systemcfg, platform)); /* paca */ diff -ruN linus-bk/arch/ppc64/kernel/eeh.c linus-bk-naca.1a/arch/ppc64/kernel/eeh.c --- linus-bk/arch/ppc64/kernel/eeh.c 2004-10-26 16:06:41.000000000 +1000 +++ linus-bk-naca.1a/arch/ppc64/kernel/eeh.c 2004-12-09 15:00:27.000000000 +1100 @@ -32,6 +32,7 @@ #include #include #include +#include #include "pci.h" #undef DEBUG diff -ruN linus-bk/arch/ppc64/kernel/iSeries_setup.c 
linus-bk-naca.1a/arch/ppc64/kernel/iSeries_setup.c --- linus-bk/arch/ppc64/kernel/iSeries_setup.c 2004-11-12 09:09:48.000000000 +1100 +++ linus-bk-naca.1a/arch/ppc64/kernel/iSeries_setup.c 2004-12-09 14:48:09.000000000 +1100 @@ -44,6 +44,7 @@ #include "iSeries_setup.h" #include #include +#include #include #include #include @@ -560,33 +561,36 @@ unsigned int i, n; unsigned int procIx = get_paca()->lppaca.xDynHvPhysicalProcIndex; - systemcfg->iCacheL1Size = - xIoHriProcessorVpd[procIx].xInstCacheSize * 1024; - systemcfg->iCacheL1LineSize = + systemcfg->icache_size = + ppc64_caches.isize = xIoHriProcessorVpd[procIx].xInstCacheSize * 1024; + systemcfg->icache_line_size = + ppc64_caches.iline_size = xIoHriProcessorVpd[procIx].xInstCacheOperandSize; - systemcfg->dCacheL1Size = + systemcfg->dcache_size = + ppc64_caches.dsize = xIoHriProcessorVpd[procIx].xDataL1CacheSizeKB * 1024; - systemcfg->dCacheL1LineSize = + systemcfg->dcache_line_size = + ppc64_caches.dline_size = xIoHriProcessorVpd[procIx].xDataCacheOperandSize; - naca->iCacheL1LinesPerPage = PAGE_SIZE / systemcfg->iCacheL1LineSize; - naca->dCacheL1LinesPerPage = PAGE_SIZE / systemcfg->dCacheL1LineSize; + ppc64_caches.ilines_per_page = PAGE_SIZE / ppc64_caches.iline_size; + ppc64_caches.dlines_per_page = PAGE_SIZE / ppc64_caches.dline_size; - i = systemcfg->iCacheL1LineSize; + i = ppc64_caches.iline_size; n = 0; while ((i = (i / 2))) ++n; - naca->iCacheL1LogLineSize = n; + ppc64_caches.log_iline_size = n; - i = systemcfg->dCacheL1LineSize; + i = ppc64_caches.dline_size; n = 0; while ((i = (i / 2))) ++n; - naca->dCacheL1LogLineSize = n; + ppc64_caches.log_dline_size = n; printk("D-cache line size = %d\n", - (unsigned int)systemcfg->dCacheL1LineSize); + (unsigned int)ppc64_caches.dline_size); printk("I-cache line size = %d\n", - (unsigned int)systemcfg->iCacheL1LineSize); + (unsigned int)ppc64_caches.iline_size); } /* diff -ruN linus-bk/arch/ppc64/kernel/idle.c linus-bk-naca.1a/arch/ppc64/kernel/idle.c --- 
linus-bk/arch/ppc64/kernel/idle.c 2004-10-27 07:32:57.000000000 +1000 +++ linus-bk-naca.1a/arch/ppc64/kernel/idle.c 2004-12-09 14:56:38.000000000 +1100 @@ -32,6 +32,7 @@ #include #include #include +#include extern void power4_idle(void); diff -ruN linus-bk/arch/ppc64/kernel/misc.S linus-bk-naca.1a/arch/ppc64/kernel/misc.S --- linus-bk/arch/ppc64/kernel/misc.S 2004-11-12 09:09:48.000000000 +1100 +++ linus-bk-naca.1a/arch/ppc64/kernel/misc.S 2004-12-09 13:26:26.000000000 +1100 @@ -189,6 +189,11 @@ isync blr + .section ".toc","aw" +PPC64_CACHES: + .tc ppc64_caches[TC],ppc64_caches + .section ".text" + /* * Write any modified data cache blocks out to memory * and invalidate the corresponding instruction cache blocks. @@ -207,11 +212,8 @@ * and in some cases i-cache and d-cache line sizes differ from * each other. */ - LOADADDR(r10,naca) /* Get Naca address */ - ld r10,0(r10) - LOADADDR(r11,systemcfg) /* Get systemcfg address */ - ld r11,0(r11) - lwz r7,DCACHEL1LINESIZE(r11)/* Get cache line size */ + ld r10,PPC64_CACHES at toc(r2) + lwz r7,DCACHEL1LINESIZE(r10)/* Get cache line size */ addi r5,r7,-1 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -227,7 +229,7 @@ /* Now invalidate the instruction cache */ - lwz r7,ICACHEL1LINESIZE(r11) /* Get Icache line size */ + lwz r7,ICACHEL1LINESIZE(r10) /* Get Icache line size */ addi r5,r7,-1 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -256,11 +258,8 @@ * * Different systems have different cache line sizes */ - LOADADDR(r10,naca) /* Get Naca address */ - ld r10,0(r10) - LOADADDR(r11,systemcfg) /* Get systemcfg address */ - ld r11,0(r11) - lwz r7,DCACHEL1LINESIZE(r11) /* Get dcache line size */ + ld r10,PPC64_CACHES at toc(r2) + lwz r7,DCACHEL1LINESIZE(r10) /* Get dcache line size */ addi r5,r7,-1 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -286,11 +285,8 @@ * flush all bytes from start to stop-1 inclusive */ 
_GLOBAL(flush_dcache_phys_range) - LOADADDR(r10,naca) /* Get Naca address */ - ld r10,0(r10) - LOADADDR(r11,systemcfg) /* Get systemcfg address */ - ld r11,0(r11) - lwz r7,DCACHEL1LINESIZE(r11) /* Get dcache line size */ + ld r10,PPC64_CACHES at toc(r2) + lwz r7,DCACHEL1LINESIZE(r10) /* Get dcache line size */ addi r5,r7,-1 andc r6,r3,r5 /* round low to line bdy */ subf r8,r6,r4 /* compute length */ @@ -332,13 +328,10 @@ */ /* Flush the dcache */ - LOADADDR(r7,naca) - ld r7,0(r7) - LOADADDR(r8,systemcfg) /* Get systemcfg address */ - ld r8,0(r8) + ld r7,PPC64_CACHES at toc(r2) clrrdi r3,r3,12 /* Page align */ lwz r4,DCACHEL1LINESPERPAGE(r7) /* Get # dcache lines per page */ - lwz r5,DCACHEL1LINESIZE(r8) /* Get dcache line size */ + lwz r5,DCACHEL1LINESIZE(r7) /* Get dcache line size */ mr r6,r3 mtctr r4 0: dcbst 0,r6 @@ -349,7 +342,7 @@ /* Now invalidate the icache */ lwz r4,ICACHEL1LINESPERPAGE(r7) /* Get # icache lines per page */ - lwz r5,ICACHEL1LINESIZE(r8) /* Get icache line size */ + lwz r5,ICACHEL1LINESIZE(r7) /* Get icache line size */ mtctr r4 1: icbi 0,r3 add r3,r3,r5 diff -ruN linus-bk/arch/ppc64/kernel/nvram.c linus-bk-naca.1a/arch/ppc64/kernel/nvram.c --- linus-bk/arch/ppc64/kernel/nvram.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-naca.1a/arch/ppc64/kernel/nvram.c 2004-12-09 14:58:20.000000000 +1100 @@ -31,6 +31,7 @@ #include #include #include +#include #undef DEBUG_NVRAM diff -ruN linus-bk/arch/ppc64/kernel/pSeries_iommu.c linus-bk-naca.1a/arch/ppc64/kernel/pSeries_iommu.c --- linus-bk/arch/ppc64/kernel/pSeries_iommu.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1a/arch/ppc64/kernel/pSeries_iommu.c 2004-12-09 15:02:00.000000000 +1100 @@ -43,6 +43,7 @@ #include #include #include +#include #include "pci.h" diff -ruN linus-bk/arch/ppc64/kernel/pacaData.c linus-bk-naca.1a/arch/ppc64/kernel/pacaData.c --- linus-bk/arch/ppc64/kernel/pacaData.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1a/arch/ppc64/kernel/pacaData.c 2004-12-09 
15:06:06.000000000 +1100 @@ -10,6 +10,8 @@ #include #include #include +#include + #include #include #include @@ -20,7 +22,9 @@ #include struct naca_struct *naca; +EXPORT_SYMBOL(naca); struct systemcfg *systemcfg; +EXPORT_SYMBOL(systemcfg); /* This symbol is provided by the linker - let it fill in the paca * field correctly */ diff -ruN linus-bk/arch/ppc64/kernel/pmac_setup.c linus-bk-naca.1a/arch/ppc64/kernel/pmac_setup.c --- linus-bk/arch/ppc64/kernel/pmac_setup.c 2004-10-25 18:18:33.000000000 +1000 +++ linus-bk-naca.1a/arch/ppc64/kernel/pmac_setup.c 2004-12-09 15:24:10.000000000 +1100 @@ -70,6 +70,7 @@ #include #include #include +#include #include "pmac.h" #include "mpic.h" diff -ruN linus-bk/arch/ppc64/kernel/ppc_ksyms.c linus-bk-naca.1a/arch/ppc64/kernel/ppc_ksyms.c --- linus-bk/arch/ppc64/kernel/ppc_ksyms.c 2004-10-21 07:17:18.000000000 +1000 +++ linus-bk-naca.1a/arch/ppc64/kernel/ppc_ksyms.c 2004-12-09 15:04:59.000000000 +1100 @@ -67,7 +67,6 @@ EXPORT_SYMBOL(__down_interruptible); EXPORT_SYMBOL(__up); -EXPORT_SYMBOL(naca); EXPORT_SYMBOL(__down); #ifdef CONFIG_PPC_ISERIES EXPORT_SYMBOL(itLpNaca); @@ -162,4 +161,3 @@ EXPORT_SYMBOL(tb_ticks_per_usec); EXPORT_SYMBOL(paca); EXPORT_SYMBOL(cur_cpu_spec); -EXPORT_SYMBOL(systemcfg); diff -ruN linus-bk/arch/ppc64/kernel/rtas-proc.c linus-bk-naca.1a/arch/ppc64/kernel/rtas-proc.c --- linus-bk/arch/ppc64/kernel/rtas-proc.c 2004-10-21 07:17:18.000000000 +1000 +++ linus-bk-naca.1a/arch/ppc64/kernel/rtas-proc.c 2004-12-09 15:02:27.000000000 +1100 @@ -31,6 +31,7 @@ #include #include /* for ppc_md */ #include +#include /* Token for Sensors */ #define KEY_SWITCH 0x0001 diff -ruN linus-bk/arch/ppc64/kernel/rtas.c linus-bk-naca.1a/arch/ppc64/kernel/rtas.c --- linus-bk/arch/ppc64/kernel/rtas.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1a/arch/ppc64/kernel/rtas.c 2004-12-09 15:01:23.000000000 +1100 @@ -29,6 +29,7 @@ #include #include #include +#include struct flash_block_list_header rtas_firmware_flash_list = {0, NULL}; 
diff -ruN linus-bk/arch/ppc64/kernel/rtasd.c linus-bk-naca.1a/arch/ppc64/kernel/rtasd.c --- linus-bk/arch/ppc64/kernel/rtasd.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-naca.1a/arch/ppc64/kernel/rtasd.c 2004-12-09 15:00:50.000000000 +1100 @@ -26,6 +26,7 @@ #include #include #include +#include #if 0 #define DEBUG(A...) printk(KERN_ERR A) diff -ruN linus-bk/arch/ppc64/kernel/setup.c linus-bk-naca.1a/arch/ppc64/kernel/setup.c --- linus-bk/arch/ppc64/kernel/setup.c 2004-11-26 12:08:51.000000000 +1100 +++ linus-bk-naca.1a/arch/ppc64/kernel/setup.c 2004-12-09 14:46:37.000000000 +1100 @@ -54,6 +54,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -111,6 +112,8 @@ int boot_cpuid_phys = 0; dev_t boot_dev; +struct ppc64_caches ppc64_caches; + /* * These are used in binfmt_elf.c to put aux entries on the stack * for each elf executable being started. @@ -489,15 +492,15 @@ lsizep = (u32 *) get_property(np, dc, NULL); if (lsizep != NULL) lsize = *lsizep; - if (sizep == 0 || lsizep == 0) DBG("Argh, can't find dcache properties ! " "sizep: %p, lsizep: %p\n", sizep, lsizep); - systemcfg->dCacheL1Size = size; - systemcfg->dCacheL1LineSize = lsize; - naca->dCacheL1LogLineSize = __ilog2(lsize); - naca->dCacheL1LinesPerPage = PAGE_SIZE/(lsize); + systemcfg->dcache_size = ppc64_caches.dsize = size; + systemcfg->dcache_line_size = + ppc64_caches.dline_size = lsize; + ppc64_caches.log_dline_size = __ilog2(lsize); + ppc64_caches.dlines_per_page = PAGE_SIZE / lsize; size = 0; lsize = cur_cpu_spec->icache_bsize; @@ -511,11 +514,11 @@ DBG("Argh, can't find icache properties ! 
" "sizep: %p, lsizep: %p\n", sizep, lsizep); - systemcfg->iCacheL1Size = size; - systemcfg->iCacheL1LineSize = lsize; - naca->iCacheL1LogLineSize = __ilog2(lsize); - naca->iCacheL1LinesPerPage = PAGE_SIZE/(lsize); - + systemcfg->icache_size = ppc64_caches.isize = size; + systemcfg->icache_line_size = + ppc64_caches.iline_size = lsize; + ppc64_caches.log_iline_size = __ilog2(lsize); + ppc64_caches.ilines_per_page = PAGE_SIZE / lsize; } } @@ -664,8 +667,10 @@ printk("systemcfg->platform = 0x%x\n", systemcfg->platform); printk("systemcfg->processorCount = 0x%lx\n", systemcfg->processorCount); printk("systemcfg->physicalMemorySize = 0x%lx\n", systemcfg->physicalMemorySize); - printk("systemcfg->dCacheL1LineSize = 0x%x\n", systemcfg->dCacheL1LineSize); - printk("systemcfg->iCacheL1LineSize = 0x%x\n", systemcfg->iCacheL1LineSize); + printk("ppc64_caches.dcache_line_size = 0x%x\n", + ppc64_caches.dline_size); + printk("ppc64_caches.icache_line_size = 0x%x\n", + ppc64_caches.iline_size); printk("htab_data.htab = 0x%p\n", htab_data.htab); printk("htab_data.num_ptegs = 0x%lx\n", htab_data.htab_num_ptegs); printk("-----------------------------------------------------\n"); @@ -1000,8 +1005,8 @@ * Systems with OF can look in the properties on the cpu node(s) * for a possibly more accurate value. 
*/ - dcache_bsize = systemcfg->dCacheL1LineSize; - icache_bsize = systemcfg->iCacheL1LineSize; + dcache_bsize = ppc64_caches.dline_size; + icache_bsize = ppc64_caches.iline_size; /* reboot on panic */ panic_timeout = 180; diff -ruN linus-bk/arch/ppc64/kernel/sys_ppc32.c linus-bk-naca.1a/arch/ppc64/kernel/sys_ppc32.c --- linus-bk/arch/ppc64/kernel/sys_ppc32.c 2004-10-28 16:57:54.000000000 +1000 +++ linus-bk-naca.1a/arch/ppc64/kernel/sys_ppc32.c 2004-12-09 15:18:08.000000000 +1100 @@ -73,6 +73,7 @@ #include #include #include +#include #include "pci.h" diff -ruN linus-bk/arch/ppc64/kernel/sysfs.c linus-bk-naca.1a/arch/ppc64/kernel/sysfs.c --- linus-bk/arch/ppc64/kernel/sysfs.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-naca.1a/arch/ppc64/kernel/sysfs.c 2004-12-09 14:54:13.000000000 +1100 @@ -13,6 +13,7 @@ #include #include #include +#include /* SMT stuff */ diff -ruN linus-bk/arch/ppc64/kernel/time.c linus-bk-naca.1a/arch/ppc64/kernel/time.c --- linus-bk/arch/ppc64/kernel/time.c 2004-10-21 07:17:18.000000000 +1000 +++ linus-bk-naca.1a/arch/ppc64/kernel/time.c 2004-12-09 14:52:48.000000000 +1100 @@ -66,6 +66,7 @@ #include #include #include +#include void smp_local_timer_interrupt(struct pt_regs *); diff -ruN linus-bk/arch/ppc64/kernel/traps.c linus-bk-naca.1a/arch/ppc64/kernel/traps.c --- linus-bk/arch/ppc64/kernel/traps.c 2004-09-09 09:59:49.000000000 +1000 +++ linus-bk-naca.1a/arch/ppc64/kernel/traps.c 2004-12-09 14:49:23.000000000 +1100 @@ -37,6 +37,7 @@ #include #include #include +#include #ifdef CONFIG_PPC_PSERIES /* This is true if we are using the firmware NMI handler (typically LPAR) */ diff -ruN linus-bk/include/asm-ppc64/cache.h linus-bk-naca.1a/include/asm-ppc64/cache.h --- linus-bk/include/asm-ppc64/cache.h 2002-08-28 06:04:10.000000000 +1000 +++ linus-bk-naca.1a/include/asm-ppc64/cache.h 2004-12-09 14:51:57.000000000 +1100 @@ -7,6 +7,8 @@ #ifndef __ARCH_PPC64_CACHE_H #define __ARCH_PPC64_CACHE_H +#include + /* bytes per L1 cache line */ #define 
L1_CACHE_SHIFT 7 #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT) @@ -14,4 +16,21 @@ #define SMP_CACHE_BYTES L1_CACHE_BYTES #define L1_CACHE_SHIFT_MAX 7 /* largest L1 which this arch supports */ +#ifndef __ASSEMBLY__ + +struct ppc64_caches { + u32 dsize; /* L1 d-cache size */ + u32 dline_size; /* L1 d-cache line size */ + u32 log_dline_size; + u32 dlines_per_page; + u32 isize; /* L1 i-cache size */ + u32 iline_size; /* L1 i-cache line size */ + u32 log_iline_size; + u32 ilines_per_page; +}; + +extern struct ppc64_caches ppc64_caches; + +#endif + #endif diff -ruN linus-bk/include/asm-ppc64/naca.h linus-bk-naca.1a/include/asm-ppc64/naca.h --- linus-bk/include/asm-ppc64/naca.h 2004-09-16 21:51:58.000000000 +1000 +++ linus-bk-naca.1a/include/asm-ppc64/naca.h 2004-12-09 14:43:33.000000000 +1100 @@ -16,11 +16,7 @@ #ifndef __ASSEMBLY__ struct naca_struct { - /*================================================================== - * Cache line 1: 0x0000 - 0x007F - * Kernel only data - undefined for user space - *================================================================== - */ + /* Kernel only data - undefined for user space */ void *xItVpdAreas; /* VPD Data 0x00 */ void *xRamDisk; /* iSeries ramdisk 0x08 */ u64 xRamDiskSize; /* In pages 0x10 */ @@ -32,12 +28,6 @@ u64 interrupt_controller; /* Type of int controller 0x40 */ u64 unused1; /* was SLB size in entries 0x48 */ u64 pftSize; /* Log 2 of page table size 0x50 */ - void *systemcfg; /* Pointer to systemcfg data 0x58 */ - u32 dCacheL1LogLineSize; /* L1 d-cache line size Log2 0x60 */ - u32 dCacheL1LinesPerPage; /* L1 d-cache lines / page 0x64 */ - u32 iCacheL1LogLineSize; /* L1 i-cache line size Log2 0x68 */ - u32 iCacheL1LinesPerPage; /* L1 i-cache lines / page 0x6c */ - u8 resv0[15]; /* Reserved 0x71 - 0x7F */ }; extern struct naca_struct *naca; diff -ruN linus-bk/include/asm-ppc64/page.h linus-bk-naca.1a/include/asm-ppc64/page.h --- linus-bk/include/asm-ppc64/page.h 2004-10-29 07:03:22.000000000 +1000 +++ 
linus-bk-naca.1a/include/asm-ppc64/page.h 2004-12-09 14:44:30.000000000 +1100 @@ -93,7 +93,7 @@ #ifdef __KERNEL__ #ifndef __ASSEMBLY__ -#include +#include #undef STRICT_MM_TYPECHECKS @@ -106,8 +106,8 @@ { unsigned long lines, line_size; - line_size = systemcfg->dCacheL1LineSize; - lines = naca->dCacheL1LinesPerPage; + line_size = ppc64_caches.dline_size; + lines = ppc64_caches.dlines_per_page; __asm__ __volatile__( "mtctr %1 # clear_page\n\ diff -ruN linus-bk/include/asm-ppc64/processor.h linus-bk-naca.1a/include/asm-ppc64/processor.h --- linus-bk/include/asm-ppc64/processor.h 2004-10-27 07:32:58.000000000 +1000 +++ linus-bk-naca.1a/include/asm-ppc64/processor.h 2004-12-09 15:21:04.000000000 +1100 @@ -19,6 +19,7 @@ #endif #include #include +#include /* Machine State Register (MSR) Fields */ #define MSR_SF_LG 63 /* Enable 64 bit mode */ diff -ruN linus-bk/include/asm-ppc64/systemcfg.h linus-bk-naca.1a/include/asm-ppc64/systemcfg.h --- linus-bk/include/asm-ppc64/systemcfg.h 2004-09-29 08:25:16.000000000 +1000 +++ linus-bk-naca.1a/include/asm-ppc64/systemcfg.h 2004-12-09 15:35:41.000000000 +1100 @@ -15,14 +15,6 @@ * End Change Activity */ - -#ifndef __KERNEL__ -#include -#include -#include -#include -#endif - /* * If the major version changes we are incompatible. * Minor version changes are a hint. 
@@ -50,10 +42,11 @@
 	__u64 tb_update_count;		/* Timebase atomicity ctr	0x50 */
 	__u32 tz_minuteswest;		/* Minutes west of Greenwich	0x58 */
 	__u32 tz_dsttime;		/* Type of dst correction	0x5C */
-	__u32 dCacheL1Size;		/* L1 d-cache size		0x60 */
-	__u32 dCacheL1LineSize;		/* L1 d-cache line size		0x64 */
-	__u32 iCacheL1Size;		/* L1 i-cache size		0x68 */
-	__u32 iCacheL1LineSize;		/* L1 i-cache line size		0x6C */
+	/* next four are no longer used except to be exported to /proc */
+	__u32 dcache_size;		/* L1 d-cache size		0x60 */
+	__u32 dcache_line_size;		/* L1 d-cache line size		0x64 */
+	__u32 icache_size;		/* L1 i-cache size		0x68 */
+	__u32 icache_line_size;		/* L1 i-cache line size		0x6C */
 	__u8 reserved0[3984];		/* Reserve rest of page		0x70 */
 };
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041209/43cf6e3e/attachment.pgp

From sfr at canb.auug.org.au Thu Dec 9 17:31:35 2004
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Thu, 9 Dec 2004 17:31:35 +1100
Subject: naca cleanups
In-Reply-To: <20041209023808.3ab0a448.sfr@canb.auug.org.au>
References: <20041209003800.7e38a32c.sfr@canb.auug.org.au>
	<20041208140859.GB32138@krispykreme.ozlabs.ibm.com>
	<20041209012827.2f5be3a4.sfr@canb.auug.org.au>
	<20041209023808.3ab0a448.sfr@canb.auug.org.au>
Message-ID: <20041209173135.7f90f6c4.sfr@canb.auug.org.au>

On Thu, 9 Dec 2004 02:38:08 +1100 Stephen Rothwell wrote:
>
> Another one for comment. This one takes usage of pftSize out of the naca.
> I couldn't see any reason it should not become a global variable. Again,
> the field is left there in case someone uses it from /proc/ppc64/naca.
>
> This applies on top of the previous patch.

New version. This builds on iSeries, pSeries and pmac.
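
As a worked example of what ppc64_pft_size holds, the fallback calculation in prom.c (round RAM up to a power of two, take one PTEG per two 4 KiB pages, 128 bytes per PTEG) can be sketched in userspace C. The helper names `ilog2_u64`, `pft_size_from_mem` and `htab_size_bytes` are made up for this sketch, and the rounding step is reconstructed from the "round mem_size up to next power of 2" comment rather than copied from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Integer log2 of a non-zero 64-bit value (hypothetical helper). */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int n = 0;

	assert(v != 0);
	while (v >>= 1)
		++n;
	return n;
}

/* Log2 of the hash page table size for a given amount of RAM in bytes. */
static unsigned int pft_size_from_mem(uint64_t mem_bytes)
{
	uint64_t rnd_mem_size, pteg_count;

	/* Keep the sketch away from zero PTEGs and from shift overflow. */
	assert(mem_bytes >= (1ULL << 13) && mem_bytes <= (1ULL << 62));

	/* Round the memory size up to the next power of 2. */
	rnd_mem_size = 1ULL << ilog2_u64(mem_bytes);
	if (rnd_mem_size < mem_bytes)
		rnd_mem_size <<= 1;

	/* Number of 4 KiB pages divided by 2. */
	pteg_count = rnd_mem_size >> (12 + 1);

	/* Each PTEG is 128 bytes, hence the shift by 7. */
	return ilog2_u64(pteg_count << 7);
}

/* Inverse direction, as used when the hash table is set up. */
static uint64_t htab_size_bytes(unsigned int pft_size)
{
	return 1ULL << pft_size;
}
```

For 1 GiB of RAM this yields a pft size of 24, i.e. a 16 MiB hash table; 3 GiB rounds up to 4 GiB and yields 26.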
--
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.1a/arch/ppc64/kernel/pSeries_lpar.c linus-bk-naca.2a/arch/ppc64/kernel/pSeries_lpar.c
--- linus-bk-naca.1a/arch/ppc64/kernel/pSeries_lpar.c	2004-10-28 16:57:54.000000000 +1000
+++ linus-bk-naca.2a/arch/ppc64/kernel/pSeries_lpar.c	2004-12-09 17:17:35.000000000 +1100
@@ -33,7 +33,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -352,7 +351,7 @@

 static void pSeries_lpar_hptab_clear(void)
 {
-	unsigned long size_bytes = 1UL << naca->pftSize;
+	unsigned long size_bytes = 1UL << ppc64_pft_size;
 	unsigned long hpte_count = size_bytes >> 4;
 	unsigned long dummy1, dummy2;
 	int i;
diff -ruN linus-bk-naca.1a/arch/ppc64/kernel/prom.c linus-bk-naca.2a/arch/ppc64/kernel/prom.c
--- linus-bk-naca.1a/arch/ppc64/kernel/prom.c	2004-11-26 12:08:51.000000000 +1100
+++ linus-bk-naca.2a/arch/ppc64/kernel/prom.c	2004-12-09 17:08:06.000000000 +1100
@@ -844,12 +844,12 @@

 	/* On LPAR, look for the first ibm,pft-size property for the hash table size
 	 */
-	if (systemcfg->platform == PLATFORM_PSERIES_LPAR && naca->pftSize == 0) {
+	if (systemcfg->platform == PLATFORM_PSERIES_LPAR && ppc64_pft_size == 0) {
 		u32 *pft_size;
 		pft_size = (u32 *)get_flat_dt_prop(node, "ibm,pft-size", NULL);
 		if (pft_size != NULL) {
 			/* pft_size[0] is the NUMA CEC cookie */
-			naca->pftSize = pft_size[1];
+			ppc64_pft_size = pft_size[1];
 		}
 	}

@@ -1018,7 +1018,7 @@
 	initial_boot_params = params;

 	/* By default, hash size is not set */
-	naca->pftSize = 0;
+	ppc64_pft_size = 0;

 	/* Retreive various informations from the /chosen node of the
 	 * device-tree, including the platform type, initrd location and
@@ -1047,7 +1047,7 @@
 	/* If hash size wasn't obtained above, we calculate it now based on
 	 * the total RAM size
 	 */
-	if (naca->pftSize == 0) {
+	if (ppc64_pft_size == 0) {
 		unsigned long rnd_mem_size, pteg_count;

 		/* round mem_size up to next power of 2 */
@@ -1058,10 +1058,10 @@

 		/* # pages / 2 */
 		pteg_count = (rnd_mem_size >> (12 + 1));
-		naca->pftSize = __ilog2(pteg_count << 7);
+		ppc64_pft_size = __ilog2(pteg_count << 7);
 	}

-	DBG("Hash pftSize: %x\n", (int)naca->pftSize);
+	DBG("Hash pftSize: %x\n", (int)ppc64_pft_size);
 	DBG(" <- early_init_devtree()\n");
 }

diff -ruN linus-bk-naca.1a/arch/ppc64/kernel/setup.c linus-bk-naca.2a/arch/ppc64/kernel/setup.c
--- linus-bk-naca.1a/arch/ppc64/kernel/setup.c	2004-12-09 14:46:37.000000000 +1100
+++ linus-bk-naca.2a/arch/ppc64/kernel/setup.c	2004-12-09 17:17:14.000000000 +1100
@@ -55,6 +55,7 @@
 #include
 #include
 #include
+#include

 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -111,6 +112,7 @@
 int boot_cpuid = 0;
 int boot_cpuid_phys = 0;
 dev_t boot_dev;
+u64 ppc64_pft_size;

 struct ppc64_caches ppc64_caches;

@@ -660,7 +662,7 @@

 	printk("-----------------------------------------------------\n");
 	printk("naca = 0x%p\n", naca);
-	printk("naca->pftSize = 0x%lx\n", naca->pftSize);
+	printk("ppc64_pft_size = 0x%lx\n", ppc64_pft_size);
 	printk("naca->debug_switch = 0x%lx\n", naca->debug_switch);
 	printk("naca->interrupt_controller = 0x%ld\n", naca->interrupt_controller);
 	printk("systemcfg = 0x%p\n", systemcfg);
diff -ruN linus-bk-naca.1a/arch/ppc64/mm/hash_utils.c linus-bk-naca.2a/arch/ppc64/mm/hash_utils.c
--- linus-bk-naca.1a/arch/ppc64/mm/hash_utils.c	2004-10-29 07:03:21.000000000 +1000
+++ linus-bk-naca.2a/arch/ppc64/mm/hash_utils.c	2004-12-09 17:17:54.000000000 +1100
@@ -41,7 +41,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -147,7 +146,7 @@
 	 * Calculate the required size of the htab. We want the number of
 	 * PTEGs to equal one half the number of real pages.
 	 */
-	htab_size_bytes = 1UL << naca->pftSize;
+	htab_size_bytes = 1UL << ppc64_pft_size;
 	pteg_count = htab_size_bytes >> 7;

 	/* For debug, make the HTAB 1/8 as big as it normally would be. */
diff -ruN linus-bk-naca.1a/include/asm-ppc64/naca.h linus-bk-naca.2a/include/asm-ppc64/naca.h
--- linus-bk-naca.1a/include/asm-ppc64/naca.h	2004-12-09 14:43:33.000000000 +1100
+++ linus-bk-naca.2a/include/asm-ppc64/naca.h	2004-12-09 17:25:40.000000000 +1100
@@ -26,8 +26,6 @@
 	u64 log;			/* Ptr to log buffer 0x30 */
 	u64 serialPortAddr;		/* Phy addr of serial port 0x38 */
 	u64 interrupt_controller;	/* Type of int controller 0x40 */
-	u64 unused1;			/* was SLB size in entries 0x48 */
-	u64 pftSize;			/* Log 2 of page table size 0x50 */
 };

 extern struct naca_struct *naca;
diff -ruN linus-bk-naca.1a/include/asm-ppc64/page.h linus-bk-naca.2a/include/asm-ppc64/page.h
--- linus-bk-naca.1a/include/asm-ppc64/page.h	2004-12-09 14:44:30.000000000 +1100
+++ linus-bk-naca.2a/include/asm-ppc64/page.h	2004-12-09 17:15:33.000000000 +1100
@@ -183,6 +183,8 @@

 extern int page_is_ram(unsigned long pfn);

+extern u64 ppc64_pft_size;		/* Log 2 of page table size */
+
 #endif /* __ASSEMBLY__ */

 #ifdef MODULE

From anton at samba.org Fri Dec 10 03:07:05 2004
From: anton at samba.org (Anton Blanchard)
Date: Fri, 10 Dec 2004 03:07:05 +1100
Subject: [PATCH] ppc64: pSeries shared processor fixes
Message-ID: <20041209160705.GC24640@krispykreme.ozlabs.ibm.com>

Hi,

It turns out there are more issues with our VPA code:

1. vpa_init doesn't report errors when it fails. This was masking a bug
   where the VPA spanned 2 pages and phyp failed to register it.
2. We call idle_setup before we initialise the boot cpu's vpa. This means
   we never select the shared processor idle loop.
3. We don't call vpa_init on UP kernels.

I think this should go in ASAP, can people give it a once over?
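
To illustrate point 1, here is a minimal user-space sketch of the kind of check that flags an area straddling a page boundary. It is not part of the patch: spans_two_pages() and SKETCH_PAGE_SIZE are hypothetical names, the 4K page size is an assumption taken from the 12-bit shift used in the hash table sizing code, and the sizes in the usage note are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4K pages (matching the ">> 12" page shift seen in prom.c). */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper, not a kernel function.
 * Returns 1 if the byte range [addr, addr + len) touches more than one
 * page, 0 if it fits inside a single page.  len must be non-zero.
 */
static int spans_two_pages(uint64_t addr, uint64_t len)
{
	uint64_t first_page, last_page;

	assert(len > 0);
	first_page = addr / SKETCH_PAGE_SIZE;
	last_page = (addr + len - 1) / SKETCH_PAGE_SIZE;
	return first_page != last_page;
}
```

With these made-up numbers, an area of 640 bytes starting at offset 4000 crosses into the next page, while the same area starting at offset 0 does not. Whether the real fix should align the area or simply report the failure is a separate question; the patch only adds the error report.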

Signed-off-by: Anton Blanchard

diff -puN arch/ppc64/kernel/smp.c~fix_vpa arch/ppc64/kernel/smp.c
--- foobar2/arch/ppc64/kernel/smp.c~fix_vpa	2004-12-10 02:26:35.070363325 +1100
+++ foobar2-anton/arch/ppc64/kernel/smp.c	2004-12-10 02:26:35.113360044 +1100
@@ -76,8 +76,6 @@ extern unsigned char stab_array[];

 extern int cpu_idle(void *unused);
 void smp_call_function_interrupt(void);
-extern long register_vpa(unsigned long flags, unsigned long proc,
-			 unsigned long vpa);

 int smt_enabled_at_boot = 1;

diff -puN arch/ppc64/kernel/pSeries_smp.c~fix_vpa arch/ppc64/kernel/pSeries_smp.c
--- foobar2/arch/ppc64/kernel/pSeries_smp.c~fix_vpa	2004-12-10 02:26:35.075362944 +1100
+++ foobar2-anton/arch/ppc64/kernel/pSeries_smp.c	2004-12-10 02:26:35.115359891 +1100
@@ -59,16 +59,6 @@

 extern void pseries_secondary_smp_init(unsigned long);

-static void vpa_init(int cpu)
-{
-	unsigned long flags, pcpu = get_hard_smp_processor_id(cpu);
-
-	/* Register the Virtual Processor Area (VPA) */
-	flags = 1UL << (63 - 18);
-	register_vpa(flags, pcpu, __pa((unsigned long)&(paca[cpu].lppaca)));
-}
-
-
 /* Get state of physical CPU.
  * Return codes:
  * 0 - The processor is in the RTAS stopped state
diff -puN arch/ppc64/kernel/setup.c~fix_vpa arch/ppc64/kernel/setup.c
--- foobar2/arch/ppc64/kernel/setup.c~fix_vpa	2004-12-10 02:26:35.081362486 +1100
+++ foobar2-anton/arch/ppc64/kernel/setup.c	2004-12-10 02:26:35.118359662 +1100
@@ -1020,11 +1020,11 @@ void __init setup_arch(char **cmdline_p)
 	/* set up the bootmem stuff with available memory */
 	do_init_bootmem();

+	ppc_md.setup_arch();
+
 	/* Select the correct idle loop for the platform. */
 	idle_setup();

-	ppc_md.setup_arch();
-
 	paging_init();
 	ppc64_boot_msg(0x15, "Setup Done");
 }
diff -puN arch/ppc64/kernel/pSeries_setup.c~fix_vpa arch/ppc64/kernel/pSeries_setup.c
--- foobar2/arch/ppc64/kernel/pSeries_setup.c~fix_vpa	2004-12-10 02:26:35.085362180 +1100
+++ foobar2-anton/arch/ppc64/kernel/pSeries_setup.c	2004-12-10 02:26:35.120359510 +1100
@@ -234,6 +234,9 @@ static void __init pSeries_setup_arch(vo
 #endif

 	pSeries_nvram_init();
+
+	if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR)
+		vpa_init(boot_cpuid);
 }

 static int __init pSeries_init_panel(void)
diff -puN include/asm-ppc64/plpar_wrappers.h~fix_vpa include/asm-ppc64/plpar_wrappers.h
--- foobar2/include/asm-ppc64/plpar_wrappers.h~fix_vpa	2004-12-10 02:26:35.090361799 +1100
+++ foobar2-anton/include/asm-ppc64/plpar_wrappers.h	2004-12-10 02:26:35.121359433 +1100
@@ -22,12 +22,14 @@ static inline long cede_processor(void)
 	return(0);
 }

-static inline long register_vpa(unsigned long flags, unsigned long proc, unsigned long vpa)
+static inline long register_vpa(unsigned long flags, unsigned long proc,
+				unsigned long vpa)
 {
-	plpar_hcall_norets(H_REGISTER_VPA, flags, proc, vpa);
-	return(0);
+	return plpar_hcall_norets(H_REGISTER_VPA, flags, proc, vpa);
 }

+void vpa_init(int cpu);
+
 static inline long plpar_pte_remove(unsigned long flags,
 				    unsigned long ptex,
 				    unsigned long avpn,
diff -puN arch/ppc64/kernel/pSeries_lpar.c~fix_vpa arch/ppc64/kernel/pSeries_lpar.c
--- foobar2/arch/ppc64/kernel/pSeries_lpar.c~fix_vpa	2004-12-10 02:26:35.095361417 +1100
+++ foobar2-anton/arch/ppc64/kernel/pSeries_lpar.c	2004-12-10 02:55:47.622076537 +1100
@@ -259,6 +259,22 @@ out:
 	return found;
 }

+void vpa_init(int cpu)
+{
+	int hwcpu = get_hard_smp_processor_id(cpu);
+	unsigned long vpa = (unsigned long)&(paca[cpu].lppaca);
+	long ret;
+	unsigned long flags;
+
+	/* Register the Virtual Processor Area (VPA) */
+	flags = 1UL << (63 - 18);
+	ret = register_vpa(flags, hwcpu, __pa(vpa));
+
+	if (ret)
+		printk(KERN_ERR "WARNING: vpa_init: VPA registration for "
+				"cpu %d (hw %d) of area %lx returns %ld\n",
+				cpu, hwcpu, __pa(vpa), ret);
+}

 long pSeries_lpar_hpte_insert(unsigned long hpte_group,
 			      unsigned long va, unsigned long prpn,
_

From jschopp at austin.ibm.com Fri Dec 10 04:25:22 2004
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Thu, 09 Dec 2004 11:25:22 -0600
Subject: [PATCH] ppc64: pSeries shared processor fixes
In-Reply-To: <20041209160705.GC24640@krispykreme.ozlabs.ibm.com>
References: <20041209160705.GC24640@krispykreme.ozlabs.ibm.com>
Message-ID: <41B88A82.20003@austin.ibm.com>

> Hi,
>
> It turns out there are more issues with our VPA code:
>
> 1. vpa_init doesn't report errors when it fails. This was masking a bug
>    where the VPA spanned 2 pages and phyp failed to register it.
> 2. We call idle_setup before we initialise the boot cpu's vpa. This means
>    we never select the shared processor idle loop.
> 3. We don't call vpa_init on UP kernels.
>
> I think this should go in ASAP, can people give it a once over?

Wow. We really weren't ever using shared idle. Now that we actually are,
do you think some performance team somewhere would be interested to test
that it is worthwhile? Have you tested it at all from a performance
perspective?

The patch looks good to me.

On a somewhat related note, I never have liked that plpar_hcall_norets()
actually returns status. Seems poorly named to me.
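
As a small user-space sketch of why the return value matters (an illustration only, not the kernel code): fake_hcall(), FAKE_H_PARAMETER, register_old() and register_new() are all hypothetical stand-ins. The "old" shape mirrors a wrapper that calls the hypervisor and then returns 0 regardless, and the "new" shape passes the status through, as the patch does for register_vpa().

```c
#include <assert.h>

/* Hypothetical error code, chosen only for this sketch. */
#define FAKE_H_PARAMETER (-4L)

/* Hypothetical stand-in for a hypervisor call that reports a status. */
static long fake_hcall(unsigned long arg)
{
	return arg == 0 ? FAKE_H_PARAMETER : 0;	/* pretend 0 is a bad argument */
}

/* Old shape: the status is dropped, so the caller always sees success. */
static long register_old(unsigned long arg)
{
	fake_hcall(arg);
	return 0;
}

/* New shape: the status is handed straight back to the caller. */
static long register_new(unsigned long arg)
{
	return fake_hcall(arg);
}

/* Self-check: the two shapes only differ when the underlying call fails. */
static void demo(void)
{
	assert(register_old(1) == register_new(1));
	assert(register_old(0) == 0);
	assert(register_new(0) == FAKE_H_PARAMETER);
}
```

Calling demo() passes silently; the point is that only register_new() lets a caller like vpa_init print a warning when registration is refused.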

-Joel

From nathanl at austin.ibm.com Fri Dec 10 05:03:06 2004
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Thu, 09 Dec 2004 12:03:06 -0600
Subject: [PATCH] ppc64: pSeries shared processor fixes
In-Reply-To: <41B88A82.20003@austin.ibm.com>
References: <20041209160705.GC24640@krispykreme.ozlabs.ibm.com> <41B88A82.20003@austin.ibm.com>
Message-ID: <1102615386.13574.10.camel@localhost.localdomain>

On Thu, 2004-12-09 at 11:25 -0600, Joel Schopp wrote:
> > Hi,
> >
> > It turns out there are more issues with our VPA code:
> >
> > 1. vpa_init doesn't report errors when it fails. This was masking a bug
> >    where the VPA spanned 2 pages and phyp failed to register it.
> > 2. We call idle_setup before we initialise the boot cpu's vpa. This means
> >    we never select the shared processor idle loop.
> > 3. We don't call vpa_init on UP kernels.
> >
> > I think this should go in ASAP, can people give it a once over?
>
> Wow. We really weren't ever using shared idle.

IIRC it was working around the 2.6.5 or 2.6.6 timeframe. I think it's
likely that some of the code reorganizations and cleanups that have gone
in since then introduced the bug.

Nathan

From sfr at canb.auug.org.au Fri Dec 10 15:15:47 2004
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Fri, 10 Dec 2004 15:15:47 +1100
Subject: [PATCH][RFC] more naca cleanups
In-Reply-To: <20041209155037.666233c8.sfr@canb.auug.org.au>
References: <20041209003800.7e38a32c.sfr@canb.auug.org.au> <20041209155037.666233c8.sfr@canb.auug.org.au>
Message-ID: <20041210151547.6c847cb5.sfr@canb.auug.org.au>

Hi all,

This patch moves interrupt_controller out of the naca. It applies on top
of the other two. It builds on iSeries, pSeries and pmac and has been
test booted on a pSeries (44P 270).

--
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN linus-bk-naca.2a/arch/ppc64/kernel/irq.c linus-bk-naca.3a/arch/ppc64/kernel/irq.c
--- linus-bk-naca.2a/arch/ppc64/kernel/irq.c	2004-10-21 07:17:18.000000000 +1000
+++ linus-bk-naca.3a/arch/ppc64/kernel/irq.c	2004-12-09 02:50:34.000000000 +1100
@@ -65,6 +65,7 @@
 int __irq_offset_value;
 int ppc_spurious_interrupts;
 unsigned long lpevent_count;
+u64 ppc64_interrupt_controller;

 int show_interrupts(struct seq_file *p, void *v)
 {
@@ -360,7 +361,7 @@
 	unsigned int virq, first_virq;
 	static int warned;

-	if (naca->interrupt_controller == IC_OPEN_PIC)
+	if (ppc64_interrupt_controller == IC_OPEN_PIC)
 		return real_irq;	/* no mapping for openpic (for now) */

 	/* don't map interrupts < MIN_VIRT_IRQ */
diff -ruN linus-bk-naca.2a/arch/ppc64/kernel/maple_setup.c linus-bk-naca.3a/arch/ppc64/kernel/maple_setup.c
--- linus-bk-naca.2a/arch/ppc64/kernel/maple_setup.c	2004-10-30 08:33:22.000000000 +1000
+++ linus-bk-naca.3a/arch/ppc64/kernel/maple_setup.c	2004-12-10 12:02:32.000000000 +1100
@@ -155,7 +155,7 @@
 	}

 	/* Setup interrupt mapping options */
-	naca->interrupt_controller = IC_OPEN_PIC;
+	ppc64_interrupt_controller = IC_OPEN_PIC;

 	DBG(" <- maple_init_early\n");
 }
diff -ruN linus-bk-naca.2a/arch/ppc64/kernel/pSeries_pci.c linus-bk-naca.3a/arch/ppc64/kernel/pSeries_pci.c
--- linus-bk-naca.2a/arch/ppc64/kernel/pSeries_pci.c	2004-11-16 16:05:10.000000000 +1100
+++ linus-bk-naca.3a/arch/ppc64/kernel/pSeries_pci.c	2004-12-09 02:58:40.000000000 +1100
@@ -353,7 +353,7 @@
 	unsigned int *opprop = NULL;
 	struct device_node *root = of_find_node_by_path("/");

-	if (naca->interrupt_controller == IC_OPEN_PIC) {
+	if (ppc64_interrupt_controller == IC_OPEN_PIC) {
 		opprop = (unsigned int *)get_property(root,
 				"platform-open-pic", NULL);
 	}
@@ -375,7 +375,7 @@
 	pci_process_bridge_OF_ranges(phb, node);
 	pci_setup_phb_io(phb, index == 0);

-	if (naca->interrupt_controller == IC_OPEN_PIC && pSeries_mpic) {
+	if (ppc64_interrupt_controller == IC_OPEN_PIC && pSeries_mpic) {
 		int addr = root_size_cells * (index + 2) - 1;
 		mpic_assign_isu(pSeries_mpic, index, opprop[addr]);
 	}
diff -ruN linus-bk-naca.2a/arch/ppc64/kernel/pSeries_setup.c linus-bk-naca.3a/arch/ppc64/kernel/pSeries_setup.c
--- linus-bk-naca.2a/arch/ppc64/kernel/pSeries_setup.c	2004-10-26 16:06:41.000000000 +1000
+++ linus-bk-naca.3a/arch/ppc64/kernel/pSeries_setup.c	2004-12-10 12:11:14.000000000 +1100
@@ -196,7 +196,7 @@
 static void __init pSeries_setup_arch(void)
 {
 	/* Fixup ppc_md depending on the type of interrupt controller */
-	if (naca->interrupt_controller == IC_OPEN_PIC) {
+	if (ppc64_interrupt_controller == IC_OPEN_PIC) {
 		ppc_md.init_IRQ = pSeries_init_mpic;
 		ppc_md.get_irq = mpic_get_irq;
 		/* Allocate the mpic now, so that find_and_init_phbs() can
@@ -305,13 +305,13 @@
 	 * to properly parse the OF interrupt tree & do the virtual irq mapping
 	 */
 	__irq_offset_value = NUM_ISA_INTERRUPTS;
-	naca->interrupt_controller = IC_INVALID;
+	ppc64_interrupt_controller = IC_INVALID;
 	for (np = NULL; (np = of_find_node_by_name(np, "interrupt-controller"));) {
 		typep = (char *)get_property(np, "compatible", NULL);
 		if (strstr(typep, "open-pic"))
-			naca->interrupt_controller = IC_OPEN_PIC;
+			ppc64_interrupt_controller = IC_OPEN_PIC;
 		else if (strstr(typep, "ppc-xicp"))
-			naca->interrupt_controller = IC_PPC_XIC;
+			ppc64_interrupt_controller = IC_PPC_XIC;
 		else
 			printk("initialize_naca: failed to recognize"
 			       " interrupt-controller\n");
diff -ruN linus-bk-naca.2a/arch/ppc64/kernel/pSeries_smp.c linus-bk-naca.3a/arch/ppc64/kernel/pSeries_smp.c
--- linus-bk-naca.2a/arch/ppc64/kernel/pSeries_smp.c	2004-11-12 09:09:48.000000000 +1100
+++ linus-bk-naca.3a/arch/ppc64/kernel/pSeries_smp.c	2004-12-09 03:01:03.000000000 +1100
@@ -358,7 +358,7 @@

 	DBG(" -> smp_init_pSeries()\n");

-	if (naca->interrupt_controller == IC_OPEN_PIC)
+	if (ppc64_interrupt_controller == IC_OPEN_PIC)
 		smp_ops = &pSeries_mpic_smp_ops;
 	else
 		smp_ops = &pSeries_xics_smp_ops;
diff -ruN linus-bk-naca.2a/arch/ppc64/kernel/pmac_setup.c linus-bk-naca.3a/arch/ppc64/kernel/pmac_setup.c
--- linus-bk-naca.2a/arch/ppc64/kernel/pmac_setup.c	2004-12-09 15:24:10.000000000 +1100
+++ linus-bk-naca.3a/arch/ppc64/kernel/pmac_setup.c	2004-12-10 12:00:55.000000000 +1100
@@ -70,7 +70,6 @@
 #include
 #include
 #include
-#include

 #include "pmac.h"
 #include "mpic.h"
@@ -316,7 +315,7 @@
 	}

 	/* Setup interrupt mapping options */
-	naca->interrupt_controller = IC_OPEN_PIC;
+	ppc64_interrupt_controller = IC_OPEN_PIC;

 	DBG(" <- pmac_init_early\n");
 }
diff -ruN linus-bk-naca.2a/arch/ppc64/kernel/prom.c linus-bk-naca.3a/arch/ppc64/kernel/prom.c
--- linus-bk-naca.2a/arch/ppc64/kernel/prom.c	2004-12-09 17:08:06.000000000 +1100
+++ linus-bk-naca.3a/arch/ppc64/kernel/prom.c	2004-12-10 12:04:58.000000000 +1100
@@ -44,7 +44,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -557,7 +556,7 @@

 	DBG(" -> finish_device_tree\n");

-	if (naca->interrupt_controller == IC_INVALID) {
+	if (ppc64_interrupt_controller == IC_INVALID) {
 		DBG("failed to configure interrupt controller type\n");
 		panic("failed to configure interrupt controller type\n");
 	}
diff -ruN linus-bk-naca.2a/arch/ppc64/kernel/setup.c linus-bk-naca.3a/arch/ppc64/kerne