From jgarzik at pobox.com Fri Jul 1 00:41:03 2005
From: jgarzik at pobox.com (Jeff Garzik)
Date: Thu, 30 Jun 2005 10:41:03 -0400
Subject: [RFC/PATCH 0/12] Updates & bug fixes for iseries_veth network driver
In-Reply-To: <200506302016.55125.michael@ellerman.id.au>
References: <200506302016.55125.michael@ellerman.id.au>
Message-ID: <42C4047F.1000108@pobox.com>

Michael Ellerman wrote:
> Hi y'all,
>
> The following is a series of patches for the iseries_veth driver.
>
> They're not ready for merging yet, as we need to do more extensive testing.
> However any feedback you have will be greatly appreciated.

Note, make sure to CC me, and also the new netdev list
(netdev at vger.kernel.org).

	Jeff

From segher at kernel.crashing.org Fri Jul 1 02:41:03 2005
From: segher at kernel.crashing.org (Segher Boessenkool)
Date: Thu, 30 Jun 2005 18:41:03 +0200
Subject: mmio latency measurements
In-Reply-To: <1120121818.31924.52.camel@gaston>
References: <20050630080439.GD25641@sunbeam.de.gnumonks.org>
	<1120121818.31924.52.camel@gaston>
Message-ID: <144d58d527f5e870e6a096333bd38791@kernel.crashing.org>

> On ppc64, there is no cycle-counter per se, but a HW timebase that ticks
> at a fixed frequency (independently of the CPU frequency nowadays).

On a 970, and presumably on POWER4 and maybe POWER5 as well, you
*can* get cycle counts -- from the performance monitor counters.
Simply write 0xf00 to MMCR0 and then read the cycle count from PMC1.
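
As a hedged illustration of the two measurement routes mentioned above
(this is not code from the thread): the sketch below converts a timebase
delta to nanoseconds and computes a wrap-safe delta between two counter
reads. The function names, the caller-supplied timebase frequency and the
assumption that the counter is read as a 32-bit value are all illustrative.
Programming MMCR0 and reading PMC1 needs privileged PowerPC instructions
and is not shown.

```c
#include <stdint.h>

/*
 * Convert a timebase delta to nanoseconds.
 * tb_hz is the timebase frequency in Hz, supplied by the caller.
 * Assumption: tb_hz is nonzero and below 2^64 / 10^9, so the
 * remainder multiplication cannot overflow.
 */
static uint64_t tb_ticks_to_ns(uint64_t ticks, uint64_t tb_hz)
{
	uint64_t secs = ticks / tb_hz;
	uint64_t rem = ticks % tb_hz;

	return secs * 1000000000ULL + (rem * 1000000000ULL) / tb_hz;
}

/*
 * Delta between two reads of a free-running counter that is assumed
 * to be 32 bits wide. Unsigned subtraction stays correct across a
 * single wrap-around.
 */
static uint32_t counter_delta32(uint32_t start, uint32_t end)
{
	return end - start;
}
```

With a hypothetical 1 MHz timebase, 500 ticks come out as 500000 ns; a
counter that wraps between the two reads still yields the right delta.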

Segher

From linas at austin.ibm.com Fri Jul 1 06:39:31 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Thu, 30 Jun 2005 15:39:31 -0500
Subject: PCI Power management (was: Re: [PATCH 4/13]: PCI Err: e100 ethernet driver recovery
In-Reply-To: <20050629165828.GA73550@muc.de>
References: <20050628235848.GA6376@austin.ibm.com>
	<1120009619.5133.228.camel@gaston>
	<20050629155954.GH28499@austin.ibm.com>
	<20050629165828.GA73550@muc.de>
Message-ID: <20050630203931.GY28499@austin.ibm.com>

On Wed, Jun 29, 2005 at 06:58:29PM +0200, Andi Kleen was heard to remark:
> > Yep, OK. Pushing the timer would in fact break if the device was marked
> > perm disabled.
>
> I think for network drivers you should just write a generic error handler
> (perhaps in net/core/dev.c) that calls the watchdog handler.
> Then all drivers could be easily converted without much code duplication.

Well, there's no watchdog per se in "struct net_device" -- are you
suggesting I add one?

It looks like I can almost create generic handlers for net devices;
looks like calling netdev->stop() is enough to handle the error
detection.

However, a generic bringup would need to call pci_enable_device(),
and net/core/dev.c does not include pci.h so I can't really do it
there. Other than that, a generic recovery routine looks like it might
be possible; I'll have to experiment; it's hard to tell by reading code.

This might be the wrong paradigm, though. The pci error recovery
routines are *almost identical* to the power-management suspend/resume
routines. From what I can tell, the only real difference is that
I want to not actually turn off/on the power.

Thus, the right thing to do might be to split up the
struct pci_dev->suspend() and pci_dev->resume() calls into

    suspend()
    poweroff()
    poweron()
    resume()

and then have the generic pci error recovery routines call
suspend/resume only, skipping the poweroff-on calls. Does that
sound good?
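
To make the proposed split concrete, here is a minimal user-space sketch
(not kernel code, and not part of the original message). `struct
pm_stage_ops`, the `demo_*` callbacks, `pm_full_cycle()` and
`pm_error_recovery()` are hypothetical names used only to model the idea;
the kernel offers no such four-stage interface.

```c
#include <stddef.h>

/* Hypothetical four-stage callback table modelling the proposed split. */
struct pm_stage_ops {
	int (*suspend)(void *dev);
	int (*poweroff)(void *dev);
	int (*poweron)(void *dev);
	int (*resume)(void *dev);
};

/* Toy device state, used only to observe which stages ran. */
struct demo_dev {
	int quiesced;      /* 1 between suspend() and resume() */
	int power_cycles;  /* number of completed poweroff/poweron pairs */
	int powered;       /* 1 while power is applied */
};

static int demo_suspend(void *p)
{
	((struct demo_dev *)p)->quiesced = 1;
	return 0;
}

static int demo_poweroff(void *p)
{
	((struct demo_dev *)p)->powered = 0;
	return 0;
}

static int demo_poweron(void *p)
{
	struct demo_dev *d = p;

	d->powered = 1;
	d->power_cycles++;
	return 0;
}

static int demo_resume(void *p)
{
	((struct demo_dev *)p)->quiesced = 0;
	return 0;
}

static const struct pm_stage_ops demo_ops = {
	.suspend = demo_suspend,
	.poweroff = demo_poweroff,
	.poweron = demo_poweron,
	.resume = demo_resume,
};

/* Run stages in order and stop at the first failure; NULL stages are skipped. */
static int run_stages(int (*const stages[])(void *), size_t n, void *dev)
{
	size_t i;
	int rc;

	for (i = 0; i < n; i++) {
		if (!stages[i])
			continue;
		rc = stages[i](dev);
		if (rc)
			return rc;
	}
	return 0;
}

/* Full power-management cycle: all four stages. */
static int pm_full_cycle(const struct pm_stage_ops *ops, void *dev)
{
	int (*const stages[])(void *) = {
		ops->suspend, ops->poweroff, ops->poweron, ops->resume
	};

	return run_stages(stages, 4, dev);
}

/* Error recovery as proposed: suspend and resume only, power untouched. */
static int pm_error_recovery(const struct pm_stage_ops *ops, void *dev)
{
	int (*const stages[])(void *) = { ops->suspend, ops->resume };

	return run_stages(stages, 2, dev);
}
```

In this model `pm_error_recovery()` leaves `power_cycles` unchanged while
`pm_full_cycle()` increments it, which is the only difference the proposal
relies on.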

I'm not sure I can pull this off without having someone from
the power-management world throw a brick at me.

--linas

From linas at austin.ibm.com Fri Jul 1 07:07:48 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Thu, 30 Jun 2005 16:07:48 -0500
Subject: PCI Power management (was: Re: [PATCH 4/13]: PCI Err: e100 ethernet driver recovery
In-Reply-To: <20050630203931.GY28499@austin.ibm.com>
References: <20050628235848.GA6376@austin.ibm.com>
	<1120009619.5133.228.camel@gaston>
	<20050629155954.GH28499@austin.ibm.com>
	<20050629165828.GA73550@muc.de>
	<20050630203931.GY28499@austin.ibm.com>
Message-ID: <20050630210748.GZ28499@austin.ibm.com>

Hm,

Scratch the idea I outline below, seems like it's not a good idea.
I'm reading the e100, e1000 and the ixgb power management code, and
they go through all sorts of steps I don't need to do for PCI device
reset. There's no clear abstraction that would serve both needs.

On Thu, Jun 30, 2005 at 03:39:31PM -0500, Linas Vepstas was heard to remark:
> On Wed, Jun 29, 2005 at 06:58:29PM +0200, Andi Kleen was heard to remark:
> > > Yep, OK. Pushing the timer would in fact break if the device was marked
> > > perm disabled.
> >
> > I think for network drivers you should just write a generic error handler
> > (perhaps in net/core/dev.c) that calls the watchdog handler.
> > Then all drivers could be easily converted without much code duplication.
>
> Well, there's no watchdog per se in "struct net_device" -- are you
> suggesting I add one?
>
> It looks like I can almost create generic handlers for net devices;
> looks like calling netdev->stop() is enough to handle the error
> detection.
>
> However, a generic bringup would need to call pci_enable_device(),
> and net/core/dev.c does not include pci.h so I can't really do it
> there. Other than that, a generic recovery routine looks like it might
> be possible; I'll have to experiment; it's hard to tell by reading code.
>
> This might be the wrong paradigm, though. The pci error recovery
> routines are *almost identical* to the power-management suspend/resume
> routines. From what I can tell, the only real difference is that
> I want to not actually turn off/on the power.
>
> Thus, the right thing to do might be to split up the
> struct pci_dev->suspend() and pci_dev->resume() calls into
>
>     suspend()
>     poweroff()
>     poweron()
>     resume()
>
> and then have the generic pci error recovery routines call
> suspend/resume only, skipping the poweroff-on calls. Does that
> sound good?
>
> I'm not sure I can pull this off without having someone from
> the power-management world throw a brick at me.
>
> --linas
>
>

From benh at kernel.crashing.org Fri Jul 1 09:32:43 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 01 Jul 2005 09:32:43 +1000
Subject: PCI Power management (was: Re: [PATCH 4/13]: PCI Err: e100 ethernet driver recovery
In-Reply-To: <20050630203931.GY28499@austin.ibm.com>
References: <20050628235848.GA6376@austin.ibm.com>
	<1120009619.5133.228.camel@gaston>
	<20050629155954.GH28499@austin.ibm.com>
	<20050629165828.GA73550@muc.de>
	<20050630203931.GY28499@austin.ibm.com>
Message-ID: <1120174364.31924.57.camel@gaston>

On Thu, 2005-06-30 at 15:39 -0500, Linas Vepstas wrote:

> Thus, the right thing to do might be to split up the
> struct pci_dev->suspend() and pci_dev->resume() calls into
>
>     suspend()
>     poweroff()
>     poweron()
>     resume()

No. There are very good reasons not to do that split at the pci_dev
level.

> and then have the generic pci error recovery routines call
> suspend/resume only, skipping the poweroff-on calls. Does that
> sound good?
>
> I'm not sure I can pull this off without having someone from
> the power-management world throw a brick at me.

Just keep the error recovery callbacks for now, and we might be able to
provide a generic "helper" doing the watchdog thing (yes, there is a
watchdog in the net core)

Ben.

From michael at ellerman.id.au Fri Jul 1 21:46:14 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Fri, 1 Jul 2005 21:46:14 +1000
Subject: Make idle_loop a member of ppc_md
Message-ID: <200507012146.19553.michael@ellerman.id.au>

Currently the idle loop is selected in idle_setup() by consulting
systemcfg->platform and with a few ifdefs as well.

These five patches make idle_loop a member of the ppc_md structure, and
move the selection into the respective platforms' setup_arch().

I wrote this and then changed my mind, and thought we should instead try
to reduce the number of different idle loops. But that looks hard, perhaps
impossible, so this might be as good as it gets.

I've boot tested on iSeries and pSeries LPAR, and compiled defconfig for
iSeries/pSeries/maple/G5.

cheers

-- 
Michael Ellerman
IBM OzLabs

email: michael:ellerman.id.au
inmsg: mpe:jabber.org
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

From michael at ellerman.id.au Fri Jul 1 21:46:32 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Fri, 1 Jul 2005 21:46:32 +1000
Subject: [PATCH 2/5] ppc64: Move iSeries_idle() into iSeries_setup.c
In-Reply-To: <200507012146.19553.michael@ellerman.id.au>
Message-ID: <1120218392.289033.83615061705.qpatch@concordia>

Move iSeries_idle() into iSeries_setup.c, no one else needs to know about it.

Signed-off-by: Michael Ellerman
---

 arch/ppc64/kernel/iSeries_setup.c |   81 +++++++++++++++++++++++++++++++++++
 arch/ppc64/kernel/idle.c          |   86 --------------------------------------
 2 files changed, 81 insertions(+), 86 deletions(-)

Index: ppc64-2.6/arch/ppc64/kernel/iSeries_setup.c
===================================================================
--- ppc64-2.6.orig/arch/ppc64/kernel/iSeries_setup.c
+++ ppc64-2.6/arch/ppc64/kernel/iSeries_setup.c
@@ -834,6 +834,87 @@ static int __init iSeries_src_init(void)
 
 late_initcall(iSeries_src_init);
 
+static unsigned long maxYieldTime = 0;
+static unsigned long minYieldTime = 0xffffffffffffffffUL;
+
+static inline void process_iSeries_events(void)
+{
+	asm volatile ("li 0,0x5555; sc" : : : "r0", "r3");
+}
+
+static void yield_shared_processor(void)
+{
+	unsigned long tb;
+	unsigned long yieldTime;
+
+	HvCall_setEnabledInterrupts(HvCall_MaskIPI |
+				    HvCall_MaskLpEvent |
+				    HvCall_MaskLpProd |
+				    HvCall_MaskTimeout);
+
+	tb = get_tb();
+	/* Compute future tb value when yield should expire */
+	HvCall_yieldProcessor(HvCall_YieldTimed, tb+tb_ticks_per_jiffy);
+
+	yieldTime = get_tb() - tb;
+	if (yieldTime > maxYieldTime)
+		maxYieldTime = yieldTime;
+
+	if (yieldTime < minYieldTime)
+		minYieldTime = yieldTime;
+
+	/*
+	 * The decrementer stops during the yield. Force a fake decrementer
+	 * here and let the timer_interrupt code sort out the actual time.
+	 */
+	get_paca()->lppaca.int_dword.fields.decr_int = 1;
+	process_iSeries_events();
+}
+
+static int iSeries_idle(void)
+{
+	struct paca_struct *lpaca;
+	long oldval;
+
+	/* ensure iSeries run light will be out when idle */
+	ppc64_runlatch_off();
+
+	lpaca = get_paca();
+
+	while (1) {
+		if (lpaca->lppaca.shared_proc) {
+			if (hvlpevent_is_pending())
+				process_iSeries_events();
+			if (!need_resched())
+				yield_shared_processor();
+		} else {
+			oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED);
+
+			if (!oldval) {
+				set_thread_flag(TIF_POLLING_NRFLAG);
+
+				while (!need_resched()) {
+					HMT_medium();
+					if (hvlpevent_is_pending())
+						process_iSeries_events();
+					HMT_low();
+				}
+
+				HMT_medium();
+				clear_thread_flag(TIF_POLLING_NRFLAG);
+			} else {
+				set_need_resched();
+			}
+		}
+
+		ppc64_runlatch_on();
+		schedule();
+		ppc64_runlatch_off();
+	}
+
+	return 0;
+}
+
 #ifndef CONFIG_PCI
 void __init iSeries_init_IRQ(void) { }
 #endif
Index: ppc64-2.6/arch/ppc64/kernel/idle.c
===================================================================
--- ppc64-2.6.orig/arch/ppc64/kernel/idle.c
+++ ppc64-2.6/arch/ppc64/kernel/idle.c
@@ -39,90 +39,6 @@ extern void power4_idle(void);
 
 static int (*idle_loop)(void);
 
-#ifdef CONFIG_PPC_ISERIES
-static unsigned long maxYieldTime = 0;
-static unsigned long minYieldTime = 0xffffffffffffffffUL;
-
-static inline void process_iSeries_events(void)
-{
-	asm volatile ("li 0,0x5555; sc" : : : "r0", "r3");
-}
-
-static void yield_shared_processor(void)
-{
-	unsigned long tb;
-	unsigned long yieldTime;
-
-	HvCall_setEnabledInterrupts(HvCall_MaskIPI |
-				    HvCall_MaskLpEvent |
-				    HvCall_MaskLpProd |
-				    HvCall_MaskTimeout);
-
-	tb = get_tb();
-	/* Compute future tb value when yield should expire */
-	HvCall_yieldProcessor(HvCall_YieldTimed, tb+tb_ticks_per_jiffy);
-
-	yieldTime = get_tb() - tb;
-	if (yieldTime > maxYieldTime)
-		maxYieldTime = yieldTime;
-
-	if (yieldTime < minYieldTime)
-		minYieldTime = yieldTime;
-
-	/*
-	 * The decrementer stops during the yield. Force a fake decrementer
-	 * here and let the timer_interrupt code sort out the actual time.
-	 */
-	get_paca()->lppaca.int_dword.fields.decr_int = 1;
-	process_iSeries_events();
-}
-
-static int iSeries_idle(void)
-{
-	struct paca_struct *lpaca;
-	long oldval;
-
-	/* ensure iSeries run light will be out when idle */
-	ppc64_runlatch_off();
-
-	lpaca = get_paca();
-
-	while (1) {
-		if (lpaca->lppaca.shared_proc) {
-			if (hvlpevent_is_pending())
-				process_iSeries_events();
-			if (!need_resched())
-				yield_shared_processor();
-		} else {
-			oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED);
-
-			if (!oldval) {
-				set_thread_flag(TIF_POLLING_NRFLAG);
-
-				while (!need_resched()) {
-					HMT_medium();
-					if (hvlpevent_is_pending())
-						process_iSeries_events();
-					HMT_low();
-				}
-
-				HMT_medium();
-				clear_thread_flag(TIF_POLLING_NRFLAG);
-			} else {
-				set_need_resched();
-			}
-		}
-
-		ppc64_runlatch_on();
-		schedule();
-		ppc64_runlatch_off();
-	}
-
-	return 0;
-}
-
-#else
-
 int default_idle(void)
 {
 	long oldval;
@@ -305,8 +221,6 @@ int native_idle(void)
 	return 0;
 }
 
-#endif /* CONFIG_PPC_ISERIES */
-
 void cpu_idle(void)
 {
 	BUG_ON(NULL == ppc_md.idle_loop);

From michael at ellerman.id.au Fri Jul 1 21:46:32 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Fri, 1 Jul 2005 21:46:32 +1000
Subject: [PATCH 1/5] ppc64: Make idle_loop a ppc_md function
In-Reply-To: <200507012146.19553.michael@ellerman.id.au>
Message-ID: <1120218392.215165.357899678992.qpatch@concordia>

This patch adds an idle_loop member to the ppc_md structure and calls it
from cpu_idle(). If a platform leaves ppc_md.idle_loop as NULL it will get
the default idle loop default_idle().
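
A minimal user-space sketch of the fallback described above, for
illustration only. `struct machdep_calls_sketch`, `default_idle_sketch()`,
`platform_idle_sketch()` and `select_idle_loop()` are hypothetical
stand-ins; real idle loops never return, whereas the stubs here return a
tag so the selection can be observed.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct machdep_calls. */
struct machdep_calls_sketch {
	int (*idle_loop)(void);
};

/* Stub loops: each returns a tag instead of looping forever. */
static int default_idle_sketch(void)
{
	return 0;
}

static int platform_idle_sketch(void)
{
	return 1;
}

/* Install the default loop only if the platform left the hook empty. */
static void select_idle_loop(struct machdep_calls_sketch *md)
{
	if (md->idle_loop == NULL)
		md->idle_loop = default_idle_sketch;
}
```

A platform that sets `idle_loop` before `select_idle_loop()` runs keeps its
own loop; one that leaves it NULL ends up with the default.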

Signed-off-by: Michael Ellerman
---

 arch/ppc64/kernel/idle.c    |    8 +++++---
 arch/ppc64/kernel/setup.c   |    6 +++---
 include/asm-ppc64/machdep.h |    5 +++++
 3 files changed, 13 insertions(+), 6 deletions(-)

Index: ppc64-2.6/include/asm-ppc64/machdep.h
===================================================================
--- ppc64-2.6.orig/include/asm-ppc64/machdep.h
+++ ppc64-2.6/include/asm-ppc64/machdep.h
@@ -140,8 +140,13 @@ struct machdep_calls {
 						unsigned long size,
 						pgprot_t vma_prot);
 
+	/* Idle loop for this platform, leave empty for default idle loop */
+	int (*idle_loop)(void);
 };
 
+extern int default_idle(void);
+extern int native_idle(void);
+
 extern struct machdep_calls ppc_md;
 extern char cmd_line[COMMAND_LINE_SIZE];
 
Index: ppc64-2.6/arch/ppc64/kernel/setup.c
===================================================================
--- ppc64-2.6.orig/arch/ppc64/kernel/setup.c
+++ ppc64-2.6/arch/ppc64/kernel/setup.c
@@ -96,7 +96,6 @@ extern void udbg_init_maple_realmode(voi
 extern unsigned long klimit;
 
 extern void mm_init_ppc64(void);
-extern int idle_setup(void);
 extern void stab_initialize(unsigned long stab);
 extern void htab_initialize(void);
 extern void early_init_devtree(void *flat_dt);
@@ -1081,8 +1080,9 @@ void __init setup_arch(char **cmdline_p)
 
 	ppc_md.setup_arch();
 
-	/* Select the correct idle loop for the platform. */
-	idle_setup();
+	/* Use the default idle loop if the platform hasn't provided one. */
+	if (NULL == ppc_md.idle_loop)
+		ppc_md.idle_loop = default_idle;
 
 	paging_init();
 	ppc64_boot_msg(0x15, "Setup Done");
Index: ppc64-2.6/arch/ppc64/kernel/idle.c
===================================================================
--- ppc64-2.6.orig/arch/ppc64/kernel/idle.c
+++ ppc64-2.6/arch/ppc64/kernel/idle.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 
 extern void power4_idle(void);
 
@@ -122,7 +123,7 @@ static int iSeries_idle(void)
 
 #else
 
-static int default_idle(void)
+int default_idle(void)
 {
 	long oldval;
 	unsigned int cpu = smp_processor_id();
@@ -288,7 +289,7 @@ static int shared_idle(void)
 
 #endif /* CONFIG_PPC_PSERIES */
 
-static int native_idle(void)
+int native_idle(void)
 {
 	while(1) {
 		/* check CPU type here */
@@ -308,7 +309,8 @@ static int native_idle(void)
 
 void cpu_idle(void)
 {
-	idle_loop();
+	BUG_ON(NULL == ppc_md.idle_loop);
+	ppc_md.idle_loop();
 }
 
 int powersave_nap;

From michael at ellerman.id.au Fri Jul 1 21:46:32 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Fri, 1 Jul 2005 21:46:32 +1000
Subject: [PATCH 5/5] ppc64: Remove obsolete idle_setup()
In-Reply-To: <200507012146.19553.michael@ellerman.id.au>
Message-ID: <1120218392.499521.155402682754.qpatch@concordia>

Now that the idle loop is configured by each platform we don't need
idle_setup() anymore.

Signed-off-by: Michael Ellerman
---

 arch/ppc64/kernel/idle.c |   41 -----------------------------------------
 1 files changed, 41 deletions(-)

Index: ppc64-2.6/arch/ppc64/kernel/idle.c
===================================================================
--- ppc64-2.6.orig/arch/ppc64/kernel/idle.c
+++ ppc64-2.6/arch/ppc64/kernel/idle.c
@@ -37,8 +37,6 @@
 
 extern void power4_idle(void);
 
-static int (*idle_loop)(void);
-
 int default_idle(void)
 {
 	long oldval;
@@ -127,42 +125,3 @@ register_powersave_nap_sysctl(void)
 }
 __initcall(register_powersave_nap_sysctl);
 #endif
-
-int idle_setup(void)
-{
-	/*
-	 * Move that junk to each platform specific file, eventually define
-	 * a pSeries_idle for shared processor stuff
-	 */
-#ifdef CONFIG_PPC_ISERIES
-	idle_loop = iSeries_idle;
-	return 1;
-#else
-	idle_loop = default_idle;
-#endif
-#ifdef CONFIG_PPC_PSERIES
-	if (systemcfg->platform & PLATFORM_PSERIES) {
-		if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) {
-			if (get_paca()->lppaca.shared_proc) {
-				printk(KERN_INFO "Using shared processor idle loop\n");
-				idle_loop = shared_idle;
-			} else {
-				printk(KERN_INFO "Using dedicated idle loop\n");
-				idle_loop = dedicated_idle;
-			}
-		} else {
-			printk(KERN_INFO "Using default idle loop\n");
-			idle_loop = default_idle;
-		}
-	}
-#endif /* CONFIG_PPC_PSERIES */
-#ifndef CONFIG_PPC_ISERIES
-	if (systemcfg->platform == PLATFORM_POWERMAC ||
-	    systemcfg->platform == PLATFORM_MAPLE) {
-		printk(KERN_INFO "Using native/NAP idle loop\n");
-		idle_loop = native_idle;
-	}
-#endif /* CONFIG_PPC_ISERIES */
-
-	return 1;
-}

From michael at ellerman.id.au Fri Jul 1 21:46:32 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Fri, 1 Jul 2005 21:46:32 +1000
Subject: [PATCH 4/5] ppc64: Fixup platforms for new ppc_md.idle
In-Reply-To: <200507012146.19553.michael@ellerman.id.au>
Message-ID: <1120218392.425320.222568985943.qpatch@concordia>

This patch fixes up iSeries, pSeries, pmac and maple to set the correct
idle function for each platform.
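
The pSeries choice made by this patch can be summarised as a small decision
function. This is a hedged user-space sketch, not code from the patch: the
enum, `choose_pseries_idle()` and its two boolean parameters are
hypothetical stand-ins for the `FW_FEATURE_SPLPAR` test and
`lppaca.shared_proc`.

```c
#include <stdbool.h>

/* Hypothetical tags for the three idle loops pSeries can end up with. */
enum pseries_idle {
	PSERIES_IDLE_DEFAULT,
	PSERIES_IDLE_DEDICATED,
	PSERIES_IDLE_SHARED,
};

/*
 * Mirror of the selection order in the pSeries hunk: without SPLPAR
 * firmware support use the default loop; with it, pick the shared or
 * dedicated loop depending on the processor mode.
 */
static enum pseries_idle choose_pseries_idle(bool has_splpar, bool shared_proc)
{
	if (!has_splpar)
		return PSERIES_IDLE_DEFAULT;
	return shared_proc ? PSERIES_IDLE_SHARED : PSERIES_IDLE_DEDICATED;
}
```

Note that the shared-processor flag only matters when SPLPAR is present;
without it the default loop is chosen regardless.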

Signed-off-by: Michael Ellerman
---

 arch/ppc64/kernel/iSeries_setup.c |    1 +
 arch/ppc64/kernel/maple_setup.c   |    3 +++
 arch/ppc64/kernel/pSeries_setup.c |   18 ++++++++++++++++++
 arch/ppc64/kernel/pmac_setup.c    |    5 ++++-
 4 files changed, 26 insertions(+), 1 deletion(-)

Index: ppc64-2.6/arch/ppc64/kernel/iSeries_setup.c
===================================================================
--- ppc64-2.6.orig/arch/ppc64/kernel/iSeries_setup.c
+++ ppc64-2.6/arch/ppc64/kernel/iSeries_setup.c
@@ -940,5 +940,6 @@ void __init iSeries_early_setup(void)
 	ppc_md.get_rtc_time = iSeries_get_rtc_time;
 	ppc_md.calibrate_decr = iSeries_calibrate_decr;
 	ppc_md.progress = iSeries_progress;
+	ppc_md.idle_loop = iSeries_idle;
 }
 
Index: ppc64-2.6/arch/ppc64/kernel/maple_setup.c
===================================================================
--- ppc64-2.6.orig/arch/ppc64/kernel/maple_setup.c
+++ ppc64-2.6/arch/ppc64/kernel/maple_setup.c
@@ -177,6 +177,8 @@ void __init maple_setup_arch(void)
 #ifdef CONFIG_DUMMY_CONSOLE
 	conswitchp = &dummy_con;
 #endif
+
+	printk(KERN_INFO "Using native/NAP idle loop\n");
 }
 
 /*
@@ -297,4 +299,5 @@ struct machdep_calls __initdata maple_md
 	.get_rtc_time = maple_get_rtc_time,
 	.calibrate_decr = generic_calibrate_decr,
 	.progress = maple_progress,
+	.idle_loop = native_idle,
 };
Index: ppc64-2.6/arch/ppc64/kernel/pmac_setup.c
===================================================================
--- ppc64-2.6.orig/arch/ppc64/kernel/pmac_setup.c
+++ ppc64-2.6/arch/ppc64/kernel/pmac_setup.c
@@ -186,6 +186,8 @@ void __init pmac_setup_arch(void)
 #ifdef CONFIG_DUMMY_CONSOLE
 	conswitchp = &dummy_con;
 #endif
+
+	printk(KERN_INFO "Using native/NAP idle loop\n");
 }
 
 #ifdef CONFIG_SCSI
@@ -507,5 +509,6 @@ struct machdep_calls __initdata pmac_md
 	.calibrate_decr = pmac_calibrate_decr,
 	.feature_call = pmac_do_feature_call,
 	.progress = pmac_progress,
-	.check_legacy_ioport = pmac_check_legacy_ioport
+	.check_legacy_ioport = pmac_check_legacy_ioport,
+	.idle_loop = native_idle,
 };
Index: ppc64-2.6/arch/ppc64/kernel/pSeries_setup.c
===================================================================
--- ppc64-2.6.orig/arch/ppc64/kernel/pSeries_setup.c
+++ ppc64-2.6/arch/ppc64/kernel/pSeries_setup.c
@@ -19,6 +19,7 @@
 #undef DEBUG
 
 #include
+#include
 #include
 #include
 #include
@@ -82,6 +83,9 @@ int fwnmi_active; /* TRUE if an FWNMI h
 extern void pSeries_system_reset_exception(struct pt_regs *regs);
 extern int pSeries_machine_check_exception(struct pt_regs *regs);
 
+static int shared_idle(void);
+static int dedicated_idle(void);
+
 static volatile void __iomem * chrp_int_ack_special;
 struct mpic *pSeries_mpic;
 
@@ -229,6 +233,20 @@ static void __init pSeries_setup_arch(vo
 
 	if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR)
 		vpa_init(boot_cpuid);
+
+	/* Choose an idle loop */
+	if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) {
+		if (get_paca()->lppaca.shared_proc) {
+			printk(KERN_INFO "Using shared processor idle loop\n");
+			ppc_md.idle_loop = shared_idle;
+		} else {
+			printk(KERN_INFO "Using dedicated idle loop\n");
+			ppc_md.idle_loop = dedicated_idle;
+		}
+	} else {
+		printk(KERN_INFO "Using default idle loop\n");
+		ppc_md.idle_loop = default_idle;
+	}
 }
 
 static int __init pSeries_init_panel(void)

From michael at ellerman.id.au Fri Jul 1 21:46:32 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Fri, 1 Jul 2005 21:46:32 +1000
Subject: [PATCH 3/5] ppc64: Move pSeries idle functions into pSeries_setup.c
In-Reply-To: <200507012146.19553.michael@ellerman.id.au>
Message-ID: <1120218392.355354.309061134660.qpatch@concordia>

dedicated_idle() and shared_idle() are only used by pSeries, so move them
into pSeries_setup.c

Signed-off-by: Michael Ellerman
---

 arch/ppc64/kernel/idle.c          |  131 --------------------------------------
 arch/ppc64/kernel/pSeries_setup.c |  127 ++++++++++++++++++++++++++++++++++++
 2 files changed, 127 insertions(+), 131 deletions(-)

Index: ppc64-2.6/arch/ppc64/kernel/idle.c
===================================================================
--- ppc64-2.6.orig/arch/ppc64/kernel/idle.c
+++ ppc64-2.6/arch/ppc64/kernel/idle.c
@@ -74,137 +74,6 @@ int default_idle(void)
 	return 0;
 }
 
-#ifdef CONFIG_PPC_PSERIES
-
-DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
-
-int dedicated_idle(void)
-{
-	long oldval;
-	struct paca_struct *lpaca = get_paca(), *ppaca;
-	unsigned long start_snooze;
-	unsigned long *smt_snooze_delay = &__get_cpu_var(smt_snooze_delay);
-	unsigned int cpu = smp_processor_id();
-
-	ppaca = &paca[cpu ^ 1];
-
-	while (1) {
-		/*
-		 * Indicate to the HV that we are idle. Now would be
-		 * a good time to find other work to dispatch.
-		 */
-		lpaca->lppaca.idle = 1;
-
-		oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED);
-		if (!oldval) {
-			set_thread_flag(TIF_POLLING_NRFLAG);
-			start_snooze = __get_tb() +
-				*smt_snooze_delay * tb_ticks_per_usec;
-			while (!need_resched() && !cpu_is_offline(cpu)) {
-				/*
-				 * Go into low thread priority and possibly
-				 * low power mode.
-				 */
-				HMT_low();
-				HMT_very_low();
-
-				if (*smt_snooze_delay == 0 ||
-				    __get_tb() < start_snooze)
-					continue;
-
-				HMT_medium();
-
-				if (!(ppaca->lppaca.idle)) {
-					local_irq_disable();
-
-					/*
-					 * We are about to sleep the thread
-					 * and so wont be polling any
-					 * more.
-					 */
-					clear_thread_flag(TIF_POLLING_NRFLAG);
-
-					/*
-					 * SMT dynamic mode. Cede will result
-					 * in this thread going dormant, if the
-					 * partner thread is still doing work.
-					 * Thread wakes up if partner goes idle,
-					 * an interrupt is presented, or a prod
-					 * occurs. Returning from the cede
-					 * enables external interrupts.
-					 */
-					if (!need_resched())
-						cede_processor();
-					else
-						local_irq_enable();
-				} else {
-					/*
-					 * Give the HV an opportunity at the
-					 * processor, since we are not doing
-					 * any work.
-					 */
-					poll_pending();
-				}
-			}
-
-			clear_thread_flag(TIF_POLLING_NRFLAG);
-		} else {
-			set_need_resched();
-		}
-
-		HMT_medium();
-		lpaca->lppaca.idle = 0;
-		schedule();
-		if (cpu_is_offline(cpu) && system_state == SYSTEM_RUNNING)
-			cpu_die();
-	}
-	return 0;
-}
-
-static int shared_idle(void)
-{
-	struct paca_struct *lpaca = get_paca();
-	unsigned int cpu = smp_processor_id();
-
-	while (1) {
-		/*
-		 * Indicate to the HV that we are idle. Now would be
-		 * a good time to find other work to dispatch.
-		 */
-		lpaca->lppaca.idle = 1;
-
-		while (!need_resched() && !cpu_is_offline(cpu)) {
-			local_irq_disable();
-
-			/*
-			 * Yield the processor to the hypervisor. We return if
-			 * an external interrupt occurs (which are driven prior
-			 * to returning here) or if a prod occurs from another
-			 * processor. When returning here, external interrupts
-			 * are enabled.
-			 *
-			 * Check need_resched() again with interrupts disabled
-			 * to avoid a race.
-			 */
-			if (!need_resched())
-				cede_processor();
-			else
-				local_irq_enable();
-		}
-
-		HMT_medium();
-		lpaca->lppaca.idle = 0;
-		schedule();
-		if (cpu_is_offline(smp_processor_id()) &&
-		    system_state == SYSTEM_RUNNING)
-			cpu_die();
-	}
-
-	return 0;
-}
-
-#endif /* CONFIG_PPC_PSERIES */
-
 int native_idle(void)
 {
 	while(1) {
Index: ppc64-2.6/arch/ppc64/kernel/pSeries_setup.c
===================================================================
--- ppc64-2.6.orig/arch/ppc64/kernel/pSeries_setup.c
+++ ppc64-2.6/arch/ppc64/kernel/pSeries_setup.c
@@ -418,6 +418,133 @@ static int __init pSeries_probe(int plat
 	return 1;
 }
 
+DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
+
+int dedicated_idle(void)
+{
+	long oldval;
+	struct paca_struct *lpaca = get_paca(), *ppaca;
+	unsigned long start_snooze;
+	unsigned long *smt_snooze_delay = &__get_cpu_var(smt_snooze_delay);
+	unsigned int cpu = smp_processor_id();
+
+	ppaca = &paca[cpu ^ 1];
+
+	while (1) {
+		/*
+		 * Indicate to the HV that we are idle. Now would be
+		 * a good time to find other work to dispatch.
+		 */
+		lpaca->lppaca.idle = 1;
+
+		oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED);
+		if (!oldval) {
+			set_thread_flag(TIF_POLLING_NRFLAG);
+			start_snooze = __get_tb() +
+				*smt_snooze_delay * tb_ticks_per_usec;
+			while (!need_resched() && !cpu_is_offline(cpu)) {
+				/*
+				 * Go into low thread priority and possibly
+				 * low power mode.
+				 */
+				HMT_low();
+				HMT_very_low();
+
+				if (*smt_snooze_delay == 0 ||
+				    __get_tb() < start_snooze)
+					continue;
+
+				HMT_medium();
+
+				if (!(ppaca->lppaca.idle)) {
+					local_irq_disable();
+
+					/*
+					 * We are about to sleep the thread
+					 * and so wont be polling any
+					 * more.
+					 */
+					clear_thread_flag(TIF_POLLING_NRFLAG);
+
+					/*
+					 * SMT dynamic mode. Cede will result
+					 * in this thread going dormant, if the
+					 * partner thread is still doing work.
+					 * Thread wakes up if partner goes idle,
+					 * an interrupt is presented, or a prod
+					 * occurs. Returning from the cede
+					 * enables external interrupts.
+					 */
+					if (!need_resched())
+						cede_processor();
+					else
+						local_irq_enable();
+				} else {
+					/*
+					 * Give the HV an opportunity at the
+					 * processor, since we are not doing
+					 * any work.
+					 */
+					poll_pending();
+				}
+			}
+
+			clear_thread_flag(TIF_POLLING_NRFLAG);
+		} else {
+			set_need_resched();
+		}
+
+		HMT_medium();
+		lpaca->lppaca.idle = 0;
+		schedule();
+		if (cpu_is_offline(cpu) && system_state == SYSTEM_RUNNING)
+			cpu_die();
+	}
+	return 0;
+}
+
+static int shared_idle(void)
+{
+	struct paca_struct *lpaca = get_paca();
+	unsigned int cpu = smp_processor_id();
+
+	while (1) {
+		/*
+		 * Indicate to the HV that we are idle. Now would be
+		 * a good time to find other work to dispatch.
+		 */
+		lpaca->lppaca.idle = 1;
+
+		while (!need_resched() && !cpu_is_offline(cpu)) {
+			local_irq_disable();
+
+			/*
+			 * Yield the processor to the hypervisor. We return if
+			 * an external interrupt occurs (which are driven prior
+			 * to returning here) or if a prod occurs from another
+			 * processor. When returning here, external interrupts
+			 * are enabled.
+ * + * Check need_resched() again with interrupts disabled + * to avoid a race. + */ + if (!need_resched()) + cede_processor(); + else + local_irq_enable(); + } + + HMT_medium(); + lpaca->lppaca.idle = 0; + schedule(); + if (cpu_is_offline(smp_processor_id()) && + system_state == SYSTEM_RUNNING) + cpu_die(); + } + + return 0; +} + struct machdep_calls __initdata pSeries_md = { .probe = pSeries_probe, .setup_arch = pSeries_setup_arch, From kernel at 0x100.com Sat Jul 2 00:09:59 2005 From: kernel at 0x100.com (Yuta SATOH) Date: Fri, 01 Jul 2005 23:09:59 +0900 Subject: Brand new iMac G5 In-Reply-To: <1118871572.5986.231.camel@gaston> References: <1118871572.5986.231.camel@gaston> Message-ID: <20050701230521.4F4D.KERNEL@0x100.com> Hello, I received the report that the network device of a brand new iMacG5 functioned on the kernel which applied your patch. [1] If possible, please merge it into a kernel. Thank you. [1] http://bugs.gentoo.org/94263 Benjamin Herrenschmidt wrote: > Ok, the patch was missing a bit, here's a fixed version > > Index: linux-work/drivers/net/sungem.c > =================================================================== > --- linux-work.orig/drivers/net/sungem.c 2005-05-02 10:48:28.000000000 +1000 > +++ linux-work/drivers/net/sungem.c 2005-06-14 10:17:38.000000000 +1000 > @@ -3078,7 +3078,9 @@ > gp->phy_mii.dev = dev; > gp->phy_mii.mdio_read = _phy_read; > gp->phy_mii.mdio_write = _phy_write; > - > +#ifdef CONFIG_PPC_PMAC > + gp->phy_mii.platform_data = gp->of_node; > +#endif > /* By default, we start with autoneg */ > gp->want_autoneg = 1; > > Index: linux-work/drivers/net/sungem_phy.c > =================================================================== > --- linux-work.orig/drivers/net/sungem_phy.c 2005-05-02 10:48:28.000000000 +1000 > +++ linux-work/drivers/net/sungem_phy.c 2005-06-16 07:38:37.000000000 +1000 > @@ -32,6 +32,10 @@ > #include > #include > > +#ifdef CONFIG_PPC_PMAC > +#include > +#endif > + > #include "sungem_phy.h" > > /* Link 
modes of the BCM5400 PHY */ > @@ -281,10 +285,12 @@ > static int bcm5421_init(struct mii_phy* phy) > { > u16 data; > - int rev; > + unsigned int id; > > - rev = phy_read(phy, MII_PHYSID2) & 0x000f; > - if (rev == 0) { > + id = (phy_read(phy, MII_PHYSID1) << 16 | phy_read(phy, MII_PHYSID2)); > + > + /* Revision 0 of 5421 needs some fixups */ > + if (id == 0x002060e0) { > /* This is borrowed from MacOS > */ > phy_write(phy, 0x18, 0x1007); > @@ -297,21 +303,28 @@ > data = phy_read(phy, 0x15); > phy_write(phy, 0x15, data | 0x0200); > } > -#if 0 > - /* This has to be verified before I enable it */ > - /* Enable automatic low-power */ > - phy_write(phy, 0x1c, 0x9002); > - phy_write(phy, 0x1c, 0xa821); > - phy_write(phy, 0x1c, 0x941d); > -#endif > - return 0; > -} > > -static int bcm5421k2_init(struct mii_phy* phy) > -{ > - /* Init code borrowed from OF */ > - phy_write(phy, 4, 0x01e1); > - phy_write(phy, 9, 0x0300); > + /* Pick up some init code from OF for K2 version */ > + if ((id & 0xfffffff0) == 0x002062e0) { > + phy_write(phy, 4, 0x01e1); > + phy_write(phy, 9, 0x0300); > + } > + > + /* Check if we can enable automatic low power */ > +#ifdef CONFIG_PPC_PMAC > + if (phy->platform_data) { > + struct device_node *np = of_get_parent(phy->platform_data); > + int can_low_power = 1; > + if (np == NULL || get_property(np, "no-autolowpower", NULL)) > + can_low_power = 0; > + if (can_low_power) { > + /* Enable automatic low-power */ > + phy_write(phy, 0x1c, 0x9002); > + phy_write(phy, 0x1c, 0xa821); > + phy_write(phy, 0x1c, 0x941d); > + } > + } > +#endif /* CONFIG_PPC_PMAC */ > > return 0; > } > @@ -762,7 +775,7 @@ > > /* Broadcom BCM 5421 built-in K2 */ > static struct mii_phy_ops bcm5421k2_phy_ops = { > - .init = bcm5421k2_init, > + .init = bcm5421_init, > .suspend = bcm5411_suspend, > .setup_aneg = bcm54xx_setup_aneg, > .setup_forced = bcm54xx_setup_forced, > @@ -779,6 +792,25 @@ > .ops = &bcm5421k2_phy_ops > }; > > +/* Broadcom BCM 5462 built-in Vesta */ > +static struct 
mii_phy_ops bcm5462V_phy_ops = { > + .init = bcm5421_init, > + .suspend = bcm5411_suspend, > + .setup_aneg = bcm54xx_setup_aneg, > + .setup_forced = bcm54xx_setup_forced, > + .poll_link = genmii_poll_link, > + .read_link = bcm54xx_read_link, > +}; > + > +static struct mii_phy_def bcm5462V_phy_def = { > + .phy_id = 0x002060d0, > + .phy_id_mask = 0xfffffff0, > + .name = "BCM5462-Vesta", > + .features = MII_GBIT_FEATURES, > + .magic_aneg = 1, > + .ops = &bcm5462V_phy_ops > +}; > + > /* Marvell 88E1101 (Apple seem to deal with 2 different revs, > * I masked out the 8 last bits to get both, but some specs > * would be useful here) --BenH. > @@ -824,6 +856,7 @@ > &bcm5411_phy_def, > &bcm5421_phy_def, > &bcm5421k2_phy_def, > + &bcm5462V_phy_def, > &marvell_phy_def, > &genmii_phy_def, > NULL > Index: linux-work/drivers/net/sungem_phy.h > =================================================================== > --- linux-work.orig/drivers/net/sungem_phy.h 2005-05-02 10:48:28.000000000 +1000 > +++ linux-work/drivers/net/sungem_phy.h 2005-06-14 10:16:14.000000000 +1000 > @@ -43,9 +43,10 @@ > int pause; > > /* Provided by host chip */ > - struct net_device* dev; > + struct net_device *dev; > int (*mdio_read) (struct net_device *dev, int mii_id, int reg); > void (*mdio_write) (struct net_device *dev, int mii_id, int reg, int val); > + void *platform_data; > }; > > /* Pass in a struct mii_phy with dev, mdio_read and mdio_write -- Yuta SATOH
From benh at kernel.crashing.org Sat Jul 2 09:51:03 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 02 Jul 2005 09:51:03 +1000 Subject: Brand new iMac G5 In-Reply-To: <20050701230521.4F4D.KERNEL@0x100.com> References: <1118871572.5986.231.camel@gaston> <20050701230521.4F4D.KERNEL@0x100.com> Message-ID: <1120261863.31924.115.camel@gaston> On Fri, 2005-07-01 at 23:09 +0900, Yuta SATOH wrote: > Hello, > > I received the report that the network device of a brand new iMacG5 > functioned on the kernel which applied your patch. [1] > If possible, please merge it into a kernel. It has been sent upstream already. Ben. From grundler at parisc-linux.org Sat Jul 2 18:21:29 2005 From: grundler at parisc-linux.org (Grant Grundler) Date: Sat, 2 Jul 2005 02:21:29 -0600 Subject: [PATCH 7/13]: PCI Err: Symbios SCSI driver recovery In-Reply-To: <20050629163408.GI28499@austin.ibm.com> References: <20050628235919.GA6415@austin.ibm.com> <20050629030237.GB71992@muc.de> <20050629163408.GI28499@austin.ibm.com> Message-ID: <20050702082129.GD14091@colo.lackof.org> On Wed, Jun 29, 2005 at 11:34:08AM -0500, Linas Vepstas wrote: ... > requests get replayed, in a fashion similar to what would be needed > after a host reset. In particular, there shouldn't be any (permanent) > file system corruption because any inconsistent state on the disk > would get over-written when the queued requests get re-issued. FS's that require some ordering (journal) should be handling this sort of stuff already. I have the same expectations as Linas does WRT design. FS's that don't, will have the same sort of problems that they would have if the OS crashed. > FWIW, yes, I have heard of devices that "cheat", and report back that a > transaction is complete, even though it is still pending in firmware > somewhere, either on the host or the disk. Those devices get screwed.
See "Write Cache Enabled" (aka WCE or in HPUX speak "Immediate Reporting"). WCE must be disabled if data corruption cannot be tolerated. "Desktop" (i.e. Unix workstations) systems typically have WCE enabled so they look good on (stupid) performance benchmarks. The only devices that lie about WCE have battery-backed RAM buffers. (e.g. SCSI RAID *devices* - multi-LUN, dual controller beasts) > No doubt, this will happen to some giant banking customer, It won't happen because of WCE. None of the major HW vendors will sell or support HW with WCE enabled. Exactly for the reasons you point out. grant From olh at suse.de Mon Jul 4 22:02:44 2005 From: olh at suse.de (Olaf Hering) Date: Mon, 4 Jul 2005 14:02:44 +0200 Subject: [PATCH] vdso32, fix link errors after recent toolchain changes Message-ID: <20050704120244.GA10377@suse.de> Patch from amodra at bigpond.net.au, http://sources.redhat.com/bugzilla/show_bug.cgi?id=1042 /usr/bin/ld: arch/ppc64/kernel/vdso32/vdso32.so: The first section in the PT_DYNAMIC segment is not the .dynamic section Signed-off-by: Olaf Hering arch/ppc64/kernel/vdso32/vdso32.lds.S | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.12/arch/ppc64/kernel/vdso32/vdso32.lds.S =================================================================== --- linux-2.6.12.orig/arch/ppc64/kernel/vdso32/vdso32.lds.S +++ linux-2.6.12/arch/ppc64/kernel/vdso32/vdso32.lds.S @@ -40,9 +40,9 @@ SECTIONS .gcc_except_table : { *(.gcc_except_table) } .fixup : { *(.fixup) } - .got ALIGN(4) : { *(.got.plt) *(.got) } - .dynamic : { *(.dynamic) } :text :dynamic + .got : { *(.got) } + .plt : { *(.plt) } _end = .; __end = .; From mostrows at watson.ibm.com Tue Jul 5 09:36:52 2005 From: mostrows at watson.ibm.com (Michal Ostrowski) Date: Mon, 4 Jul 2005 19:36:52 -0400 Subject: [PATCH] Externally visible buffer for CONFIG_CMDLINE Message-ID: <20050704193652.23980d26@brick.watson.ibm.com> Define a fixed buffer to store the CONFIG_CMDLINE string and put the buffer in
its own section. This allows one to easily locate this buffer in the vmlinux file (using objdump) and then use dd to change the command line. (This avoids rebuilding everything to change the command line on hardware where the only command line is the built-in one.) --- Signed-off-by: Michal Ostrowski 0) strlcpy(cmd_line, p, min(l, COMMAND_LINE_SIZE)); } + #ifdef CONFIG_CMDLINE - if (l == 0 || (l == 1 && (*p) == 0)) - strlcpy(cmd_line, CONFIG_CMDLINE, COMMAND_LINE_SIZE); -#endif /* CONFIG_CMDLINE */ + if (l == 0 || (l == 1 && (*p) == 0)) { + strlcpy(cmd_line, builtin_cmdline, sizeof(builtin_cmdline)); + } +#endif DBG("Command line is: %s\n", cmd_line); From anton at samba.org Wed Jul 6 02:23:40 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 6 Jul 2005 02:23:40 +1000 Subject: [PATCH] ppc64: use c99 initialisers in cputable code Message-ID: <20050705162340.GH5384@krispykreme> Use C99 initialisers in the cputable code.
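For readers unfamiliar with the idiom, the following is a minimal sketch of what the conversion buys. It is not taken from the patch: struct cpu_spec_demo, demo_specs and demo_identify are hypothetical, cut-down stand-ins for the kernel's struct cpu_spec and its table lookup, assumed here only for illustration.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the kernel's struct cpu_spec. */
struct cpu_spec_demo {
	unsigned int pvr_mask;
	unsigned int pvr_value;
	const char *cpu_name;
	unsigned int icache_bsize;
	unsigned int dcache_bsize;
};

static const struct cpu_spec_demo demo_specs[] = {
	{	/* Positional form: values must follow the member order exactly. */
		0xffff0000, 0x00400000, "POWER3 (630)", 128, 128
	},
	{	/* C99 designated form: any order, omitted members become zero. */
		.cpu_name	= "POWER4 (compatible)",
		.dcache_bsize	= 128,
		.icache_bsize	= 128,
		/* pvr_mask and pvr_value are left out, so this entry matches any PVR. */
	},
};

/* Return the first entry whose masked PVR matches, or NULL if none does. */
static const struct cpu_spec_demo *demo_identify(unsigned int pvr)
{
	size_t i;

	for (i = 0; i < sizeof(demo_specs) / sizeof(demo_specs[0]); i++)
		if ((pvr & demo_specs[i].pvr_mask) == demo_specs[i].pvr_value)
			return &demo_specs[i];
	return NULL;
}
```

With this table, demo_identify(0x00400123) returns the POWER3 entry, and any other PVR falls through to the zero-mask default entry. The designated form also means a new member can be added to the structure without touching every table entry.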
Signed-off-by: Anton Blanchard Index: linux-2.6.git-work/arch/ppc64/kernel/cputable.c =================================================================== --- linux-2.6.git-work.orig/arch/ppc64/kernel/cputable.c 2005-07-03 10:41:00.000000000 +1000 +++ linux-2.6.git-work/arch/ppc64/kernel/cputable.c 2005-07-03 11:15:43.000000000 +1000 @@ -49,160 +49,219 @@ #endif struct cpu_spec cpu_specs[] = { - { /* Power3 */ - 0xffff0000, 0x00400000, "POWER3 (630)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_IABR | CPU_FTR_PMC8, - COMMON_USER_PPC64, - 128, 128, - __setup_cpu_power3, - COMMON_PPC64_FW - }, - { /* Power3+ */ - 0xffff0000, 0x00410000, "POWER3 (630+)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_IABR | CPU_FTR_PMC8, - COMMON_USER_PPC64, - 128, 128, - __setup_cpu_power3, - COMMON_PPC64_FW - }, - { /* Northstar */ - 0xffff0000, 0x00330000, "RS64-II (northstar)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_IABR | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64, - 128, 128, - __setup_cpu_power3, - COMMON_PPC64_FW - }, - { /* Pulsar */ - 0xffff0000, 0x00340000, "RS64-III (pulsar)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_IABR | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64, - 128, 128, - __setup_cpu_power3, - COMMON_PPC64_FW - }, - { /* I-star */ - 0xffff0000, 0x00360000, "RS64-III (icestar)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_IABR | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64, - 128, 128, - __setup_cpu_power3, - COMMON_PPC64_FW - }, - { /* S-star */ - 0xffff0000, 0x00370000, "RS64-IV (sstar)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_IABR | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64, - 128, 128, - __setup_cpu_power3, - COMMON_PPC64_FW - }, - { /* Power4 */ - 0xffff0000, 0x00350000, "POWER4 (gp)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | 
CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64, - 128, 128, - __setup_cpu_power4, - COMMON_PPC64_FW - }, - { /* Power4+ */ - 0xffff0000, 0x00380000, "POWER4+ (gq)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64, - 128, 128, - __setup_cpu_power4, - COMMON_PPC64_FW - }, - { /* PPC970 */ - 0xffff0000, 0x00390000, "PPC970", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_ALTIVEC_COMP | - CPU_FTR_CAN_NAP | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64 | PPC_FEATURE_HAS_ALTIVEC_COMP, - 128, 128, - __setup_cpu_ppc970, - COMMON_PPC64_FW - }, - { /* PPC970FX */ - 0xffff0000, 0x003c0000, "PPC970FX", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_ALTIVEC_COMP | - CPU_FTR_CAN_NAP | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64 | PPC_FEATURE_HAS_ALTIVEC_COMP, - 128, 128, - __setup_cpu_ppc970, - COMMON_PPC64_FW - }, - { /* Power5 */ - 0xffff0000, 0x003a0000, "POWER5 (gr)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_MMCRA | CPU_FTR_SMT | - CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | - CPU_FTR_MMCRA_SIHV, - COMMON_USER_PPC64, - 128, 128, - __setup_cpu_power4, - COMMON_PPC64_FW - }, - { /* Power5 */ - 0xffff0000, 0x003b0000, "POWER5 (gs)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_MMCRA | CPU_FTR_SMT | - CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | - CPU_FTR_MMCRA_SIHV, - COMMON_USER_PPC64, - 128, 128, - __setup_cpu_power4, - COMMON_PPC64_FW - }, - { /* BE DD1.x */ - 0xffff0000, 0x00700000, "Broadband Engine", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_ALTIVEC_COMP | - CPU_FTR_SMT, - COMMON_USER_PPC64 | PPC_FEATURE_HAS_ALTIVEC_COMP, - 128, 128, - 
__setup_cpu_be, - COMMON_PPC64_FW - }, - { /* default match */ - 0x00000000, 0x00000000, "POWER4 (compatible)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2, - COMMON_USER_PPC64, - 128, 128, - __setup_cpu_power4, - COMMON_PPC64_FW - } + { /* Power3 */ + .pvr_mask = 0xffff0000, + .pvr_value = 0x00400000, + .cpu_name = "POWER3 (630)", + .cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | + CPU_FTR_PMC8, + .cpu_user_features = COMMON_USER_PPC64, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_power3, + .firmware_features = COMMON_PPC64_FW, + }, + { /* Power3+ */ + .pvr_mask = 0xffff0000, + .pvr_value = 0x00410000, + .cpu_name = "POWER3 (630+)", + .cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | + CPU_FTR_PMC8, + .cpu_user_features = COMMON_USER_PPC64, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_power3, + .firmware_features = COMMON_PPC64_FW, + }, + { /* Northstar */ + .pvr_mask = 0xffff0000, + .pvr_value = 0x00330000, + .cpu_name = "RS64-II (northstar)", + .cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | + CPU_FTR_PMC8 | CPU_FTR_MMCRA, + .cpu_user_features = COMMON_USER_PPC64, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_power3, + .firmware_features = COMMON_PPC64_FW, + }, + { /* Pulsar */ + .pvr_mask = 0xffff0000, + .pvr_value = 0x00340000, + .cpu_name = "RS64-III (pulsar)", + .cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | + CPU_FTR_PMC8 | CPU_FTR_MMCRA, + .cpu_user_features = COMMON_USER_PPC64, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_power3, + .firmware_features = COMMON_PPC64_FW, + }, + { /* I-star */ + .pvr_mask = 0xffff0000, + .pvr_value = 0x00360000, + .cpu_name = "RS64-III (icestar)", + .cpu_features = 
CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | + CPU_FTR_PMC8 | CPU_FTR_MMCRA, + .cpu_user_features = COMMON_USER_PPC64, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_power3, + .firmware_features = COMMON_PPC64_FW, + }, + { /* S-star */ + .pvr_mask = 0xffff0000, + .pvr_value = 0x00370000, + .cpu_name = "RS64-IV (sstar)", + .cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | + CPU_FTR_PMC8 | CPU_FTR_MMCRA, + .cpu_user_features = COMMON_USER_PPC64, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_power3, + .firmware_features = COMMON_PPC64_FW, + }, + { /* Power4 */ + .pvr_mask = 0xffff0000, + .pvr_value = 0x00350000, + .cpu_name = "POWER4 (gp)", + .cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | + CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_PMC8 | CPU_FTR_MMCRA, + .cpu_user_features = COMMON_USER_PPC64, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_power4, + .firmware_features = COMMON_PPC64_FW, + }, + { /* Power4+ */ + .pvr_mask = 0xffff0000, + .pvr_value = 0x00380000, + .cpu_name = "POWER4+ (gq)", + .cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | + CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_PMC8 | CPU_FTR_MMCRA, + .cpu_user_features = COMMON_USER_PPC64, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_power4, + .firmware_features = COMMON_PPC64_FW, + }, + { /* PPC970 */ + .pvr_mask = 0xffff0000, + .pvr_value = 0x00390000, + .cpu_name = "PPC970", + .cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | + CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_ALTIVEC_COMP | + CPU_FTR_CAN_NAP | CPU_FTR_PMC8 | CPU_FTR_MMCRA, + .cpu_user_features = COMMON_USER_PPC64 | + PPC_FEATURE_HAS_ALTIVEC_COMP, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_ppc970, + .firmware_features = COMMON_PPC64_FW, + }, + { /* PPC970FX */ + 
.pvr_mask = 0xffff0000, + .pvr_value = 0x003c0000, + .cpu_name = "PPC970FX", + .cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | + CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_ALTIVEC_COMP | + CPU_FTR_CAN_NAP | CPU_FTR_PMC8 | CPU_FTR_MMCRA, + .cpu_user_features = COMMON_USER_PPC64 | + PPC_FEATURE_HAS_ALTIVEC_COMP, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_ppc970, + .firmware_features = COMMON_PPC64_FW, + }, + { /* Power5 */ + .pvr_mask = 0xffff0000, + .pvr_value = 0x003a0000, + .cpu_name = "POWER5 (gr)", + .cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | + CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_MMCRA | CPU_FTR_SMT | + CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | + CPU_FTR_MMCRA_SIHV, + .cpu_user_features = COMMON_USER_PPC64, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_power4, + .firmware_features = COMMON_PPC64_FW, + }, + { /* Power5 */ + .pvr_mask = 0xffff0000, + .pvr_value = 0x003b0000, + .cpu_name = "POWER5 (gs)", + .cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | + CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_MMCRA | CPU_FTR_SMT | + CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | + CPU_FTR_MMCRA_SIHV, + .cpu_user_features = COMMON_USER_PPC64, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_power4, + .firmware_features = COMMON_PPC64_FW, + }, + { /* BE DD1.x */ + .pvr_mask = 0xffff0000, + .pvr_value = 0x00700000, + .cpu_name = "Broadband Engine", + .cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | + CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_ALTIVEC_COMP | + CPU_FTR_SMT, + .cpu_user_features = COMMON_USER_PPC64 | + PPC_FEATURE_HAS_ALTIVEC_COMP, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_be, + .firmware_features = COMMON_PPC64_FW, + }, + { /* default match */ + .pvr_mask = 0x00000000, + .pvr_value = 0x00000000, + .cpu_name = "POWER4 (compatible)", + 
.cpu_features = CPU_FTR_SPLIT_ID_CACHE | + CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | + CPU_FTR_PPCAS_ARCH_V2, + .cpu_user_features = COMMON_USER_PPC64, + .icache_bsize = 128, + .dcache_bsize = 128, + .cpu_setup = __setup_cpu_power4, + .firmware_features = COMMON_PPC64_FW, + } }; firmware_feature_t firmware_features_table[FIRMWARE_MAX_FEATURES] = { - {FW_FEATURE_PFT, "hcall-pft"}, - {FW_FEATURE_TCE, "hcall-tce"}, - {FW_FEATURE_SPRG0, "hcall-sprg0"}, - {FW_FEATURE_DABR, "hcall-dabr"}, - {FW_FEATURE_COPY, "hcall-copy"}, - {FW_FEATURE_ASR, "hcall-asr"}, - {FW_FEATURE_DEBUG, "hcall-debug"}, - {FW_FEATURE_PERF, "hcall-perf"}, - {FW_FEATURE_DUMP, "hcall-dump"}, - {FW_FEATURE_INTERRUPT, "hcall-interrupt"}, - {FW_FEATURE_MIGRATE, "hcall-migrate"}, - {FW_FEATURE_PERFMON, "hcall-perfmon"}, - {FW_FEATURE_CRQ, "hcall-crq"}, - {FW_FEATURE_VIO, "hcall-vio"}, - {FW_FEATURE_RDMA, "hcall-rdma"}, - {FW_FEATURE_LLAN, "hcall-lLAN"}, - {FW_FEATURE_BULK, "hcall-bulk"}, - {FW_FEATURE_XDABR, "hcall-xdabr"}, - {FW_FEATURE_MULTITCE, "hcall-multi-tce"}, - {FW_FEATURE_SPLPAR, "hcall-splpar"}, + {FW_FEATURE_PFT, "hcall-pft"}, + {FW_FEATURE_TCE, "hcall-tce"}, + {FW_FEATURE_SPRG0, "hcall-sprg0"}, + {FW_FEATURE_DABR, "hcall-dabr"}, + {FW_FEATURE_COPY, "hcall-copy"}, + {FW_FEATURE_ASR, "hcall-asr"}, + {FW_FEATURE_DEBUG, "hcall-debug"}, + {FW_FEATURE_PERF, "hcall-perf"}, + {FW_FEATURE_DUMP, "hcall-dump"}, + {FW_FEATURE_INTERRUPT, "hcall-interrupt"}, + {FW_FEATURE_MIGRATE, "hcall-migrate"}, + {FW_FEATURE_PERFMON, "hcall-perfmon"}, + {FW_FEATURE_CRQ, "hcall-crq"}, + {FW_FEATURE_VIO, "hcall-vio"}, + {FW_FEATURE_RDMA, "hcall-rdma"}, + {FW_FEATURE_LLAN, "hcall-lLAN"}, + {FW_FEATURE_BULK, "hcall-bulk"}, + {FW_FEATURE_XDABR, "hcall-xdabr"}, + {FW_FEATURE_MULTITCE, "hcall-multi-tce"}, + {FW_FEATURE_SPLPAR, "hcall-splpar"}, }; From anton at samba.org Wed Jul 6 04:36:53 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 6 Jul 2005 04:36:53 +1000 Subject: [PATCH] ppc64: Fix runlatch code to work on pseries 
machines In-Reply-To: <20050705162340.GH5384@krispykreme> References: <20050705162340.GH5384@krispykreme> Message-ID: <20050705183653.GI5384@krispykreme> Not all ppc64 CPUs have the CTRL SPR, so we need a cputable feature for it. Signed-off-by: Anton Blanchard Index: linux-2.6.git-work/include/asm-ppc64/processor.h =================================================================== --- linux-2.6.git-work.orig/include/asm-ppc64/processor.h 2005-07-02 08:20:46.000000000 +1000 +++ linux-2.6.git-work/include/asm-ppc64/processor.h 2005-07-06 01:20:04.000000000 +1000 @@ -20,6 +20,7 @@ #include #include #include +#include /* Machine State Register (MSR) Fields */ #define MSR_SF_LG 63 /* Enable 64 bit mode */ @@ -501,18 +502,22 @@ { unsigned long ctrl; - ctrl = mfspr(SPRN_CTRLF); - ctrl |= CTRL_RUNLATCH; - mtspr(SPRN_CTRLT, ctrl); + if (cpu_has_feature(CPU_FTR_CTRL)) { + ctrl = mfspr(SPRN_CTRLF); + ctrl |= CTRL_RUNLATCH; + mtspr(SPRN_CTRLT, ctrl); + } } static inline void ppc64_runlatch_off(void) { unsigned long ctrl; - ctrl = mfspr(SPRN_CTRLF); - ctrl &= ~CTRL_RUNLATCH; - mtspr(SPRN_CTRLT, ctrl); + if (cpu_has_feature(CPU_FTR_CTRL)) { + ctrl = mfspr(SPRN_CTRLF); + ctrl &= ~CTRL_RUNLATCH; + mtspr(SPRN_CTRLT, ctrl); + } } #endif /* __KERNEL__ */ Index: linux-2.6.git-work/include/asm-ppc64/cputable.h =================================================================== --- linux-2.6.git-work.orig/include/asm-ppc64/cputable.h 2005-07-02 08:20:45.000000000 +1000 +++ linux-2.6.git-work/include/asm-ppc64/cputable.h 2005-07-06 01:20:04.000000000 +1000 @@ -138,6 +138,7 @@ #define CPU_FTR_COHERENT_ICACHE ASM_CONST(0x0000020000000000) #define CPU_FTR_LOCKLESS_TLBIE ASM_CONST(0x0000040000000000) #define CPU_FTR_MMCRA_SIHV ASM_CONST(0x0000080000000000) +#define CPU_FTR_CTRL ASM_CONST(0x0000100000000000) /* Platform firmware features */ #define FW_FTR_ ASM_CONST(0x0000000000000001) @@ -148,7 +149,7 @@ #define CPU_FTR_PPCAS_ARCH_V2_BASE (CPU_FTR_SLB | \ CPU_FTR_TLBIEL | CPU_FTR_NOEXECUTE 
| \ - CPU_FTR_NODSISRALIGN) + CPU_FTR_NODSISRALIGN | CPU_FTR_CTRL) /* iSeries doesn't support large pages */ #ifdef CONFIG_PPC_ISERIES Index: linux-2.6.git-work/arch/ppc64/kernel/cputable.c =================================================================== --- linux-2.6.git-work.orig/arch/ppc64/kernel/cputable.c 2005-07-03 11:15:43.000000000 +1000 +++ linux-2.6.git-work/arch/ppc64/kernel/cputable.c 2005-07-06 01:21:21.000000000 +1000 @@ -81,7 +81,7 @@ .cpu_name = "RS64-II (northstar)", .cpu_features = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | - CPU_FTR_PMC8 | CPU_FTR_MMCRA, + CPU_FTR_PMC8 | CPU_FTR_MMCRA | CPU_FTR_CTRL, .cpu_user_features = COMMON_USER_PPC64, .icache_bsize = 128, .dcache_bsize = 128, @@ -94,7 +94,7 @@ .cpu_name = "RS64-III (pulsar)", .cpu_features = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | - CPU_FTR_PMC8 | CPU_FTR_MMCRA, + CPU_FTR_PMC8 | CPU_FTR_MMCRA | CPU_FTR_CTRL, .cpu_user_features = COMMON_USER_PPC64, .icache_bsize = 128, .dcache_bsize = 128, @@ -107,7 +107,7 @@ .cpu_name = "RS64-III (icestar)", .cpu_features = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | - CPU_FTR_PMC8 | CPU_FTR_MMCRA, + CPU_FTR_PMC8 | CPU_FTR_MMCRA | CPU_FTR_CTRL, .cpu_user_features = COMMON_USER_PPC64, .icache_bsize = 128, .dcache_bsize = 128, @@ -120,7 +120,7 @@ .cpu_name = "RS64-IV (sstar)", .cpu_features = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | - CPU_FTR_PMC8 | CPU_FTR_MMCRA, + CPU_FTR_PMC8 | CPU_FTR_MMCRA | CPU_FTR_CTRL, .cpu_user_features = COMMON_USER_PPC64, .icache_bsize = 128, .dcache_bsize = 128, From sonny at burdell.org Wed Jul 6 05:48:04 2005 From: sonny at burdell.org (Sonny Rao) Date: Tue, 5 Jul 2005 15:48:04 -0400 Subject: [PATCH] ppc64: Fix runlatch code to work on pseries machines In-Reply-To: <20050705183653.GI5384@krispykreme> References: <20050705162340.GH5384@krispykreme> <20050705183653.GI5384@krispykreme> 
Message-ID: <20050705194804.GA17587@kevlar.burdell.org> On Wed, Jul 06, 2005 at 04:36:53AM +1000, Anton Blanchard wrote: > > Not all ppc64 CPUs have the CTRL SPR, so we need a cputable feature for it. > > Signed-off-by: Anton Blanchard Forgive my ignorance, but why don't POWER4 and above have this feature, is it related to runlatch? Thanks Sonny From anton at samba.org Wed Jul 6 06:22:43 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 6 Jul 2005 06:22:43 +1000 Subject: [PATCH] ppc64: Fix runlatch code to work on pseries machines In-Reply-To: <20050705194804.GA17587@kevlar.burdell.org> References: <20050705162340.GH5384@krispykreme> <20050705183653.GI5384@krispykreme> <20050705194804.GA17587@kevlar.burdell.org> Message-ID: <20050705202243.GB12786@krispykreme> Hi, > Forgive my ignorance, but why don't POWER4 and above have this > feature, is it related to runlatch? It's a feature of PPC AS v2, so I added the define there: @@ -148,7 +149,7 @@ #define CPU_FTR_PPCAS_ARCH_V2_BASE (CPU_FTR_SLB | \ CPU_FTR_TLBIEL | CPU_FTR_NOEXECUTE | \ - CPU_FTR_NODSISRALIGN) + CPU_FTR_NODSISRALIGN | CPU_FTR_CTRL) Anton From anton at samba.org Wed Jul 6 06:21:46 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 6 Jul 2005 06:21:46 +1000 Subject: Make idle_loop a member of ppc_md In-Reply-To: <200507012146.19553.michael@ellerman.id.au> References: <200507012146.19553.michael@ellerman.id.au> Message-ID: <20050705202146.GA12786@krispykreme> Hi Michael, > Currently the idle loop is selected in idle_setup() by consulting > systemcfg->platform and with a few ifdefs as well. > > These five patches make idle_loop a member of the ppc_md structure, and move > the selection into the respective platforms' setup_arch(). > > I wrote this and then changed my mind, and thought we should instead try and > reduce the number of different idle loops. But that looks hard, perhaps > impossible, so this might be as good as it gets. Looks good to me.
I've been meaning to fix up our runlatch handling in the idle loops, so here are a few more patches on top of your series. The previous two patches I sent out need to be applied also: [PATCH] ppc64: use c99 initialisers in cputable code [PATCH] ppc64: Fix runlatch code to work on pseries machines Anton From anton at samba.org Wed Jul 6 06:37:21 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 6 Jul 2005 06:37:21 +1000 Subject: Make idle_loop a member of ppc_md In-Reply-To: <20050705202146.GA12786@krispykreme> References: <200507012146.19553.michael@ellerman.id.au> <20050705202146.GA12786@krispykreme> Message-ID: <20050705203721.GC12786@krispykreme> iSeries idle fixups: - remove min/max yield time, we don't use the values anywhere - separate shared and dedicated idle loops - check need_resched again with irqs off to avoid sleeping with pending work - continually set runlatch off in idle loop, this means we don't need to turn the runlatch off on exception exit and suffer that associated cost for all exceptions.
(A future patch will turn the runlatch on at exception entry) Signed-off-by: Anton Blanchard Index: foobar2/arch/ppc64/kernel/iSeries_setup.c =================================================================== --- foobar2.orig/arch/ppc64/kernel/iSeries_setup.c 2005-07-06 02:26:24.061621784 +1000 +++ foobar2/arch/ppc64/kernel/iSeries_setup.c 2005-07-06 05:49:46.629711734 +1000 @@ -834,9 +834,6 @@ late_initcall(iSeries_src_init); -static unsigned long maxYieldTime = 0; -static unsigned long minYieldTime = 0xffffffffffffffffUL; - static inline void process_iSeries_events(void) { asm volatile ("li 0,0x5555; sc" : : : "r0", "r3"); @@ -845,7 +842,6 @@ static void yield_shared_processor(void) { unsigned long tb; - unsigned long yieldTime; HvCall_setEnabledInterrupts(HvCall_MaskIPI | HvCall_MaskLpEvent | @@ -856,13 +852,6 @@ /* Compute future tb value when yield should expire */ HvCall_yieldProcessor(HvCall_YieldTimed, tb+tb_ticks_per_jiffy); - yieldTime = get_tb() - tb; - if (yieldTime > maxYieldTime) - maxYieldTime = yieldTime; - - if (yieldTime < minYieldTime) - minYieldTime = yieldTime; - /* * The decrementer stops during the yield. Force a fake decrementer * here and let the timer_interrupt code sort out the actual time. 
@@ -871,45 +860,62 @@ process_iSeries_events(); } -static int iSeries_idle(void) +static int iseries_shared_idle(void) { - struct paca_struct *lpaca; - long oldval; + while (1) { + while (!need_resched() && !hvlpevent_is_pending()) { + local_irq_disable(); + ppc64_runlatch_off(); + + /* Recheck with irqs off */ + if (!need_resched() && !hvlpevent_is_pending()) + yield_shared_processor(); + + HMT_medium(); + local_irq_enable(); + } + + ppc64_runlatch_on(); + + if (hvlpevent_is_pending()) + process_iSeries_events(); + + schedule(); + } - /* ensure iSeries run light will be out when idle */ - ppc64_runlatch_off(); + return 0; +} - lpaca = get_paca(); +static int iseries_dedicated_idle(void) +{ + struct paca_struct *lpaca = get_paca(); + long oldval; while (1) { - if (lpaca->lppaca.shared_proc) { - if (hvlpevent_is_pending()) - process_iSeries_events(); - if (!need_resched()) - yield_shared_processor(); - } else { - oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED); + oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED); + + if (!oldval) { + set_thread_flag(TIF_POLLING_NRFLAG); - if (!oldval) { - set_thread_flag(TIF_POLLING_NRFLAG); + while (!need_resched()) { + ppc64_runlatch_off(); + HMT_low(); - while (!need_resched()) { + if (hvlpevent_is_pending()) { HMT_medium(); - if (hvlpevent_is_pending()) - process_iSeries_events(); - HMT_low(); + ppc64_runlatch_on(); + process_iSeries_events(); } - - HMT_medium(); - clear_thread_flag(TIF_POLLING_NRFLAG); - } else { - set_need_resched(); } + + HMT_medium(); + clear_thread_flag(TIF_POLLING_NRFLAG); + } else { + set_need_resched(); } ppc64_runlatch_on(); schedule(); - ppc64_runlatch_off(); } return 0; @@ -940,6 +946,10 @@ ppc_md.get_rtc_time = iSeries_get_rtc_time; ppc_md.calibrate_decr = iSeries_calibrate_decr; ppc_md.progress = iSeries_progress; - ppc_md.idle_loop = iSeries_idle; + + if (get_paca()->lppaca.shared_proc) + ppc_md.idle_loop = iseries_shared_idle; + else + ppc_md.idle_loop = iseries_dedicated_idle; } From 
anton at samba.org Wed Jul 6 06:43:03 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 6 Jul 2005 06:43:03 +1000 Subject: [PATCH] ppc64: pSeries idle fixups In-Reply-To: <20050705203721.GC12786@krispykreme> References: <200507012146.19553.michael@ellerman.id.au> <20050705202146.GA12786@krispykreme> <20050705203721.GC12786@krispykreme> Message-ID: <20050705204303.GD12786@krispykreme> pSeries idle fixups: - separate out sleep logic in dedicated_idle, it was so far indented that it got squashed against the right side of the screen. - add runlatch support, looping on runlatch disable. Signed-off-by: Anton Blanchard Index: foobar2/arch/ppc64/kernel/pSeries_setup.c =================================================================== --- foobar2.orig/arch/ppc64/kernel/pSeries_setup.c 2005-07-06 05:49:51.479133649 +1000 +++ foobar2/arch/ppc64/kernel/pSeries_setup.c 2005-07-06 06:14:22.752007077 +1000 @@ -83,8 +83,8 @@ extern void pSeries_system_reset_exception(struct pt_regs *regs); extern int pSeries_machine_check_exception(struct pt_regs *regs); -static int shared_idle(void); -static int dedicated_idle(void); +static int pseries_shared_idle(void); +static int pseries_dedicated_idle(void); static volatile void __iomem * chrp_int_ack_special; struct mpic *pSeries_mpic; @@ -238,10 +238,10 @@ if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) { if (get_paca()->lppaca.shared_proc) { printk(KERN_INFO "Using shared processor idle loop\n"); - ppc_md.idle_loop = shared_idle; + ppc_md.idle_loop = pseries_shared_idle; } else { printk(KERN_INFO "Using dedicated idle loop\n"); - ppc_md.idle_loop = dedicated_idle; + ppc_md.idle_loop = pseries_dedicated_idle; } } else { printk(KERN_INFO "Using default idle loop\n"); @@ -438,15 +438,47 @@ DECLARE_PER_CPU(unsigned long, smt_snooze_delay); -int dedicated_idle(void) +static inline void dedicated_idle_sleep(unsigned int cpu) +{ + struct paca_struct *ppaca = &paca[cpu ^ 1]; + + /* Only sleep if the other thread is not idle */ 
+ if (!(ppaca->lppaca.idle)) { + local_irq_disable(); + + /* + * We are about to sleep the thread and so wont be polling any + * more. + */ + clear_thread_flag(TIF_POLLING_NRFLAG); + + /* + * SMT dynamic mode. Cede will result in this thread going + * dormant, if the partner thread is still doing work. Thread + * wakes up if partner goes idle, an interrupt is presented, or + * a prod occurs. Returning from the cede enables external + * interrupts. + */ + if (!need_resched()) + cede_processor(); + else + local_irq_enable(); + } else { + /* + * Give the HV an opportunity at the processor, since we are + * not doing any work. + */ + poll_pending(); + } +} + +static int pseries_dedicated_idle(void) { long oldval; - struct paca_struct *lpaca = get_paca(), *ppaca; + struct paca_struct *lpaca = get_paca(); + unsigned int cpu = smp_processor_id(); unsigned long start_snooze; unsigned long *smt_snooze_delay = &__get_cpu_var(smt_snooze_delay); - unsigned int cpu = smp_processor_id(); - - ppaca = &paca[cpu ^ 1]; while (1) { /* @@ -458,9 +490,13 @@ oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED); if (!oldval) { set_thread_flag(TIF_POLLING_NRFLAG); + start_snooze = __get_tb() + *smt_snooze_delay * tb_ticks_per_usec; + while (!need_resched() && !cpu_is_offline(cpu)) { + ppc64_runlatch_off(); + /* * Go into low thread priority and possibly * low power mode. @@ -468,60 +504,31 @@ HMT_low(); HMT_very_low(); - if (*smt_snooze_delay == 0 || - __get_tb() < start_snooze) - continue; - - HMT_medium(); - - if (!(ppaca->lppaca.idle)) { - local_irq_disable(); - - /* - * We are about to sleep the thread - * and so wont be polling any - * more. - */ - clear_thread_flag(TIF_POLLING_NRFLAG); - - /* - * SMT dynamic mode. Cede will result - * in this thread going dormant, if the - * partner thread is still doing work. - * Thread wakes up if partner goes idle, - * an interrupt is presented, or a prod - * occurs. Returning from the cede - * enables external interrupts. 
- */ - if (!need_resched()) - cede_processor(); - else - local_irq_enable(); - } else { - /* - * Give the HV an opportunity at the - * processor, since we are not doing - * any work. - */ - poll_pending(); + if (*smt_snooze_delay != 0 && + __get_tb() > start_snooze) { + HMT_medium(); + dedicated_idle_sleep(cpu); } + } + HMT_medium(); clear_thread_flag(TIF_POLLING_NRFLAG); } else { set_need_resched(); } - HMT_medium(); lpaca->lppaca.idle = 0; + ppc64_runlatch_on(); + schedule(); + if (cpu_is_offline(cpu) && system_state == SYSTEM_RUNNING) cpu_die(); } - return 0; } -static int shared_idle(void) +static int pseries_shared_idle(void) { struct paca_struct *lpaca = get_paca(); unsigned int cpu = smp_processor_id(); @@ -535,6 +542,7 @@ while (!need_resched() && !cpu_is_offline(cpu)) { local_irq_disable(); + ppc64_runlatch_off(); /* * Yield the processor to the hypervisor. We return if @@ -550,13 +558,16 @@ cede_processor(); else local_irq_enable(); + + HMT_medium(); } - HMT_medium(); lpaca->lppaca.idle = 0; + ppc64_runlatch_on(); + schedule(); - if (cpu_is_offline(smp_processor_id()) && - system_state == SYSTEM_RUNNING) + + if (cpu_is_offline(cpu) && system_state == SYSTEM_RUNNING) cpu_die(); } From olh at suse.de Wed Jul 6 06:47:36 2005 From: olh at suse.de (Olaf Hering) Date: Tue, 5 Jul 2005 22:47:36 +0200 Subject: [PATCH] allow xmon=nobt to not print a backtrace by default In-Reply-To: <20050531202931.GA14769@suse.de> References: <20050531202931.GA14769@suse.de> Message-ID: <20050705204736.GA31800@suse.de> (untested) xmon does not print a backtrace per default. This is bad on systems with USB keyboard, the most needed info about the crash is lost. print a backtrace during the very first xmon entry. Booting with xmon=nobt disables the autobacktrace functionality. 
Signed-off-by: Olaf Hering arch/ppc64/kernel/setup.c | 4 ++++ arch/ppc64/xmon/xmon.c | 5 +++++ 2 files changed, 9 insertions(+) Index: linux-2.6.12-olh/arch/ppc64/kernel/setup.c =================================================================== --- linux-2.6.12-olh.orig/arch/ppc64/kernel/setup.c +++ linux-2.6.12-olh/arch/ppc64/kernel/setup.c @@ -91,6 +91,8 @@ extern void udbg_init_maple_realmode(voi do { ppc_md.udbg_putc = call_rtas_display_status_delay; } while(0) #endif +extern int xmon_no_auto_backtrace; + /* extern void *stab; */ extern unsigned long klimit; @@ -1318,6 +1320,8 @@ static int __init early_xmon(char *p) { /* ensure xmon is enabled */ if (p) { + if (strncmp(p, "nobt", 4) == 0) + xmon_no_auto_backtrace++; if (strncmp(p, "on", 2) == 0) xmon_init(); if (strncmp(p, "early", 5) != 0) Index: linux-2.6.12-olh/arch/ppc64/xmon/xmon.c =================================================================== --- linux-2.6.12-olh.orig/arch/ppc64/xmon/xmon.c +++ linux-2.6.12-olh/arch/ppc64/xmon/xmon.c @@ -132,11 +132,13 @@ static void csum(void); static void bootcmds(void); void dump_segments(void); static void symbol_lookup(void); +static void xmon_show_stack(unsigned long sp, unsigned long lr, unsigned long pc); static void xmon_print_symbol(unsigned long address, const char *mid, const char *after); static const char *getvecname(unsigned long vec); static void debug_trace(void); +int xmon_no_auto_backtrace; extern int print_insn_powerpc(unsigned long, unsigned long, int); extern void printf(const char *fmt, ...); @@ -768,6 +770,9 @@ cmds(struct pt_regs *excp) last_cmd = NULL; xmon_regs = excp; + if (!xmon_no_auto_backtrace++) + xmon_show_stack(excp->gpr[1], excp->link, excp->nip); + for(;;) { #ifdef CONFIG_SMP printf("%x:", smp_processor_id()); From anton at samba.org Wed Jul 6 06:46:15 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 6 Jul 2005 06:46:15 +1000 Subject: [PATCH] ppc64: idle fixups In-Reply-To: <20050705204303.GD12786@krispykreme> 
References: <200507012146.19553.michael@ellerman.id.au> <20050705202146.GA12786@krispykreme> <20050705203721.GC12786@krispykreme> <20050705204303.GD12786@krispykreme> Message-ID: <20050705204615.GE12786@krispykreme> - remove some unnecessary includes - add runlatch support - no need to use raw_smp_processor_id any more, current preempt debug logic checks for processes that are bound to one cpu. Signed-off-by: Anton Blanchard Index: linux-2.6.git-work/arch/ppc64/kernel/idle.c =================================================================== --- linux-2.6.git-work.orig/arch/ppc64/kernel/idle.c 2005-07-02 08:24:55.000000000 +1000 +++ linux-2.6.git-work/arch/ppc64/kernel/idle.c 2005-07-06 01:50:08.000000000 +1000 @@ -20,18 +20,12 @@ #include #include #include -#include #include -#include #include #include -#include #include #include -#include -#include -#include #include #include @@ -49,7 +43,8 @@ set_thread_flag(TIF_POLLING_NRFLAG); while (!need_resched() && !cpu_is_offline(cpu)) { - barrier(); + ppc64_runlatch_off(); + /* * Go into low thread priority and possibly * low power mode. 
@@ -64,6 +59,7 @@ set_need_resched(); } + ppc64_runlatch_on(); schedule(); if (cpu_is_offline(cpu) && system_state == SYSTEM_RUNNING) cpu_die(); @@ -74,17 +70,22 @@ int native_idle(void) { - while(1) { - /* check CPU type here */ + while (1) { + ppc64_runlatch_off(); + if (!need_resched()) power4_idle(); - if (need_resched()) + + if (need_resched()) { + ppc64_runlatch_on(); schedule(); + } - if (cpu_is_offline(raw_smp_processor_id()) && + if (cpu_is_offline(smp_processor_id()) && system_state == SYSTEM_RUNNING) cpu_die(); } + return 0; } From anton at samba.org Wed Jul 6 08:49:51 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 6 Jul 2005 08:49:51 +1000 Subject: [PATCH] ppc64: fix compile warning In-Reply-To: <20050705204615.GE12786@krispykreme> References: <200507012146.19553.michael@ellerman.id.au> <20050705202146.GA12786@krispykreme> <20050705203721.GC12786@krispykreme> <20050705204303.GD12786@krispykreme> <20050705204615.GE12786@krispykreme> Message-ID: <20050705224951.GK12786@krispykreme> Fix a compile warning introduced by the previous patches. Signed-off-by: Anton Blanchard Index: foobar2/arch/ppc64/kernel/iSeries_setup.c =================================================================== --- foobar2.orig/arch/ppc64/kernel/iSeries_setup.c 2005-07-06 07:09:32.039334942 +1000 +++ foobar2/arch/ppc64/kernel/iSeries_setup.c 2005-07-06 07:14:29.159334906 +1000 @@ -888,7 +888,6 @@ static int iseries_dedicated_idle(void) { - struct paca_struct *lpaca = get_paca(); long oldval; while (1) { From anton at samba.org Wed Jul 6 09:12:43 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 6 Jul 2005 09:12:43 +1000 Subject: [PATCH] ppc64: Turn runlatch on in exception entry In-Reply-To: <20050705183653.GI5384@krispykreme> References: <20050705162340.GH5384@krispykreme> <20050705183653.GI5384@krispykreme> Message-ID: <20050705231243.GM12786@krispykreme> Enable the runlatch at the start of each exception. 
Unfortunately we are out of space in the 0x300 handler, so I added it a bit later. The SPR write is fairly expensive, perhaps we should cache the runlatch state in the paca and avoid the write when possible. We dont need to turn the runlatch off, we do that in the idle loop. Better to take the hit in the idle loop than for each exception exit. Signed-off-by: Anton Blanchard Index: foobar2/arch/ppc64/kernel/head.S =================================================================== --- foobar2.orig/arch/ppc64/kernel/head.S 2005-07-06 07:28:11.576663962 +1000 +++ foobar2/arch/ppc64/kernel/head.S 2005-07-06 07:35:41.567944291 +1000 @@ -308,6 +308,7 @@ label##_pSeries: \ HMT_MEDIUM; \ mtspr SPRG1,r13; /* save r13 */ \ + RUNLATCH_ON(r13); \ EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common) #define STD_EXCEPTION_ISERIES(n, label, area) \ @@ -315,6 +316,7 @@ label##_iSeries: \ HMT_MEDIUM; \ mtspr SPRG1,r13; /* save r13 */ \ + RUNLATCH_ON(r13); \ EXCEPTION_PROLOG_ISERIES_1(area); \ EXCEPTION_PROLOG_ISERIES_2; \ b label##_common @@ -324,6 +326,7 @@ label##_iSeries: \ HMT_MEDIUM; \ mtspr SPRG1,r13; /* save r13 */ \ + RUNLATCH_ON(r13); \ EXCEPTION_PROLOG_ISERIES_1(PACA_EXGEN); \ lbz r10,PACAPROCENABLED(r13); \ cmpwi 0,r10,0; \ @@ -393,6 +396,7 @@ _machine_check_pSeries: HMT_MEDIUM mtspr SPRG1,r13 /* save r13 */ + RUNLATCH_ON(r13) EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common) . 
= 0x300 @@ -419,6 +423,7 @@ data_access_slb_pSeries: HMT_MEDIUM mtspr SPRG1,r13 + RUNLATCH_ON(r13) mfspr r13,SPRG3 /* get paca address into r13 */ std r9,PACA_EXSLB+EX_R9(r13) /* save r9 - r12 */ std r10,PACA_EXSLB+EX_R10(r13) @@ -439,6 +444,7 @@ instruction_access_slb_pSeries: HMT_MEDIUM mtspr SPRG1,r13 + RUNLATCH_ON(r13) mfspr r13,SPRG3 /* get paca address into r13 */ std r9,PACA_EXSLB+EX_R9(r13) /* save r9 - r12 */ std r10,PACA_EXSLB+EX_R10(r13) @@ -464,6 +470,7 @@ .globl system_call_pSeries system_call_pSeries: HMT_MEDIUM + RUNLATCH_ON(r9) mr r9,r13 mfmsr r10 mfspr r13,SPRG3 @@ -707,11 +714,13 @@ system_reset_fwnmi: HMT_MEDIUM mtspr SPRG1,r13 /* save r13 */ + RUNLATCH_ON(r13) EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, system_reset_common) .globl machine_check_fwnmi machine_check_fwnmi: HMT_MEDIUM mtspr SPRG1,r13 /* save r13 */ + RUNLATCH_ON(r13) EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common) /* @@ -848,6 +857,7 @@ .align 7 .globl data_access_common data_access_common: + RUNLATCH_ON(r10) /* It wont fit in the 0x300 handler */ mfspr r10,DAR std r10,PACA_EXGEN+EX_DAR(r13) mfspr r10,DSISR Index: foobar2/include/asm-ppc64/processor.h =================================================================== --- foobar2.orig/include/asm-ppc64/processor.h 2005-07-06 07:28:11.577663885 +1000 +++ foobar2/include/asm-ppc64/processor.h 2005-07-06 07:30:34.878399039 +1000 @@ -524,6 +524,15 @@ #endif /* __ASSEMBLY__ */ +#ifdef __KERNEL__ +#define RUNLATCH_ON(REG) \ +BEGIN_FTR_SECTION \ + mfspr (REG),SPRN_CTRLF; \ + ori (REG),(REG),CTRL_RUNLATCH; \ + mtspr SPRN_CTRLT,(REG); \ +END_FTR_SECTION_IFSET(CPU_FTR_CTRL) +#endif + /* * Number of entries in the SLB. If this ever changes we should handle * it with a use a cpu feature fixup. 
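Note on the runlatch patch above (not part of the posted patch): the description says the SPR write is fairly expensive and suggests caching the runlatch state in the paca so the write can be skipped when possible. The following is only a user-space sketch of that caching idea. Every name in it (sim_cpu, sim_write_ctrl, SIM_CTRL_RUNLATCH and so on) is hypothetical and stands in for the real paca field and CTRL register access, which the patch does not define.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the runlatch bit; not taken from kernel headers. */
#define SIM_CTRL_RUNLATCH 0x1u

/* Hypothetical per-cpu state, playing the role the paca would play. */
struct sim_cpu {
	unsigned int ctrl;        /* simulated CTRL register contents */
	bool runlatch_cached;     /* software copy of the runlatch bit */
	unsigned long spr_writes; /* counts the "expensive" register writes */
};

/* Models the costly register write: store the value and count the write. */
static void sim_write_ctrl(struct sim_cpu *cpu, unsigned int val)
{
	assert(cpu != NULL);
	cpu->ctrl = val;
	cpu->spr_writes++;
}

/* Set the runlatch, skipping the write if the cache says it is already set. */
static void sim_runlatch_on(struct sim_cpu *cpu)
{
	if (cpu->runlatch_cached)
		return;
	sim_write_ctrl(cpu, cpu->ctrl | SIM_CTRL_RUNLATCH);
	cpu->runlatch_cached = true;
}

/* Clear the runlatch, skipping the write if it is already clear. */
static void sim_runlatch_off(struct sim_cpu *cpu)
{
	if (!cpu->runlatch_cached)
		return;
	sim_write_ctrl(cpu, cpu->ctrl & ~SIM_CTRL_RUNLATCH);
	cpu->runlatch_cached = false;
}
```

With this shape, back-to-back exceptions would pay for the register write only on the first entry after the idle loop cleared the latch. Whether the extra load and branch are cheaper than the write on real hardware is something the sketch cannot show.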
From benh at kernel.crashing.org Wed Jul 6 09:54:15 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 06 Jul 2005 09:54:15 +1000 Subject: [PATCH] vdso32, fix link errors after recent toolchain changes In-Reply-To: <20050704120244.GA10377@suse.de> References: <20050704120244.GA10377@suse.de> Message-ID: <1120607655.31924.175.camel@gaston> On Mon, 2005-07-04 at 14:02 +0200, Olaf Hering wrote: > Patch from amodra at bigpond.net.au, http://sources.redhat.com/bugzilla/show_bug.cgi?id=1042 > > /usr/bin/ld: arch/ppc64/kernel/vdso32/vdso32.so: The first section in the PT_DYNAMIC segment is not the .dynamic section > > Signed-off-by: Olaf Hering Acked-by: Benjamin Herrenschmidt From michael at ellerman.id.au Wed Jul 6 12:41:13 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Wed, 6 Jul 2005 12:41:13 +1000 Subject: [PATCH] ppc64: Be consistent about printing which idle loop we're using In-Reply-To: <20050705224951.GK12786@krispykreme> References: <200507012146.19553.michael@ellerman.id.au> <20050705204615.GE12786@krispykreme> <20050705224951.GK12786@krispykreme> Message-ID: <200507061241.30135.michael@ellerman.id.au> Not sure if we really need this, but it was handy to know which iSeries loop I was testing. Be consistent about printing which idle loop we're using, with this patch we cover all cases. 
Signed-off-by: Michael Ellerman --- arch/ppc64/kernel/iSeries_setup.c | 7 +++++-- arch/ppc64/kernel/setup.c | 4 +++- 2 files changed, 8 insertions(+), 3 deletions(-) Index: work/arch/ppc64/kernel/iSeries_setup.c =================================================================== --- work.orig/arch/ppc64/kernel/iSeries_setup.c +++ work/arch/ppc64/kernel/iSeries_setup.c @@ -946,9 +946,12 @@ void __init iSeries_early_setup(void) ppc_md.calibrate_decr = iSeries_calibrate_decr; ppc_md.progress = iSeries_progress; - if (get_paca()->lppaca.shared_proc) + if (get_paca()->lppaca.shared_proc) { ppc_md.idle_loop = iseries_shared_idle; - else + printk(KERN_INFO "Using shared processor idle loop\n"); + } else { ppc_md.idle_loop = iseries_dedicated_idle; + printk(KERN_INFO "Using dedicated idle loop\n"); + } } Index: work/arch/ppc64/kernel/setup.c =================================================================== --- work.orig/arch/ppc64/kernel/setup.c +++ work/arch/ppc64/kernel/setup.c @@ -1081,8 +1081,10 @@ void __init setup_arch(char **cmdline_p) ppc_md.setup_arch(); /* Use the default idle loop if the platform hasn't provided one. */ - if (NULL == ppc_md.idle_loop) + if (NULL == ppc_md.idle_loop) { ppc_md.idle_loop = default_idle; + printk(KERN_INFO "Using default idle loop\n"); + } paging_init(); ppc64_boot_msg(0x15, "Setup Done"); From seto.hidetoshi at jp.fujitsu.com Wed Jul 6 14:53:06 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Wed, 06 Jul 2005 13:53:06 +0900 Subject: [PATCH 2.6.13-rc1 01/10] IOCHK interface for I/O error handling/detecting Message-ID: <42CB63B2.6000505@jp.fujitsu.com> Hi all, The following is an updated version of the patches I posted to implement the IOCHK interface for I/O error handling/detecting. The overview of the patches hasn't changed, so please refer to the archives if you need it, e.g.: http://lwn.net/Articles/139240/ Tony, what do you think about applying my patches to your tree? 
Thanks, H.Seto [This is 1 of 10 patches, "iochk-01-generic.patch"] - It defines: a pair of function : iochk_clear and iochk_read a function for init : iochk_init type of control var : iocookie and describe "no-ops" as its "generic" action. - HAVE_ARCH_IOMAP_CHECK allows us to change whole definition of these functions and type from generic one to specific one. See next patch (2 of 10). Changes from previous one for 2.6.11.11: - reform default "nop" functions in static inline style. - I don't mind using EXPORT_SYMBOL_GPL but keep them as before. Does anyone worry about this? Signed-off-by: Hidetoshi Seto --- drivers/pci/pci.c | 2 ++ include/asm-generic/iomap.h | 32 ++++++++++++++++++++++++++++++++ lib/iomap.c | 6 ++++++ 3 files changed, 40 insertions(+) Index: linux-2.6.13-rc1/lib/iomap.c =================================================================== --- linux-2.6.13-rc1.orig/lib/iomap.c +++ linux-2.6.13-rc1/lib/iomap.c @@ -230,3 +230,9 @@ void pci_iounmap(struct pci_dev *dev, vo } EXPORT_SYMBOL(pci_iomap); EXPORT_SYMBOL(pci_iounmap); + +#ifndef HAVE_ARCH_IOMAP_CHECK +/* Since generic funcs are inlined and defined in header, just export */ +EXPORT_SYMBOL(iochk_clear); +EXPORT_SYMBOL(iochk_read); +#endif Index: linux-2.6.13-rc1/include/asm-generic/iomap.h =================================================================== --- linux-2.6.13-rc1.orig/include/asm-generic/iomap.h +++ linux-2.6.13-rc1/include/asm-generic/iomap.h @@ -65,4 +65,36 @@ struct pci_dev; extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max); extern void pci_iounmap(struct pci_dev *dev, void __iomem *); +/* + * IOMAP_CHECK provides additional interfaces for drivers to detect + * some IO errors, supports drivers having ability to recover errors. + * + * All works around iomap-check depends on the design of "iocookie" + * structure. Every architecture owning its iomap-check is free to + * define the actual design of iocookie to fit its special style. 
+ */ +#ifndef HAVE_ARCH_IOMAP_CHECK +/* Dummy definition of default iocookie */ +typedef int iocookie; +#endif + +/* + * Clear/Read iocookie to check IO error while using iomap. + * + * Note that default iochk_clear-read pair interfaces don't have + * any effective error check, but some high-reliable platforms + * would provide useful information to you. + * And note that some action may be limited (ex. irq-unsafe) + * between the pair depend on the facility of the platform. + */ +#ifdef HAVE_ARCH_IOMAP_CHECK +extern void iochk_init(void); +extern void iochk_clear(iocookie *cookie, struct pci_dev *dev); +extern int iochk_read(iocookie *cookie); +#else +static inline void iochk_init(void) {} +static inline void iochk_clear(iocookie *cookie, struct pci_dev *dev) {} +static inline int iochk_read(iocookie *cookie) { return 0; } +#endif + #endif Index: linux-2.6.13-rc1/drivers/pci/pci.c =================================================================== --- linux-2.6.13-rc1.orig/drivers/pci/pci.c +++ linux-2.6.13-rc1/drivers/pci/pci.c @@ -767,6 +767,8 @@ static int __devinit pci_init(void) while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { pci_fixup_device(pci_fixup_final, dev); } + + iochk_init(); return 0; } From seto.hidetoshi at jp.fujitsu.com Wed Jul 6 15:00:22 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Wed, 06 Jul 2005 14:00:22 +0900 Subject: [PATCH 2.6.13-rc1 02/10] IOCHK interface for I/O error handling/detecting In-Reply-To: <42CB63B2.6000505@jp.fujitsu.com> References: <42CB63B2.6000505@jp.fujitsu.com> Message-ID: <42CB6566.8090804@jp.fujitsu.com> [This is 2 of 10 patches, "iochk-02-ia64.patch"] - Add "config IOMAP_CHECK" to change definitions from generic to specific. - Defines ia64 version of: iochk_clear, iochk_read, iochk_init, and iocookie But they are no-ops yet. See next patch (3 of 10). Changes from previous one for 2.6.11.11: - simplify define of iocookie structure. 
Signed-off-by: Hidetoshi Seto --- arch/ia64/Kconfig | 13 +++++++++++++ arch/ia64/lib/Makefile | 1 + arch/ia64/lib/iomap_check.c | 30 ++++++++++++++++++++++++++++++ include/asm-ia64/io.h | 13 +++++++++++++ 4 files changed, 57 insertions(+) Index: linux-2.6.13-rc1/arch/ia64/lib/Makefile =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/lib/Makefile +++ linux-2.6.13-rc1/arch/ia64/lib/Makefile @@ -16,6 +16,7 @@ lib-$(CONFIG_MCKINLEY) += copy_page_mck. lib-$(CONFIG_PERFMON) += carta_random.o lib-$(CONFIG_MD_RAID5) += xor.o lib-$(CONFIG_HAVE_DEC_LOCK) += dec_and_lock.o +lib-$(CONFIG_IOMAP_CHECK) += iomap_check.o AFLAGS___divdi3.o = AFLAGS___udivdi3.o = -DUNSIGNED Index: linux-2.6.13-rc1/arch/ia64/Kconfig =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/Kconfig +++ linux-2.6.13-rc1/arch/ia64/Kconfig @@ -413,6 +413,19 @@ config PCI_DOMAINS bool default PCI +config IOMAP_CHECK + bool "Support iochk interfaces for IO error detection." + depends on PCI && EXPERIMENTAL + ---help--- + Saying Y provides iochk infrastructure for "RAS-aware" drivers + to detect and recover some IO errors, which strongly required by + some of very-high-reliable systems. + The implementation of this infrastructure is highly depend on arch, + bus system, chipset and so on. + Currentry, very few drivers on few arch actually implements this. + + If you don't know what to do here, say N. 
+ source "drivers/pci/Kconfig" source "drivers/pci/hotplug/Kconfig" Index: linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c =================================================================== --- /dev/null +++ linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c @@ -0,0 +1,30 @@ +/* + * File: iomap_check.c + * Purpose: Implement the IA64 specific iomap recovery interfaces + */ + +#include + +void iochk_init(void); +void iochk_clear(iocookie *cookie, struct pci_dev *dev); +int iochk_read(iocookie *cookie); + +void iochk_init(void) +{ + /* setup */ +} + +void iochk_clear(iocookie *cookie, struct pci_dev *dev) +{ + /* register device etc. */ +} + +int iochk_read(iocookie *cookie) +{ + /* check error etc. */ + + return 0; +} + +EXPORT_SYMBOL(iochk_read); +EXPORT_SYMBOL(iochk_clear); Index: linux-2.6.13-rc1/include/asm-ia64/io.h =================================================================== --- linux-2.6.13-rc1.orig/include/asm-ia64/io.h +++ linux-2.6.13-rc1/include/asm-ia64/io.h @@ -70,6 +70,19 @@ extern unsigned int num_io_spaces; #include #include #include + +#ifdef CONFIG_IOMAP_CHECK + +/* ia64 iocookie */ +typedef struct { + int dummy; +} iocookie; + +/* Enable ia64 iochk - See arch/ia64/lib/iomap_check.c */ +#define HAVE_ARCH_IOMAP_CHECK + +#endif /* CONFIG_IOMAP_CHECK */ + #include /* From seto.hidetoshi at jp.fujitsu.com Wed Jul 6 15:04:14 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Wed, 06 Jul 2005 14:04:14 +0900 Subject: [PATCH 2.6.13-rc1 03/10] IOCHK interface for I/O error handling/detecting In-Reply-To: <42CB63B2.6000505@jp.fujitsu.com> References: <42CB63B2.6000505@jp.fujitsu.com> Message-ID: <42CB664E.1050003@jp.fujitsu.com> [This is 3 of 10 patches, "iochk-03-register.patch"] - Implement ia64 version of basic codes: iochk_clear, iochk_read, iochk_init, and iocookie The direction is: - Have a "now in check" global list, "iochk_devices", for future use. - Take a lock, "iochk_lock", to protect the global list. 
- iochk_clear packs *dev into iocookie, and add it to the global list. After all prepared, clear error-flag in cookie to start io-critical-session. - iochk_read checks error-flag and device's status register. After removing iocookie from list, return the result. This is too simple. We need more codes... See next (4 of 10). Changes from previous one for 2.6.11.11: - trivial coding style fix. Signed-off-by: Hidetoshi Seto --- arch/ia64/lib/iomap_check.c | 55 ++++++++++++++++++++++++++++++++++++++++++-- include/asm-ia64/io.h | 5 +++- 2 files changed, 57 insertions(+), 3 deletions(-) Index: linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/lib/iomap_check.c +++ linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c @@ -4,24 +4,75 @@ */ #include +#include +#include void iochk_init(void); void iochk_clear(iocookie *cookie, struct pci_dev *dev); int iochk_read(iocookie *cookie); +struct list_head iochk_devices; +DEFINE_SPINLOCK(iochk_lock); /* all works are excluded on this lock */ + +static int have_error(struct pci_dev *dev); + void iochk_init(void) { /* setup */ + INIT_LIST_HEAD(&iochk_devices); } void iochk_clear(iocookie *cookie, struct pci_dev *dev) { - /* register device etc. */ + unsigned long flag; + + INIT_LIST_HEAD(&(cookie->list)); + + cookie->dev = dev; + + spin_lock_irqsave(&iochk_lock, flag); + list_add(&cookie->list, &iochk_devices); + spin_unlock_irqrestore(&iochk_lock, flag); + + cookie->error = 0; } int iochk_read(iocookie *cookie) { - /* check error etc. 
*/ + unsigned long flag; + int ret = 0; + + spin_lock_irqsave(&iochk_lock, flag); + if (cookie->error || have_error(cookie->dev)) + ret = 1; + list_del(&cookie->list); + spin_unlock_irqrestore(&iochk_lock, flag); + + return ret; +} + +static int have_error(struct pci_dev *dev) +{ + u16 status; + + /* check status */ + switch (dev->hdr_type) { + case PCI_HEADER_TYPE_NORMAL: /* 0 */ + pci_read_config_word(dev, PCI_STATUS, &status); + break; + case PCI_HEADER_TYPE_BRIDGE: /* 1 */ + pci_read_config_word(dev, PCI_SEC_STATUS, &status); + break; + case PCI_HEADER_TYPE_CARDBUS: /* 2 */ + return 0; /* FIX ME */ + default: + BUG(); + } + + if ( (status & PCI_STATUS_REC_TARGET_ABORT) + || (status & PCI_STATUS_REC_MASTER_ABORT) + || (status & PCI_STATUS_DETECTED_PARITY) ) + return 1; return 0; } Index: linux-2.6.13-rc1/include/asm-ia64/io.h =================================================================== --- linux-2.6.13-rc1.orig/include/asm-ia64/io.h +++ linux-2.6.13-rc1/include/asm-ia64/io.h @@ -72,10 +72,13 @@ extern unsigned int num_io_spaces; #include #ifdef CONFIG_IOMAP_CHECK +#include /* ia64 iocookie */ typedef struct { - int dummy; + struct list_head list; + struct pci_dev *dev; /* target device */ + unsigned long error; /* error flag */ } iocookie; /* Enable ia64 iochk - See arch/ia64/lib/iomap_check.c */ From seto.hidetoshi at jp.fujitsu.com Wed Jul 6 15:07:39 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Wed, 06 Jul 2005 14:07:39 +0900 Subject: [PATCH 2.6.13-rc1 04/10] IOCHK interface for I/O error handling/detecting In-Reply-To: <42CB63B2.6000505@jp.fujitsu.com> References: <42CB63B2.6000505@jp.fujitsu.com> Message-ID: <42CB671B.5000604@jp.fujitsu.com> [This is 4 of 10 patches, "iochk-04-register_bridge.patch"] - Since there can be a (PCI-)bus error, some kinds of error cannot be detected on the device itself, only on its hosting bridge. So it is also necessary to check the bridge's register. 
In other words, to check a bus-error correctly, we need to check both end of the bus, device and its host bridge. OK, but often bridges are shared by multiple devices, right? So we need care to handle it... Yes, see next (5 of 10). Changes from previous one for 2.6.11.11: - trivial coding style fix. Signed-off-by: Hidetoshi Seto --- arch/ia64/lib/iomap_check.c | 20 +++++++++++++++++++- include/asm-ia64/io.h | 1 + 2 files changed, 20 insertions(+), 1 deletion(-) Index: linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/lib/iomap_check.c +++ linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c @@ -14,6 +14,7 @@ int iochk_read(iocookie *cookie); struct list_head iochk_devices; DEFINE_SPINLOCK(iochk_lock); /* all works are excluded on this lock */ +static struct pci_dev *search_host_bridge(struct pci_dev *dev); static int have_error(struct pci_dev *dev); void iochk_init(void) @@ -29,6 +30,7 @@ void iochk_clear(iocookie *cookie, struc INIT_LIST_HEAD(&(cookie->list)); cookie->dev = dev; + cookie->host = search_host_bridge(dev); spin_lock_irqsave(&iochk_lock, flag); list_add(&cookie->list, &iochk_devices); @@ -43,7 +45,8 @@ int iochk_read(iocookie *cookie) int ret = 0; spin_lock_irqsave(&iochk_lock, flag); - if (cookie->error || have_error(cookie->dev)) + if ( cookie->error || have_error(cookie->dev) + || (cookie->host && have_error(cookie->host)) ) ret = 1; list_del(&cookie->list); spin_unlock_irqrestore(&iochk_lock, flag); @@ -51,6 +54,21 @@ int iochk_read(iocookie *cookie) return ret; } +struct pci_dev *search_host_bridge(struct pci_dev *dev) +{ + struct pci_bus *pbus; + + /* there is no bridge */ + if (!dev->bus->self) + return NULL; + + /* find root bus bridge */ + for (pbus = dev->bus; pbus->parent && pbus->parent->self; + pbus = pbus->parent); + + return pbus->self; +} + static int have_error(struct pci_dev *dev) { u16 status; Index: linux-2.6.13-rc1/include/asm-ia64/io.h 
=================================================================== --- linux-2.6.13-rc1.orig/include/asm-ia64/io.h +++ linux-2.6.13-rc1/include/asm-ia64/io.h @@ -78,6 +78,7 @@ extern unsigned int num_io_spaces; typedef struct { struct list_head list; struct pci_dev *dev; /* target device */ + struct pci_dev *host; /* hosting bridge */ unsigned long error; /* error flag */ } iocookie; From seto.hidetoshi at jp.fujitsu.com Wed Jul 6 15:11:42 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Wed, 06 Jul 2005 14:11:42 +0900 Subject: [PATCH 2.6.13-rc1 05/10] IOCHK interface for I/O error handling/detecting In-Reply-To: <42CB63B2.6000505@jp.fujitsu.com> References: <42CB63B2.6000505@jp.fujitsu.com> Message-ID: <42CB680E.2010103@jp.fujitsu.com> [This is 5 of 10 patches, "iochk-05-check_bridge.patch"] - Consider three devices, A, B, and C, placed under the same host bridge H. After A and B have checked in (i.e. passed iochk_clear and are doing some I/O, but have not yet called iochk_read), C is about to check in and has just entered iochk_clear, but C finds that H indicates an error. This means that A or B hit a bus error, but there is no data telling which one actually hit it. So C should notify both A and B of the error, and clear H's status before starting its own I/O. If there are only two devices, it becomes simpler: if one finds a bridge error while the other is checked in, the error can only be the other's. Well, the work concerning registers (devices and bridges) is almost in shape. So, from the next patch, I'll move on to a deeper phase and implement more arch-specific code... see next (6 of 10). 
Changes from previous one for 2.6.11.11: - (non) Signed-off-by: Hidetoshi Seto --- arch/ia64/lib/iomap_check.c | 45 ++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 45 insertions(+) Index: linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/lib/iomap_check.c +++ linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c @@ -17,6 +17,9 @@ DEFINE_SPINLOCK(iochk_lock); /* all work static struct pci_dev *search_host_bridge(struct pci_dev *dev); static int have_error(struct pci_dev *dev); +void notify_bridge_error(struct pci_dev *bridge); +void clear_bridge_error(struct pci_dev *bridge); + void iochk_init(void) { /* setup */ @@ -33,6 +36,11 @@ void iochk_clear(iocookie *cookie, struc cookie->host = search_host_bridge(dev); spin_lock_irqsave(&iochk_lock, flag); + if (cookie->host && have_error(cookie->host)) { + /* someone under my bridge causes error... */ + notify_bridge_error(cookie->host); + clear_bridge_error(cookie->host); + } list_add(&cookie->list, &iochk_devices); spin_unlock_irqrestore(&iochk_lock, flag); @@ -95,5 +103,42 @@ static int have_error(struct pci_dev *de return 0; } +void notify_bridge_error(struct pci_dev *bridge) +{ + iocookie *cookie; + + if (list_empty(&iochk_devices)) + return; + + /* notify error to all transactions using this host bridge */ + if (bridge) { + /* local notify, ex. Parity, Abort etc. 
*/ + list_for_each_entry(cookie, &iochk_devices, list) { + if (cookie->host == bridge) + cookie->error = 1; + } + } +} + +void clear_bridge_error(struct pci_dev *bridge) +{ + u16 status = ( PCI_STATUS_REC_TARGET_ABORT + | PCI_STATUS_REC_MASTER_ABORT + | PCI_STATUS_DETECTED_PARITY ); + + /* clear bridge status */ + switch (bridge->hdr_type) { + case PCI_HEADER_TYPE_NORMAL: /* 0 */ + pci_write_config_word(bridge, PCI_STATUS, status); + break; + case PCI_HEADER_TYPE_BRIDGE: /* 1 */ + pci_write_config_word(bridge, PCI_SEC_STATUS, status); + break; + case PCI_HEADER_TYPE_CARDBUS: /* 2 */ + default: + BUG(); + } +} + EXPORT_SYMBOL(iochk_read); EXPORT_SYMBOL(iochk_clear); From seto.hidetoshi at jp.fujitsu.com Wed Jul 6 15:14:07 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Wed, 06 Jul 2005 14:14:07 +0900 Subject: [PATCH 2.6.13-rc1 06/10] IOCHK interface for I/O error handling/detecting In-Reply-To: <42CB63B2.6000505@jp.fujitsu.com> References: <42CB63B2.6000505@jp.fujitsu.com> Message-ID: <42CB689F.6040208@jp.fujitsu.com> [This is 6 of 10 patches, "iochk-06-mcanotify.patch"] - This is a headache: when ia64 hits a hardware problem, the OS can ask SAL (System Abstraction Layer: ia64 firmware) to gather system status by calling the SAL_GET_STATE_INFO procedure. However (depending, hopefully, on the SAL implementation for the platform), while gathering, SAL also checks every host bridge and its status, and afterwards resets the state... So we should take care of this reset by SAL. Handling MCA (Machine Check Abort) is one of the situations we should take care of. MCA was originally designed as a critical interruption, so when an MCA comes, SAL gathers system status without being asked by the OS, before the OS gets control. Since the bridge states have already been reset on entry to the MCA handler, the OS should notify all "checked-in" contexts of the "loss of state" by marking their error flag, iocookie->error. 
There would be better way if OS can know the bridge state from data which SAL gathered, but in the meanwhile, I just do simple way. PCI-parity error is one of MCA causes, is it OK? Next, "data poisoning" helps us... see next (7 of 10). Changes from previous one for 2.6.11.11: - (non) Signed-off-by: Hidetoshi Seto --- arch/ia64/kernel/mca.c | 13 +++++++++++++ arch/ia64/lib/iomap_check.c | 7 ++++++- 2 files changed, 19 insertions(+), 1 deletion(-) Index: linux-2.6.13-rc1/arch/ia64/kernel/mca.c =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/kernel/mca.c +++ linux-2.6.13-rc1/arch/ia64/kernel/mca.c @@ -77,6 +77,11 @@ #include #include +#ifdef CONFIG_IOMAP_CHECK +#include +extern void notify_bridge_error(struct pci_dev *bridge); +#endif + #if defined(IA64_MCA_DEBUG_INFO) # define IA64_MCA_DEBUG(fmt...) printk(fmt) #else @@ -893,6 +898,14 @@ ia64_mca_ucmc_handler(void) sal_log_record_header_t *rh = IA64_LOG_CURR_BUFFER(SAL_INFO_TYPE_MCA); rh->severity = sal_log_severity_corrected; ia64_sal_clear_state_info(SAL_INFO_TYPE_MCA); + +#ifdef CONFIG_IOMAP_CHECK + /* + * SAL already reads and clears error bits on bridge registers, + * so we should have all running transactions to retry. + */ + notify_bridge_error(0); +#endif } /* * Wakeup all the processors which are spinning in the rendezvous Index: linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/lib/iomap_check.c +++ linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c @@ -111,7 +111,12 @@ void notify_bridge_error(struct pci_dev return; /* notify error to all transactions using this host bridge */ - if (bridge) { + if (!bridge) { + /* global notify, ex. MCA */ + list_for_each_entry(cookie, &iochk_devices, list) { + cookie->error = 1; + } + } else { /* local notify, ex. Parity, Abort etc. 
*/ list_for_each_entry(cookie, &iochk_devices, list) { if (cookie->host == bridge)

From seto.hidetoshi at jp.fujitsu.com Wed Jul 6 15:17:21 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Wed, 06 Jul 2005 14:17:21 +0900 Subject: [PATCH 2.6.13-rc1 07/10] IOCHK interface for I/O error handling/detecting In-Reply-To: <42CB63B2.6000505@jp.fujitsu.com> References: <42CB63B2.6000505@jp.fujitsu.com> Message-ID: <42CB6961.2060508@jp.fujitsu.com>

[This is 7 of 10 patches, "iochk-07-poison.patch"]

- When a bus error occurs on a write, the write data is corrupted on the bus, so the target device receives broken data. Such a device has two options:
    - send PERR (Parity Error) to the host, expecting an immediate panic.
    - mark its status register as error, expecting its driver to read it and decide to retry.
  So it is not difficult for a driver to recover from a write error, provided the device takes the latter option and the driver does not mind the time spent waiting for the write to complete.

- When a bus error occurs on a read, the read data is corrupted on the bus, so the host bridge receives broken data. Such a bridge has two options:
    - send BERR (Bus Error) to the host, expecting an immediate panic.
    - mark the data as "poisoned" and pass it on to its destination, expecting a panic if the system touches it and the pollution cannot be contained.
  The former is the traditional way; the latter is the modern way, called "data poisoning". The important difference is whether the OS gets a chance to recover from the error. Usually a BERR does not tell us where it came from or who issued the request, so we cannot do anything except panic. On the other hand, poisoned data will reach its destination and cause an error again there. Yes, the destination is "where who lives".

Well, the idea is quite simple: "the driver checks the read data, and recovers if it was poisoned."

Checking all reads at once (e.g. recording every read address touched after iochk_clear and checking them all in iochk_read) does not make sense. The practical way is to check each read as it happens, keep the result, and read that result at the end.
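The "check each read, keep the result, read it at the end" pattern can be sketched in plain C. This is a user-space model under stated assumptions, not the patch's implementation: on real hardware, consuming poisoned data raises an MCA, whereas here a hypothetical `poisoned` field on a fake register stands in for that event. All names below are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical model of a device register whose read may be poisoned. */
struct fake_reg {
    uint32_t value;
    int poisoned;   /* set by the caller to simulate a bus error on read */
};

struct check_cookie {
    int error;      /* sticky: once set, stays set until the next clear */
};

/* Start of a checked transaction (plays the role of iochk_clear). */
static void check_clear(struct check_cookie *c)
{
    c->error = 0;
}

/* Check each read as it happens and keep the result in the cookie. */
static uint32_t checked_read(struct check_cookie *c, const struct fake_reg *r)
{
    if (r->poisoned)
        c->error = 1;
    return r->value;
}

/* End of the transaction (plays the role of iochk_read). */
static int check_read(const struct check_cookie *c)
{
    return c->error;
}

/* Driver-side pattern: returns 0 on success, -1 if any read was bad. */
static int read_block(const struct fake_reg *regs, int n, uint32_t *out)
{
    struct check_cookie c;

    check_clear(&c);
    for (int i = 0; i < n; i++)
        out[i] = checked_read(&c, &regs[i]);
    return check_read(&c) ? -1 : 0;
}
```

A caller that gets -1 from `read_block` knows that at least one value in `out` is untrustworthy and can retry the whole transaction instead of taking the system down.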
Touching poisoned data become a MCA, so now it directly means a system down. But since the MCA tells us "where it happens", we can recover it...? All right, let's see next (8 of 10). Changes from previous one for 2.6.11.11: - move barrier function macro into gcc_inirin.h. - could anyone write same barrier for intel compiler? Tony or David, could you help me? Signed-off-by: Hidetoshi Seto --- include/asm-ia64/gcc_intrin.h | 16 +++++++ include/asm-ia64/io.h | 96 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 112 insertions(+) Index: linux-2.6.13-rc1/include/asm-ia64/io.h =================================================================== --- linux-2.6.13-rc1.orig/include/asm-ia64/io.h +++ linux-2.6.13-rc1/include/asm-ia64/io.h @@ -189,6 +189,8 @@ __ia64_mk_io_addr (unsigned long port) * during optimization, which is why we use "volatile" pointers. */ +#ifdef CONFIG_IOMAP_CHECK + static inline unsigned int ___ia64_inb (unsigned long port) { @@ -197,6 +199,8 @@ ___ia64_inb (unsigned long port) ret = *addr; __ia64_mf_a(); + ia64_mca_barrier(ret); + return ret; } @@ -208,6 +212,8 @@ ___ia64_inw (unsigned long port) ret = *addr; __ia64_mf_a(); + ia64_mca_barrier(ret); + return ret; } @@ -219,9 +225,48 @@ ___ia64_inl (unsigned long port) ret = *addr; __ia64_mf_a(); + ia64_mca_barrier(ret); + + return ret; +} + +#else /* CONFIG_IOMAP_CHECK */ + +static inline unsigned int +___ia64_inb (unsigned long port) +{ + volatile unsigned char *addr = __ia64_mk_io_addr(port); + unsigned char ret; + + ret = *addr; + __ia64_mf_a(); + return ret; +} + +static inline unsigned int +___ia64_inw (unsigned long port) +{ + volatile unsigned short *addr = __ia64_mk_io_addr(port); + unsigned short ret; + + ret = *addr; + __ia64_mf_a(); return ret; } +static inline unsigned int +___ia64_inl (unsigned long port) +{ + volatile unsigned int *addr = __ia64_mk_io_addr(port); + unsigned int ret; + + ret = *addr; + __ia64_mf_a(); + return ret; +} + +#endif /* CONFIG_IOMAP_CHECK */ + static 
inline void ___ia64_outb (unsigned char val, unsigned long port) { @@ -338,6 +383,55 @@ __outsl (unsigned long port, const void * a good idea). Writes are ok though for all existing ia64 platforms (and * hopefully it'll stay that way). */ + +#ifdef CONFIG_IOMAP_CHECK + +static inline unsigned char +___ia64_readb (const volatile void __iomem *addr) +{ + unsigned char val; + + val = *(volatile unsigned char __force *)addr; + ia64_mca_barrier(val); + + return val; +} + +static inline unsigned short +___ia64_readw (const volatile void __iomem *addr) +{ + unsigned short val; + + val = *(volatile unsigned short __force *)addr; + ia64_mca_barrier(val); + + return val; +} + +static inline unsigned int +___ia64_readl (const volatile void __iomem *addr) +{ + unsigned int val; + + val = *(volatile unsigned int __force *) addr; + ia64_mca_barrier(val); + + return val; +} + +static inline unsigned long +___ia64_readq (const volatile void __iomem *addr) +{ + unsigned long val; + + val = *(volatile unsigned long __force *) addr; + ia64_mca_barrier(val); + + return val; +} + +#else /* CONFIG_IOMAP_CHECK */ + static inline unsigned char ___ia64_readb (const volatile void __iomem *addr) { @@ -362,6 +456,8 @@ ___ia64_readq (const volatile void __iom return *(volatile unsigned long __force *) addr; } +#endif /* CONFIG_IOMAP_CHECK */ + static inline void __writeb (unsigned char val, volatile void __iomem *addr) { Index: linux-2.6.13-rc1/include/asm-ia64/gcc_intrin.h =================================================================== --- linux-2.6.13-rc1.orig/include/asm-ia64/gcc_intrin.h +++ linux-2.6.13-rc1/include/asm-ia64/gcc_intrin.h @@ -598,4 +598,20 @@ do { \ :: "r"((x)) : "p6", "p7", "memory"); \ } while (0) +/* + * Some I/O bridges may poison the data read, instead of + * signaling a BERR. The consummation of poisoned data + * triggers a MCA, which tells us the polluted address. 
+ * Note that the read operation by itself does not consume + * the bad data, you have to do something with it, e.g.: + * + * ld.8 r9=[r10];; // r10 == I/O address + * add.8 r8=r9,0;; // fake operation + */ +#define ia64_mca_barrier(val) \ +({ \ + register unsigned long gr8 asm("r8"); \ + asm volatile ("add %0=%1,r0" : "=r"(gr8) : "r"(val)); \ +}) + #endif /* _ASM_IA64_GCC_INTRIN_H */

From seto.hidetoshi at jp.fujitsu.com Wed Jul 6 15:18:53 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Wed, 06 Jul 2005 14:18:53 +0900 Subject: [PATCH 2.6.13-rc1 08/10] IOCHK interface for I/O error handling/detecting In-Reply-To: <42CB63B2.6000505@jp.fujitsu.com> References: <42CB63B2.6000505@jp.fujitsu.com> Message-ID: <42CB69BD.1090607@jp.fujitsu.com>

[This is 8 of 10 patches, "iochk-08-mcadrv.patch"]

- Touching poisoned data raises an MCA, which is currently treated as a fatal error and leads directly to a system down. But since the MCA tells us a physical address ("where it happened"), we can take some action to survive.

If the address lies within a resource of a "check-in" device, it is guaranteed that its driver will call iochk_read in the very near future, and that the driver now has both the ability and the responsibility to recover from the error.

So if it was a "check-in" address, all the OS has to do is mark the "check-in" devices and resume normal operation. The driver will soon notice the error and handle it properly.

Note: We can identify the affected device, but because of the SAL behavior (mentioned in 6 of 10), we need to mark all "check-in" devices. To be fixed in the future, if possible.
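The decision described above (is the offending address inside a resource of some checked-in device, and if so mark every checked-in device) can be illustrated with a small user-space sketch. The types and function names are hypothetical; the inclusive start/end ranges and the "address 0 means not valid" convention follow the patch below, and marking all devices rather than only the affected one follows the note above.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, modelled on the start/end of a PCI resource. */
struct res_range {
    uint64_t start;
    uint64_t end;
};

/* Hypothetical stand-in for one checked-in device. */
struct checked_dev {
    const struct res_range *res;
    size_t nres;
    int error;
};

/* Return 1 if addr falls inside any resource of any checked-in device. */
static int addr_in_check(const struct checked_dev *devs, size_t ndev,
                         uint64_t addr)
{
    for (size_t i = 0; i < ndev; i++) {
        for (size_t j = 0; j < devs[i].nres; j++) {
            const struct res_range *r = &devs[i].res[j];
            if (r->start <= addr && addr <= r->end)
                return 1;
        }
    }
    return 0;
}

/*
 * Return 1 if the error can be left to a driver, 0 if not.
 * On success every checked-in device is marked, because the bridge
 * state that would single out the affected one is already gone.
 */
static int try_recover(struct checked_dev *devs, size_t ndev, uint64_t addr)
{
    if (addr == 0 || !addr_in_check(devs, ndev, addr))
        return 0;
    for (size_t i = 0; i < ndev; i++)
        devs[i].error = 1;
    return 1;
}
```

An address outside every range leaves all flags untouched and returns 0, which in the real handler would mean falling through to the existing recovery paths.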
Changes from previous one for 2.6.11.11: - (non) Signed-off-by: Hidetoshi Seto --- arch/ia64/kernel/mca_drv.c | 84 ++++++++++++++++++++++++++++++++++++++++++++ arch/ia64/lib/iomap_check.c | 1 2 files changed, 85 insertions(+) Index: linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/lib/iomap_check.c +++ linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c @@ -147,3 +147,4 @@ void clear_bridge_error(struct pci_dev * EXPORT_SYMBOL(iochk_read); EXPORT_SYMBOL(iochk_clear); +EXPORT_SYMBOL(iochk_devices); /* for MCA driver */ Index: linux-2.6.13-rc1/arch/ia64/kernel/mca_drv.c =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/kernel/mca_drv.c +++ linux-2.6.13-rc1/arch/ia64/kernel/mca_drv.c @@ -35,6 +35,12 @@ #include "mca_drv.h" +#ifdef CONFIG_IOMAP_CHECK +#include +#include +extern struct list_head iochk_devices; +#endif + /* max size of SAL error record (default) */ static int sal_rec_max = 10000; @@ -377,6 +383,79 @@ is_mca_global(peidx_table_t *peidx, pal_ return MCA_IS_GLOBAL; } +#ifdef CONFIG_IOMAP_CHECK + +/** + * get_target_identifier - get address of target_identifier + * @peidx: pointer of index of processor error section + * + * Return value: + * addr if valid / 0 if not valid + */ +static u64 get_target_identifier(peidx_table_t *peidx) +{ + sal_log_mod_error_info_t *smei; + + smei = peidx_bus_check(peidx, 0); + if (smei->valid.target_identifier) + return (smei->target_identifier); + return 0; +} + +/** + * offending_addr_in_check - Check if the addr is in checking resource. 
+ * @addr: address offending this MCA + * + * Return value: + * 1 if in / 0 if out + */ +static int offending_addr_in_check(u64 addr) +{ + int i; + struct pci_dev *tdev; + iocookie *cookie; + + if (list_empty(&iochk_devices)) + return 0; + + list_for_each_entry(cookie, &iochk_devices, list) { + tdev = cookie->dev; + for (i = 0; i < PCI_ROM_RESOURCE; i++) { + if (tdev->resource[i].start <= addr + && addr <= tdev->resource[i].end) + return 1; + if ((tdev->resource[i].flags + & (PCI_BASE_ADDRESS_SPACE|PCI_BASE_ADDRESS_MEM_TYPE_MASK)) + == (PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64)) + i++; + } + } + return 0; +} + +/** + * pci_error_recovery - Check if MCA occur on transaction in iochk. + * @peidx: pointer of index of processor error section + * + * Return value: + * 1 if error could be cought in driver / 0 if not + */ +static int pci_error_recovery(peidx_table_t *peidx) +{ + u64 addr; + + addr = get_target_identifier(peidx); + if (!addr) + return 0; + + if (offending_addr_in_check(addr)) + return 1; + + return 0; +} + +#endif /* CONFIG_IOMAP_CHECK */ + /** * recover_from_read_error - Try to recover the errors which type are "read"s. * @slidx: pointer of index of SAL error record @@ -399,6 +478,11 @@ recover_from_read_error(slidx_table_t *s if (!pbci->tv) return 0; +#ifdef CONFIG_IOMAP_CHECK + if (pci_error_recovery(peidx)) + return 1; +#endif + /* * cpu read or memory-mapped io read * From seto.hidetoshi at jp.fujitsu.com Wed Jul 6 15:20:15 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Wed, 06 Jul 2005 14:20:15 +0900 Subject: [PATCH 2.6.13-rc1 09/10] IOCHK interface for I/O error handling/detecting In-Reply-To: <42CB63B2.6000505@jp.fujitsu.com> References: <42CB63B2.6000505@jp.fujitsu.com> Message-ID: <42CB6A0F.4080304@jp.fujitsu.com> [This is 9 of 10 patches, "iochk-09-cpeh.patch"] - SAL behavior doesn't affect only MCA. There are other chances to call SAL_GET_STATE_INFO, that's when CMC, CPE, and INIT is happen. 
- CMC (Corrected Machine Check) is for non-fatal, processor-local errors. Fortunately, calling SAL_GET_STATE_INFO for CMC only collects data from the processor that issued it, without touching any bridge or its status. So this is safe.
- CPE (Corrected Platform Error) is for non-fatal, platform-related errors. Even though it says "corrected", calling the SAL procedure for CPE touches every bridge on the platform and "corrects" the bridge status, which is bad for iochk.
- INIT is a kind of system reset request, as far as I know. So restarting from INIT is outside the design, and iochk after INIT is not required at this time.

In short, only MCA and CPE are affected by this SAL behavior.

One difference from MCA is that, for CPE, SAL does not gather data until the OS actually requests it.

MCA:
  1) SAL gathers data and keeps it internally
  2) OS gets control
  3) if the OS requests it, SAL returns the data gathered at the beginning

CPE:
  1) OS gets control
  2) OS sends a request to SAL
  3) SAL gathers data and returns it to the OS

Therefore, we can make the CPE handler take care of the bridge states by checking them before calling the SAL procedure.
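Why the order matters for CPE can be shown with a small user-space model. This is a sketch under assumptions, not kernel code: `sal_gather_model` is a hypothetical stand-in for a SAL call that reads and then clears bridge status, and the other names are likewise invented.

```c
/* Hypothetical bridge whose status register holds error bits. */
struct bridge_state {
    unsigned status;            /* nonzero means an error is recorded */
};

struct cpe_cookie {
    struct bridge_state *host;
    int error;                  /* sticky flag kept on the OS side */
};

/* Stand-in for the SAL call on CPE: reads the status, then clears it. */
static unsigned sal_gather_model(struct bridge_state *b)
{
    unsigned seen = b->status;

    b->status = 0;
    return seen;
}

/* Latch a pending bridge error into the cookie before SAL can clear it. */
static void save_bridge_error_model(struct cpe_cookie *c)
{
    if (!c->error && c->host->status != 0)
        c->error = 1;
}

/* CPE handler order from the text: save first, then call SAL. */
static void cpe_handler_model(struct cpe_cookie *c)
{
    save_bridge_error_model(c);
    (void)sal_gather_model(c->host);
}

/* What the driver sees later when it checks its transaction. */
static int check_result_model(const struct cpe_cookie *c)
{
    return c->error || c->host->status != 0;
}
```

Running the handler on a bridge with error bits set leaves the bridge status cleared but the cookie marked, so the driver still sees the failure. Calling `sal_gather_model` before `save_bridge_error_model` loses the error entirely, which is the situation the patch is meant to avoid.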
Changes from previous one for 2.6.11.11: - (non) Signed-off-by: Hidetoshi Seto --- arch/ia64/kernel/mca.c | 21 +++++++++++++++++++++ arch/ia64/lib/iomap_check.c | 17 +++++++++++++++++ 2 files changed, 38 insertions(+) Index: linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/lib/iomap_check.c +++ linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c @@ -19,6 +19,7 @@ static int have_error(struct pci_dev *de void notify_bridge_error(struct pci_dev *bridge); void clear_bridge_error(struct pci_dev *bridge); +void save_bridge_error(void); void iochk_init(void) { @@ -145,6 +146,22 @@ void clear_bridge_error(struct pci_dev * } } +void save_bridge_error(void) +{ + iocookie *cookie; + + if (list_empty(&iochk_devices)) + return; + + /* mark devices if its root bus bridge have errors */ + list_for_each_entry(cookie, &iochk_devices, list) { + if (cookie->error) + continue; + if (have_error(cookie->host)) + notify_bridge_error(cookie->host); + } +} + EXPORT_SYMBOL(iochk_read); EXPORT_SYMBOL(iochk_clear); EXPORT_SYMBOL(iochk_devices); /* for MCA driver */ Index: linux-2.6.13-rc1/arch/ia64/kernel/mca.c =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/kernel/mca.c +++ linux-2.6.13-rc1/arch/ia64/kernel/mca.c @@ -80,6 +80,8 @@ #ifdef CONFIG_IOMAP_CHECK #include extern void notify_bridge_error(struct pci_dev *bridge); +extern void save_bridge_error(void); +extern spinlock_t iochk_lock; #endif #if defined(IA64_MCA_DEBUG_INFO) @@ -288,11 +290,30 @@ ia64_mca_cpe_int_handler (int cpe_irq, v IA64_MCA_DEBUG("%s: received interrupt vector = %#x on CPU %d\n", __FUNCTION__, cpe_irq, smp_processor_id()); +#ifndef CONFIG_IOMAP_CHECK + /* SAL spec states this should run w/ interrupts enabled */ local_irq_enable(); /* Get the CPE error record and log it */ ia64_mca_log_sal_error_record(SAL_INFO_TYPE_CPE); +#else + /* + * Because SAL_GET_STATE_INFO for 
CPE might clear bridge states + * in process of gathering error information from the system, + * we should check the states before clearing it. + * While OS and SAL are handling bridge status, we have to protect + * the states from changing by any other I/Os running simultaneously, + * so this should be handled w/ lock and interrupts disabled. + */ + spin_lock(&iochk_lock); + save_bridge_error(); + ia64_mca_log_sal_error_record(SAL_INFO_TYPE_CPE); + spin_unlock(&iochk_lock); + + /* Rests can go w/ interrupt enabled as usual */ + local_irq_enable(); +#endif spin_lock(&cpe_history_lock); if (!cpe_poll_enabled && cpe_vector >= 0) {

From seto.hidetoshi at jp.fujitsu.com Wed Jul 6 15:21:15 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Wed, 06 Jul 2005 14:21:15 +0900 Subject: [PATCH 2.6.13-rc1 10/10] IOCHK interface for I/O error handling/detecting In-Reply-To: <42CB63B2.6000505@jp.fujitsu.com> References: <42CB63B2.6000505@jp.fujitsu.com> Message-ID: <42CB6A4B.9000906@jp.fujitsu.com>

[This is 10 of 10 patches, "iochk-10-rwlock.patch"]

- If a read access (e.g. readX/inX) causes an error while SAL is gathering system data on another processor, the bridge error status could be set and then vanish in a blink.

In the MCA case, thanks to the rz_always flag, every MCA is handled as global, so all processors except one are paused while it is handled. But a CPE, like any other interrupt, has to be handled alongside all the other active processors. Therefore, to avoid this kind of status clobbering, mutual exclusion between read accesses and SAL_GET_STATE_INFO is required.

To achieve this, I changed the control lock from a spinlock to an rwlock. There may be a better way; if so, this part should be replaced.
Changes from previous one for 2.6.11.11: - (non) Signed-off-by: Hidetoshi Seto --- arch/ia64/kernel/mca.c | 6 +++--- arch/ia64/lib/iomap_check.c | 11 ++++++----- include/asm-ia64/io.h | 24 ++++++++++++++++++++++++ 3 files changed, 33 insertions(+), 8 deletions(-) Index: linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/lib/iomap_check.c +++ linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c @@ -12,7 +12,7 @@ void iochk_clear(iocookie *cookie, struc int iochk_read(iocookie *cookie); struct list_head iochk_devices; -DEFINE_SPINLOCK(iochk_lock); /* all works are excluded on this lock */ +DEFINE_RWLOCK(iochk_lock); /* all works are excluded on this lock */ static struct pci_dev *search_host_bridge(struct pci_dev *dev); static int have_error(struct pci_dev *dev); @@ -36,14 +36,14 @@ void iochk_clear(iocookie *cookie, struc cookie->dev = dev; cookie->host = search_host_bridge(dev); - spin_lock_irqsave(&iochk_lock, flag); + write_lock_irqsave(&iochk_lock, flag); if (cookie->host && have_error(cookie->host)) { /* someone under my bridge causes error... 
*/ notify_bridge_error(cookie->host); clear_bridge_error(cookie->host); } list_add(&cookie->list, &iochk_devices); - spin_unlock_irqrestore(&iochk_lock, flag); + write_unlock_irqrestore(&iochk_lock, flag); cookie->error = 0; } @@ -53,12 +53,12 @@ int iochk_read(iocookie *cookie) unsigned long flag; int ret = 0; - spin_lock_irqsave(&iochk_lock, flag); + write_lock_irqsave(&iochk_lock, flag); if ( cookie->error || have_error(cookie->dev) || (cookie->host && have_error(cookie->host)) ) ret = 1; list_del(&cookie->list); - spin_unlock_irqrestore(&iochk_lock, flag); + write_unlock_irqrestore(&iochk_lock, flag); return ret; } @@ -162,6 +162,7 @@ void save_bridge_error(void) } } +EXPORT_SYMBOL(iochk_lock); EXPORT_SYMBOL(iochk_read); EXPORT_SYMBOL(iochk_clear); EXPORT_SYMBOL(iochk_devices); /* for MCA driver */ Index: linux-2.6.13-rc1/include/asm-ia64/io.h =================================================================== --- linux-2.6.13-rc1.orig/include/asm-ia64/io.h +++ linux-2.6.13-rc1/include/asm-ia64/io.h @@ -73,6 +73,7 @@ extern unsigned int num_io_spaces; #ifdef CONFIG_IOMAP_CHECK #include +#include /* ia64 iocookie */ typedef struct { @@ -82,6 +83,8 @@ typedef struct { unsigned long error; /* error flag */ } iocookie; +extern rwlock_t iochk_lock; /* see arch/ia64/lib/iomap_check.c */ + /* Enable ia64 iochk - See arch/ia64/lib/iomap_check.c */ #define HAVE_ARCH_IOMAP_CHECK @@ -196,10 +199,13 @@ ___ia64_inb (unsigned long port) { volatile unsigned char *addr = __ia64_mk_io_addr(port); unsigned char ret; + unsigned long flags; + read_lock_irqsave(&iochk_lock,flags); ret = *addr; __ia64_mf_a(); ia64_mca_barrier(ret); + read_unlock_irqrestore(&iochk_lock,flags); return ret; } @@ -209,10 +215,13 @@ ___ia64_inw (unsigned long port) { volatile unsigned short *addr = __ia64_mk_io_addr(port); unsigned short ret; + unsigned long flags; + read_lock_irqsave(&iochk_lock,flags); ret = *addr; __ia64_mf_a(); ia64_mca_barrier(ret); + read_unlock_irqrestore(&iochk_lock,flags); 
return ret; } @@ -222,10 +231,13 @@ ___ia64_inl (unsigned long port) { volatile unsigned int *addr = __ia64_mk_io_addr(port); unsigned int ret; + unsigned long flags; + read_lock_irqsave(&iochk_lock,flags); ret = *addr; __ia64_mf_a(); ia64_mca_barrier(ret); + read_unlock_irqrestore(&iochk_lock,flags); return ret; } @@ -390,9 +402,12 @@ static inline unsigned char ___ia64_readb (const volatile void __iomem *addr) { unsigned char val; + unsigned long flags; + read_lock_irqsave(&iochk_lock,flags); val = *(volatile unsigned char __force *)addr; ia64_mca_barrier(val); + read_unlock_irqrestore(&iochk_lock,flags); return val; } @@ -401,9 +416,12 @@ static inline unsigned short ___ia64_readw (const volatile void __iomem *addr) { unsigned short val; + unsigned long flags; + read_lock_irqsave(&iochk_lock,flags); val = *(volatile unsigned short __force *)addr; ia64_mca_barrier(val); + read_unlock_irqrestore(&iochk_lock,flags); return val; } @@ -412,9 +430,12 @@ static inline unsigned int ___ia64_readl (const volatile void __iomem *addr) { unsigned int val; + unsigned long flags; + read_lock_irqsave(&iochk_lock,flags); val = *(volatile unsigned int __force *) addr; ia64_mca_barrier(val); + read_unlock_irqrestore(&iochk_lock,flags); return val; } @@ -423,9 +444,12 @@ static inline unsigned long ___ia64_readq (const volatile void __iomem *addr) { unsigned long val; + unsigned long flags; + read_lock_irqsave(&iochk_lock,flags); val = *(volatile unsigned long __force *) addr; ia64_mca_barrier(val); + read_unlock_irqrestore(&iochk_lock,flags); return val; } Index: linux-2.6.13-rc1/arch/ia64/kernel/mca.c =================================================================== --- linux-2.6.13-rc1.orig/arch/ia64/kernel/mca.c +++ linux-2.6.13-rc1/arch/ia64/kernel/mca.c @@ -81,7 +81,7 @@ #include extern void notify_bridge_error(struct pci_dev *bridge); extern void save_bridge_error(void); -extern spinlock_t iochk_lock; +extern rwlock_t iochk_lock; #endif #if defined(IA64_MCA_DEBUG_INFO) @@ 
-306,10 +306,10 @@ ia64_mca_cpe_int_handler (int cpe_irq, v * the states from changing by any other I/Os running simultaneously, * so this should be handled w/ lock and interrupts disabled. */ - spin_lock(&iochk_lock); + write_lock(&iochk_lock); save_bridge_error(); ia64_mca_log_sal_error_record(SAL_INFO_TYPE_CPE); - spin_unlock(&iochk_lock); + write_unlock(&iochk_lock); /* Rests can go w/ interrupt enabled as usual */ local_irq_enable(); From yoshfuji at linux-ipv6.org Wed Jul 6 16:26:27 2005 From: yoshfuji at linux-ipv6.org (YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=) Date: Wed, 06 Jul 2005 15:26:27 +0900 (JST) Subject: [PATCH 2.6.13-rc1 01/10] IOCHK interface for I/O error handling/detecting In-Reply-To: <42CB63B2.6000505@jp.fujitsu.com> References: <42CB63B2.6000505@jp.fujitsu.com> Message-ID: <20050706.152627.68274440.yoshfuji@linux-ipv6.org> In article <42CB63B2.6000505 at jp.fujitsu.com> (at Wed, 06 Jul 2005 13:53:06 +0900), Hidetoshi Seto says: > Index: linux-2.6.13-rc1/lib/iomap.c > =================================================================== > --- linux-2.6.13-rc1.orig/lib/iomap.c > +++ linux-2.6.13-rc1/lib/iomap.c > @@ -230,3 +230,9 @@ void pci_iounmap(struct pci_dev *dev, vo > } > EXPORT_SYMBOL(pci_iomap); > EXPORT_SYMBOL(pci_iounmap); > + > +#ifndef HAVE_ARCH_IOMAP_CHECK > +/* Since generic funcs are inlined and defined in header, just export */ > +EXPORT_SYMBOL(iochk_clear); > +EXPORT_SYMBOL(iochk_read); > +#endif > Index: linux-2.6.13-rc1/include/asm-generic/iomap.h > =================================================================== > --- linux-2.6.13-rc1.orig/include/asm-generic/iomap.h > +++ linux-2.6.13-rc1/include/asm-generic/iomap.h : > + */ > +#ifdef HAVE_ARCH_IOMAP_CHECK > +extern void iochk_init(void); > +extern void iochk_clear(iocookie *cookie, struct pci_dev *dev); > +extern int iochk_read(iocookie *cookie); > +#else > +static inline void iochk_init(void) {} > +static inline void iochk_clear(iocookie *cookie, 
struct pci_dev *dev) {} > +static inline int iochk_read(iocookie *cookie) { return 0; } > +#endif > + > #endif It looks strange to me. You cannot export "static inline" functions. You can export iochk_{init,clear,read} only if HAVE_ARCH_IOMAP_CHECK is defined. --yoshfuji From seto.hidetoshi at jp.fujitsu.com Wed Jul 6 20:15:02 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Wed, 06 J