From nathanl at austin.ibm.com Tue Nov 2 13:47:22 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Mon, 01 Nov 2004 20:47:22 -0600 Subject: [patch] mmu_context_init needs to run earlier Message-ID: <1099363642.22996.345.camel@pants.austin.ibm.com> Hi- I am seeing "kernel BUG in mmu_context_init at arch/ppc64/mm/init.c:528" in latest 2.6 bk kernels. It looks as if arch_initcall is not early enough for mmu_context_init -- I inserted printk's in that function and init_new_context, and indeed, init_new_context is being called before mmu_context_init. Not sure this is the best fix, or that this completely eliminates the races, but I didn't see any other obvious solution. Boot-tested on a p630. Signed-off-by: Nathan Lynch --- diff -puN arch/ppc64/mm/init.c~ppc64-make-mmu_context_init-core_initcall arch/ppc64/mm/init.c --- linux-2.6.10-rc1-bk11/arch/ppc64/mm/init.c~ppc64-make-mmu_context_init-core_initcall 2004-11-01 19:51:46.000000000 -0600 +++ linux-2.6.10-rc1-bk11-nathanl/arch/ppc64/mm/init.c 2004-11-01 19:53:24.000000000 -0600 @@ -529,7 +529,7 @@ static int __init mmu_context_init(void) return 0; } -arch_initcall(mmu_context_init); +core_initcall(mmu_context_init); /* * Do very early mm setup. _ From benh at kernel.crashing.org Tue Nov 2 15:47:23 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 02 Nov 2004 15:47:23 +1100 Subject: [patch] mmu_context_init needs to run earlier In-Reply-To: <1099363642.22996.345.camel@pants.austin.ibm.com> References: <1099363642.22996.345.camel@pants.austin.ibm.com> Message-ID: <1099370843.29689.448.camel@gaston> On Mon, 2004-11-01 at 20:47 -0600, Nathan Lynch wrote: > Hi- > > I am seeing "kernel BUG in mmu_context_init at arch/ppc64/mm/init.c:528" > in latest 2.6 bk kernels. It looks as if arch_initcall is not early > enough for mmu_context_init -- I inserted printk's in that function and > init_new_context, and indeed, init_new_context is being called before > mmu_context_init. 
> > Not sure this is the best fix, or that this completely eliminates the > races, but I didn't see any other obvious solution. Boot-tested on a > p630. Do you have a backtrace of who is trying to get a context that early ? If it's some call of usermode helpers, I doubt it's very sane to do that before the arch initcalls have run ! It would be interesting to know who is triggering it. Ben. From nathanl at austin.ibm.com Tue Nov 2 16:59:19 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Mon, 01 Nov 2004 23:59:19 -0600 Subject: [patch] mmu_context_init needs to run earlier In-Reply-To: <1099370843.29689.448.camel@gaston> References: <1099363642.22996.345.camel@pants.austin.ibm.com> <1099370843.29689.448.camel@gaston> Message-ID: <1099375159.9590.4.camel@localhost.localdomain> On Tue, 2004-11-02 at 15:47 +1100, Benjamin Herrenschmidt wrote: > Do you have a backtrace of who is trying to get a context that > early ? > > If it's some call of usermode helpers, I doubt it's very sane to do that > before the arch initcalls have run ! It would be interesting to know who > is triggering it. > Sure, I inserted a WARN_ON in init_new_context. I see several of these before hitting the BUG_ON in mmu_context_init. I'm assuming these are from the driver core trying to run /sbin/hotplug. 
Badness in init_new_context at arch/ppc64/mm/init.c:483 Call Trace: [c00000000ff7fa00] [c00000000ff7faa0] 0xc00000000ff7faa0 (unreliable) [c00000000ff7faa0] [c0000000000c96dc] .do_execve+0xdc/0x2ac [c00000000ff7fb60] [c000000000016790] .sys_execve+0x7c/0x104 [c00000000ff7fc00] [c000000000011b80] syscall_exit+0x0/0x18 --- Exception: c01 at .____call_usermodehelper+0xcc/0xf8 LR = .____call_usermodehelper+0x9c/0xf8 Nathan From benh at kernel.crashing.org Tue Nov 2 17:05:11 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 02 Nov 2004 17:05:11 +1100 Subject: [patch] mmu_context_init needs to run earlier In-Reply-To: <1099375159.9590.4.camel@localhost.localdomain> References: <1099363642.22996.345.camel@pants.austin.ibm.com> <1099370843.29689.448.camel@gaston> <1099375159.9590.4.camel@localhost.localdomain> Message-ID: <1099375511.29693.463.camel@gaston> On Mon, 2004-11-01 at 23:59 -0600, Nathan Lynch wrote: > On Tue, 2004-11-02 at 15:47 +1100, Benjamin Herrenschmidt wrote: > > Do you have a backtrace of who is trying to get a context that > > early ? > > > > If it's some call of usermode helpers, I doubt it's very sane to do that > > before the arch initcalls have run ! It would be interesting to know who > > is triggering it. > > > > Sure, I inserted a WARN_ON in init_new_context. I see several of these > before hitting the BUG_ON in mmu_context_init. I'm assuming these are > from the driver core trying to run /sbin/hotplug. Yah. It would be interesting to find out who is triggering those calls (what drivers are probed that early during boot). It doesn't happen on my g5 for some reason. Ben. 
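The ordering question discussed above can be sketched in user space. This is only a toy model, not kernel code: the level numbers follow the 2.6 convention that core_initcall runs before arch_initcall, but `mock_mmu_context_init`, `mock_context_user`, `run_initcalls` and `count_early_users` are hypothetical names invented for the illustration, and the context user is assumed to run at postcore level. It does not show whether moving the call closes every race, which the original patch description itself left open.

```c
#include <assert.h>
#include <stddef.h>

/* Level numbers mirror the 2.6 ordering: lower levels run first. */
enum initcall_level {
    LEVEL_CORE = 1,      /* core_initcall */
    LEVEL_POSTCORE = 2,  /* postcore_initcall */
    LEVEL_ARCH = 3       /* arch_initcall */
};

typedef int (*initcall_fn)(void);

struct initcall_entry {
    enum initcall_level level;
    initcall_fn fn;
};

static int context_ready;  /* set by the mock setup routine */
static int early_users;    /* contexts requested before setup ran */

/* Hypothetical stand-in for mmu_context_init(): marks setup as done. */
static int mock_mmu_context_init(void)
{
    context_ready = 1;
    return 0;
}

/* Hypothetical stand-in for a caller that reaches init_new_context(). */
static int mock_context_user(void)
{
    if (!context_ready)
        early_users++;
    return 0;
}

/* Run every entry, lowest level first; order within a level is kept. */
static void run_initcalls(const struct initcall_entry *calls, size_t n)
{
    assert(calls != NULL);
    for (int level = LEVEL_CORE; level <= LEVEL_ARCH; level++)
        for (size_t i = 0; i < n; i++)
            if ((int)calls[i].level == level)
                calls[i].fn();
}

/*
 * Register the setup routine at setup_level and an assumed user at
 * postcore level, then report how many users ran before setup.
 */
int count_early_users(enum initcall_level setup_level)
{
    struct initcall_entry calls[] = {
        { LEVEL_POSTCORE, mock_context_user },
        { setup_level, mock_mmu_context_init },
    };

    context_ready = 0;
    early_users = 0;
    run_initcalls(calls, sizeof(calls) / sizeof(calls[0]));
    return early_users;
}
```

In this model, registering the setup routine at arch level lets the postcore user run first, while registering it at core level does not. A caller that runs before any initcall level at all would still be early in either case.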
From nathanl at austin.ibm.com Wed Nov 3 08:55:33 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Tue, 02 Nov 2004 15:55:33 -0600 Subject: [patch] mmu_context_init needs to run earlier In-Reply-To: <1099375511.29693.463.camel@gaston> References: <1099363642.22996.345.camel@pants.austin.ibm.com> <1099370843.29689.448.camel@gaston> <1099375159.9590.4.camel@localhost.localdomain> <1099375511.29693.463.camel@gaston> Message-ID: <1099432532.23845.90.camel@pants.austin.ibm.com> On Tue, 2004-11-02 at 00:05, Benjamin Herrenschmidt wrote: > On Mon, 2004-11-01 at 23:59 -0600, Nathan Lynch wrote: > > On Tue, 2004-11-02 at 15:47 +1100, Benjamin Herrenschmidt wrote: > > > Do you have a backtrace of who is trying to get a context that > > > early ? > > > > > > If it's some call of usermode helpers, I doubt it's very sane to do that > > > before the arch initcalls have run ! It would be interesting to know who > > > is triggering it. > > > > > > > Sure, I inserted a WARN_ON in init_new_context. I see several of these > > before hitting the BUG_ON in mmu_context_init. I'm assuming these are > > from the driver core trying to run /sbin/hotplug. > > Yah. It would be interesting to find out who is triggering those calls > (what drivers are probed that early during boot). It doesn't happen on > my g5 for some reason. Ok, here's a boot log with kobject debugging turned on. I can't interpret all of this but I believe a couple of them are due to sysdev_class_register for cpus and nodes. I don't know why I'm the only person running into this -- maybe it's something to do with my turning on every possible debug option ;) Regardless, I've got a better patch (I think) for the mmu context thing on the way. checking if image is initramfs... it is Freeing initrd memory: 1939k freed subsystem devices: registering kobject devices: registering. parent: , set: subsystem bus: registering kobject bus: registering. parent: , set: subsystem class: registering kobject class: registering. 
parent: , set: subsystem firmware: registering kobject firmware: registering. parent: , set: kobject platform: registering. parent: , set: devices subsystem platform: registering kobject platform: registering. parent: , set: bus kobject_hotplug fill_kobj_path: path = '/bus/platform' kobject_hotplug: /sbin/hotplug bus seq=1 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/bus/platform SUBSYSTEM=bus kobject_hotplug - call_usermodehelper returned -1 kobject devices: registering. parent: platform, set: kobject_hotplug fill_kobj_path: path = '/bus/platform/devices' kobject_hotplug: /sbin/hotplug bus seq=2 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/bus/platform/devices SUBSYSTEM=bus kobject_hotplug - call_usermodehelper returned -1 kobject drivers: registering. parent: platform, set: kobject_hotplug fill_kobj_path: path = '/bus/platform/drivers' kobject_hotplug: /sbin/hotplug bus seq=3 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/bus/platform/drivers SUBSYSTEM=bus kobject_hotplug - call_usermodehelper returned -1 subsystem system: registering kobject system: registering. parent: devices, set: kobject cpu: registering. parent: , set: system kobject_hotplug fill_kobj_path: path = '/devices/system/cpu' kobject_hotplug: /sbin/hotplug system seq=4 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/devices/system/cpu SUBSYSTEM=system kobject_hotplug - call_usermodehelper returned -1 subsystem kernel: registering kobject kernel: registering. parent: , set: NET: Registered protocol family 16 subsystem of_platform: registering kobject of_platform: registering. parent: , set: bus kobject_hotplug fill_kobj_path: path = '/bus/of_platform' kobject_hotplug: /sbin/hotplug bus seq=5 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/bus/of_platform SUBSYSTEM=bus kobject devices: registering. 
parent: of_platform, set: kobject_hotplug fill_kobj_path: path = '/bus/of_platform/devices' kobject_hotplug: /sbin/hotplug bus seq=6 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/bus/of_platform/devices SUBSYSTEM=bus kobject drivers: registering. parent: of_platform, set: kobject_hotplug fill_kobj_path: path = '/bus/of_platform/drivers' kobject_hotplug: /sbin/hotplug bus seq=7 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/bus/of_platform/drivers SUBSYSTEM=bus subsystem pci_bus: registering kobject pci_bus: registering. parent: , set: class kobject_hotplug fill_kobj_path: path = '/class/pci_bus' kobject_hotplug: /sbin/hotplug class seq=8 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/class/pci_bus SUBSYSTEM=class subsystem pci: registering kobject pci: registering. parent: , set: bus kobject_hotplug fill_kobj_path: path = '/bus/pci' kobject_hotplug: /sbin/hotplug bus seq=9 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/bus/pci SUBSYSTEM=bus kobject devices: registering. parent: pci, set: kobject_hotplug fill_kobj_path: path = '/bus/pci/devices' kobject_hotplug: /sbin/hotplug bus seq=10 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/bus/pci/devices SUBSYSTEM=bus kobject drivers: registering. parent: pci, set: kobject_hotplug fill_kobj_path: path = '/bus/pci/drivers' kobject_hotplug: /sbin/hotplug bus seq=11 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/bus/pci/drivers SUBSYSTEM=bus subsystem tty: registering kobject tty: registering. parent: , set: class kobject_hotplug fill_kobj_path: path = '/class/tty' kobject_hotplug: /sbin/hotplug class seq=12 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/class/tty SUBSYSTEM=class kobject node: registering. 
parent: , set: system kobject_hotplug fill_kobj_path: path = '/devices/system/node' kobject_hotplug: /sbin/hotplug system seq=13 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/devices/system/node SUBSYSTEM=system kernel BUG in mmu_context_init at arch/ppc64/mm/init.c:528! cpu 0x1: Vector: 700 (Program Check) at [c0000000047cbb60] pc: c000000000439fa4: .mmu_context_init+0x4c/0x68 lr: c000000000439f8c: .mmu_context_init+0x34/0x68 sp: c0000000047cbde0 msr: 9000000000029032 current = 0xc0000001fe78d7f0 paca = 0xc0000000004cdd00 pid = 1, comm = swapper enter ? for help 1:mon> From benh at kernel.crashing.org Wed Nov 3 09:22:43 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 03 Nov 2004 09:22:43 +1100 Subject: [patch] mmu_context_init needs to run earlier In-Reply-To: <1099432532.23845.90.camel@pants.austin.ibm.com> References: <1099363642.22996.345.camel@pants.austin.ibm.com> <1099370843.29689.448.camel@gaston> <1099375159.9590.4.camel@localhost.localdomain> <1099375511.29693.463.camel@gaston> <1099432532.23845.90.camel@pants.austin.ibm.com> Message-ID: <1099434163.20294.15.camel@gaston> > Ok, here's a boot log with kobject debugging turned on. I can't > interpret all of this but I believe a couple of them are due to > sysdev_class_register for cpus and nodes. > > I don't know why I'm the only person running into this -- maybe it's > something to do with my turning on every possible debug option ;) > > Regardless, I've got a better patch (I think) for the mmu context thing > on the way. Ok, it's all of the platform stuff etc... I suppose you run into that because you actually have an initramfs with an /sbin/hotplug in it, do you ? Some other ppl experienced it, you aren't the only one ;) Ben. 
From nathanl at austin.ibm.com Wed Nov 3 09:46:33 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Tue, 02 Nov 2004 16:46:33 -0600 Subject: [patch] mmu_context_init needs to run earlier In-Reply-To: <1099434163.20294.15.camel@gaston> References: <1099363642.22996.345.camel@pants.austin.ibm.com> <1099370843.29689.448.camel@gaston> <1099375159.9590.4.camel@localhost.localdomain> <1099375511.29693.463.camel@gaston> <1099432532.23845.90.camel@pants.austin.ibm.com> <1099434163.20294.15.camel@gaston> Message-ID: <1099435593.23845.96.camel@pants.austin.ibm.com> On Tue, 2004-11-02 at 16:22, Benjamin Herrenschmidt wrote: > > Ok, here's a boot log with kobject debugging turned on. I can't > > interpret all of this but I believe a couple of them are due to > > sysdev_class_register for cpus and nodes. > > > > I don't know why I'm the only person running into this -- maybe it's > > something to do with my turning on every possible debug option ;) > > > > Regardless, I've got a better patch (I think) for the mmu context thing > > on the way. > > Ok, it's all of the platform stuff etc... I suppose you run into that > because you actually have an initramfs with an /sbin/hotplug in it, do > you ? Some other ppl experienced it, you aren't the only one ;) Right, I'm using initrd. It makes sense now, thanks. 
Nathan From linas at austin.ibm.com Wed Nov 3 10:06:18 2004 From: linas at austin.ibm.com (Linas Vepstas) Date: Tue, 2 Nov 2004 17:06:18 -0600 Subject: [PATCH] iommu fixes, round 3 In-Reply-To: <1098998916.692.20.camel@sinatra.austin.ibm.com> References: <1098775712.6897.17.camel@gaston> <1098808895.32293.23.camel@sinatra.austin.ibm.com> <1098813781.32293.40.camel@sinatra.austin.ibm.com> <16768.10849.741580.850491@cargo.ozlabs.ibm.com> <1098998916.692.20.camel@sinatra.austin.ibm.com> Message-ID: <20041102230618.GQ10026@austin.ibm.com> On Thu, Oct 28, 2004 at 04:28:36PM -0500, John Rose was heard to remark: > This patch changes the following iommu-related things: > > - Renames the [i,p]series versions of iommu_devnode_init(), to keep things > logically separate where possible. > > - Moves iommu_free_table() to generic iommu.c > > - Creates of_cleanup_node(), which will directly precede the dynamic removal of > any device node > > Comments welcome. FYI, without this patch, I get BUG_ON crashes when I hotplug-remove a PCI card, on the nov. 1 2.6.10-rc1 kernel. The BUG_ON is in free_pages, called from iommu_free_table() called from of_remove_node() With this patch, things get back to normal. Please forward & apply. 
--linas From benh at kernel.crashing.org Wed Nov 3 15:18:51 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 03 Nov 2004 15:18:51 +1100 Subject: [PATCH] iommu fixes, round 3 In-Reply-To: <1098998916.692.20.camel@sinatra.austin.ibm.com> References: <1098775712.6897.17.camel@gaston> <1098808895.32293.23.camel@sinatra.austin.ibm.com> <1098813781.32293.40.camel@sinatra.austin.ibm.com> <16768.10849.741580.850491@cargo.ozlabs.ibm.com> <1098998916.692.20.camel@sinatra.austin.ibm.com> Message-ID: <1099455531.31630.35.camel@gaston> On Thu, 2004-10-28 at 16:28 -0500, John Rose wrote: > This patch changes the following iommu-related things: > > - Renames the [i,p]series versions of iommu_devnode_init(), to keep things > logically separate where possible. > > - Moves iommu_free_table() to generic iommu.c > > - Creates of_cleanup_node(), which will directly precede the dynamic removal of > any device node Hrm... one thing I'm still annoyed with is that you are still calling of_cleanup_node() from within of_remove_node(). That call should be moved to the caller. Ben. From sfr at canb.auug.org.au Wed Nov 3 18:21:28 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 3 Nov 2004 18:21:28 +1100 Subject: [PATCH] PPC64 iSeries iommu cleanups Message-ID: <20041103182128.6d1a7d3a.sfr@canb.auug.org.au> Hi Andrew, This patch just does some cleanups of iSeries_iommu.c remove lots of unneeded includes use list_for_each_entry white space formatting No semantic changes. Signed-off-by: Stephen Rothwell Please apply and send to Linus. 
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk/arch/ppc64/kernel/iSeries_iommu.c linus-bk-iommu.1/arch/ppc64/kernel/iSeries_iommu.c --- linus-bk/arch/ppc64/kernel/iSeries_iommu.c 2004-04-13 09:25:09.000000000 +1000 +++ linus-bk-iommu.1/arch/ppc64/kernel/iSeries_iommu.c 2004-11-02 18:24:31.000000000 +1100 @@ -25,30 +25,14 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -#include -#include #include -#include -#include -#include -#include -#include #include -#include -#include -#include -#include +#include -#include -#include #include -#include -#include - #include - -#include "pci.h" - +#include +#include extern struct list_head iSeries_Global_Device_List; @@ -76,12 +60,11 @@ tce.te_bits.tb_pciwr = 1; } - rc = HvCallXm_setTce((u64)tbl->it_index, - (u64)index, - tce.te_word); + rc = HvCallXm_setTce((u64)tbl->it_index, (u64)index, + tce.te_word); if (rc) - panic("PCI_DMA: HvCallXm_setTce failed, Rc: 0x%lx\n", rc); - + panic("PCI_DMA: HvCallXm_setTce failed, Rc: 0x%lx\n", + rc); index++; uaddr += PAGE_SIZE; } @@ -90,20 +73,14 @@ static void tce_free_iSeries(struct iommu_table *tbl, long index, long npages) { u64 rc; - union tce_entry tce; while (npages--) { - tce.te_word = 0; - rc = HvCallXm_setTce((u64)tbl->it_index, - (u64)index, - tce.te_word); - + rc = HvCallXm_setTce((u64)tbl->it_index, (u64)index, 0); if (rc) - panic("PCI_DMA: HvCallXm_setTce failed, Rc: 0x%lx\n", rc); - + panic("PCI_DMA: HvCallXm_setTce failed, Rc: 0x%lx\n", + rc); index++; } - } @@ -115,17 +92,14 @@ { struct iSeries_Device_Node *dp; - for (dp = (struct iSeries_Device_Node *)iSeries_Global_Device_List.next; - dp != (struct iSeries_Device_Node *)&iSeries_Global_Device_List; - dp = (struct iSeries_Device_Node *)dp->Device_List.next) - if (dp->iommu_table != NULL && - dp->iommu_table->it_type == TCE_PCI && - dp->iommu_table->it_offset == tbl->it_offset && - dp->iommu_table->it_index == tbl->it_index && - 
dp->iommu_table->it_size == tbl->it_size) + list_for_each_entry(dp, &iSeries_Global_Device_List, Device_List) { + if ((dp->iommu_table != NULL) && + (dp->iommu_table->it_type == TCE_PCI) && + (dp->iommu_table->it_offset == tbl->it_offset) && + (dp->iommu_table->it_index == tbl->it_index) && + (dp->iommu_table->it_size == tbl->it_size)) return dp->iommu_table; - - + } return NULL; } @@ -143,15 +117,14 @@ { struct iommu_table_cb *parms; - parms = (struct iommu_table_cb*)kmalloc(sizeof(*parms), GFP_KERNEL); - + parms = kmalloc(sizeof(*parms), GFP_KERNEL); if (parms == NULL) panic("PCI_DMA: TCE Table Allocation failed."); memset(parms, 0, sizeof(*parms)); - parms->itc_busno = ISERIES_BUS(dn); - parms->itc_slotno = dn->LogicalSlot; + parms->itc_busno = ISERIES_BUS(dn); + parms->itc_slotno = dn->LogicalSlot; parms->itc_virtbus = 0; HvCallXm_getTceTableParms(ISERIES_HV_ADDR(parms)); @@ -159,34 +132,32 @@ if (parms->itc_size == 0) panic("PCI_DMA: parms->size is zero, parms is 0x%p", parms); - tbl->it_size = parms->itc_size; - tbl->it_busno = parms->itc_busno; - tbl->it_offset = parms->itc_offset; - tbl->it_index = parms->itc_index; - tbl->it_entrysize = sizeof(union tce_entry); - tbl->it_blocksize = 1; - tbl->it_type = TCE_PCI; + tbl->it_size = parms->itc_size; + tbl->it_busno = parms->itc_busno; + tbl->it_offset = parms->itc_offset; + tbl->it_index = parms->itc_index; + tbl->it_entrysize = sizeof(union tce_entry); + tbl->it_blocksize = 1; + tbl->it_type = TCE_PCI; kfree(parms); } -void iommu_devnode_init(struct iSeries_Device_Node *dn) { +void iommu_devnode_init(struct iSeries_Device_Node *dn) +{ struct iommu_table *tbl; - tbl = (struct iommu_table *)kmalloc(sizeof(struct iommu_table), GFP_KERNEL); + tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL); iommu_table_getparms(dn, tbl); /* Look for existing tce table */ dn->iommu_table = iommu_table_find(tbl); - if (dn->iommu_table == NULL) dn->iommu_table = iommu_init_table(tbl); else kfree(tbl); - - return; } 
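The list_for_each_entry conversion in the patch above can be illustrated with a small user-space sketch. The macros below are simplified re-implementations written for this example, not the kernel's list.h; they rely on the `__typeof__` extension available in GCC and Clang. `struct demo_node`, `demo_find` and `demo_lookup` are hypothetical names. The list member is deliberately not the first field, a layout where a loop that casts list_head pointers directly to the node type would not work.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its list member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified take on the kernel macro; uses the GCC/Clang __typeof__. */
#define list_for_each_entry(pos, head, member)                        \
    for (pos = container_of((head)->next, __typeof__(*pos), member);  \
         &pos->member != (head);                                      \
         pos = container_of(pos->member.next, __typeof__(*pos), member))

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Hypothetical node type; the list member is not the first field. */
struct demo_node {
    int index;
    struct list_head link;
};

/* Return the first node whose index matches, or NULL if none does. */
static struct demo_node *demo_find(struct list_head *head, int index)
{
    struct demo_node *dp;

    assert(head != NULL);
    list_for_each_entry(dp, head, link) {
        if (dp->index == index)
            return dp;
    }
    return NULL;
}

/* Build a three-node list and look up index; -1 means not found. */
int demo_lookup(int index)
{
    struct list_head head;
    struct demo_node nodes[3] = {
        { .index = 10 }, { .index = 20 }, { .index = 30 }
    };
    struct demo_node *found;

    list_init(&head);
    for (size_t i = 0; i < 3; i++)
        list_add_tail(&nodes[i].link, &head);

    found = demo_find(&head, index);
    return found ? found->index : -1;
}
```

The macro keeps the pointer arithmetic in one place, so the loop body only deals with the node type and the search condition.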
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041103/abedb587/attachment.pgp

From brking at us.ibm.com Thu Nov 4 02:10:19 2004
From: brking at us.ibm.com (brking at us.ibm.com)
Date: Wed, 03 Nov 2004 09:10:19 -0600
Subject: [PATCH 1/2] ppc64: Block config accesses during BIST (revised - resend)
Message-ID: <200411031510.iA3FAK7t022615@d01av03.pok.ibm.com>

Resending...

Some PCI adapters on pSeries and iSeries hardware (ipr scsi adapters) have an exposure today in that they issue BIST to the adapter to reset the card. If, during the time it takes to complete BIST, userspace attempts to access PCI config space, the host bus bridge will master abort the access since the ipr adapter does not respond on the PCI bus for a brief period of time when running BIST. This master abort results in the host PCI bridge isolating that PCI device from the rest of the system, making the device unusable until Linux is rebooted.

This patch is an attempt to close that exposure by introducing some blocking code in the arch specific PCI code. The intent is to have the ipr device driver invoke these routines to prevent userspace PCI accesses from occurring during this window.

It has been tested by running BIST on an ipr adapter while running a script which looped reading the config space of that adapter through sysfs. Without the patch, an EEH error occurs. With the patch there is no EEH error. Tested on Power 5 and iSeries Power 4.
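The blocking scheme described above can be sketched as a user-space mock. Nothing here is the kernel API: `struct mock_dev` and the `mock_*` and `demo_read` functions are hypothetical, a C11 `atomic_flag` spin loop stands in for the spinlock, and a byte array stands in for config space. The behaviour modelled is the one the patch's comments describe: while blocked, reads return all ones and writes are dropped but reported as successful.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define CFG_SPACE_SIZE 256
#define CFG_OK 0

/* Hypothetical device: a lock, a blocked flag, and fake config space. */
struct mock_dev {
    atomic_flag lock;
    int cfgio_blocked;
    uint8_t cfg[CFG_SPACE_SIZE];
};

/* Minimal spin lock standing in for spin_lock_irqsave(). */
static void mock_lock(struct mock_dev *dev)
{
    while (atomic_flag_test_and_set_explicit(&dev->lock,
                                             memory_order_acquire))
        ; /* spin until the flag is released */
}

static void mock_unlock(struct mock_dev *dev)
{
    atomic_flag_clear_explicit(&dev->lock, memory_order_release);
}

void mock_block_config_io(struct mock_dev *dev)
{
    mock_lock(dev);
    dev->cfgio_blocked = 1;
    mock_unlock(dev);
}

void mock_unblock_config_io(struct mock_dev *dev)
{
    mock_lock(dev);
    dev->cfgio_blocked = 0;
    mock_unlock(dev);
}

/* Blocked reads return all ones instead of touching the device. */
int mock_read_config_byte(struct mock_dev *dev, int where, uint8_t *val)
{
    assert(where >= 0 && where < CFG_SPACE_SIZE);
    mock_lock(dev);
    *val = dev->cfgio_blocked ? 0xff : dev->cfg[where];
    mock_unlock(dev);
    return CFG_OK;
}

/* Blocked writes are dropped but still reported as successful. */
int mock_write_config_byte(struct mock_dev *dev, int where, uint8_t val)
{
    assert(where >= 0 && where < CFG_SPACE_SIZE);
    mock_lock(dev);
    if (!dev->cfgio_blocked)
        dev->cfg[where] = val;
    mock_unlock(dev);
    return CFG_OK;
}

/* Block, attempt a write, optionally unblock, then read offset 0x10. */
int demo_read(int unblock_first)
{
    struct mock_dev dev = { .lock = ATOMIC_FLAG_INIT };
    uint8_t val = 0;

    dev.cfg[0x10] = 0x5a;
    mock_block_config_io(&dev);
    mock_write_config_byte(&dev, 0x10, 0x00); /* dropped while blocked */
    if (unblock_first)
        mock_unblock_config_io(&dev);
    mock_read_config_byte(&dev, 0x10, &val);
    return val;
}
```

Taking the same lock in the accessors and in the block/unblock calls means an access already in flight finishes before the flag changes, which is the property a driver would want before it starts BIST.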
Signed-off-by: Brian King --- linux-2.6.10-rc1-bk13-bjking1/arch/ppc64/kernel/iSeries_pci.c | 128 +++++++++- linux-2.6.10-rc1-bk13-bjking1/arch/ppc64/kernel/pSeries_pci.c | 103 +++++++- linux-2.6.10-rc1-bk13-bjking1/include/asm-ppc64/iSeries/iSeries_pci.h | 1 linux-2.6.10-rc1-bk13-bjking1/include/asm-ppc64/pci.h | 6 linux-2.6.10-rc1-bk13-bjking1/include/asm-ppc64/prom.h | 4 5 files changed, 226 insertions(+), 16 deletions(-) diff -puN include/asm-ppc64/prom.h~ppc64_block_cfg_io_during_bist include/asm-ppc64/prom.h --- linux-2.6.10-rc1-bk13/include/asm-ppc64/prom.h~ppc64_block_cfg_io_during_bist 2004-11-03 08:52:08.000000000 -0600 +++ linux-2.6.10-rc1-bk13-bjking1/include/asm-ppc64/prom.h 2004-11-03 08:52:08.000000000 -0600 @@ -183,11 +183,15 @@ extern struct device_node *of_chosen; /* flag descriptions */ #define OF_STALE 0 /* node is slated for deletion */ #define OF_DYNAMIC 1 /* node and properties were allocated via kmalloc */ +#define OF_NO_CFGIO 2 /* config space accesses should fail */ #define OF_IS_STALE(x) test_bit(OF_STALE, &x->_flags) #define OF_MARK_STALE(x) set_bit(OF_STALE, &x->_flags) #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, &x->_flags) #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, &x->_flags) +#define OF_IS_CFGIO_BLOCKED(x) test_bit(OF_NO_CFGIO, &x->_flags) +#define OF_UNBLOCK_CFGIO(x) clear_bit(OF_NO_CFGIO, &x->_flags) +#define OF_BLOCK_CFGIO(x) set_bit(OF_NO_CFGIO, &x->_flags) /* * Until 32-bit ppc can add proc_dir_entries to its device_node diff -puN arch/ppc64/kernel/pSeries_pci.c~ppc64_block_cfg_io_during_bist arch/ppc64/kernel/pSeries_pci.c --- linux-2.6.10-rc1-bk13/arch/ppc64/kernel/pSeries_pci.c~ppc64_block_cfg_io_during_bist 2004-11-03 08:52:08.000000000 -0600 +++ linux-2.6.10-rc1-bk13-bjking1/arch/ppc64/kernel/pSeries_pci.c 2004-11-03 08:52:08.000000000 -0600 @@ -30,6 +30,7 @@ #include #include #include +#include #include #include @@ -52,18 +53,17 @@ static int ibm_read_pci_config; static int ibm_write_pci_config; static int 
s7a_workaround; +static spinlock_t config_lock = SPIN_LOCK_UNLOCKED; extern unsigned long pci_probe_only; extern struct mpic *pSeries_mpic; -static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val) +static int __rtas_read_config(struct device_node *dn, int where, int size, u32 *val) { int returnval = -1; unsigned long buid, addr; int ret; - if (!dn) - return PCIBIOS_DEVICE_NOT_FOUND; if (where & (size - 1)) return PCIBIOS_BAD_REGISTER_NUMBER; @@ -87,6 +87,23 @@ static int rtas_read_config(struct devic return PCIBIOS_SUCCESSFUL; } +static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val) +{ + unsigned long flags; + int ret = 0; + + if (!dn) + return PCIBIOS_DEVICE_NOT_FOUND; + + spin_lock_irqsave(&config_lock, flags); + if (OF_IS_CFGIO_BLOCKED(dn)) + *val = -1; + else + ret = __rtas_read_config(dn, where, size, val); + spin_unlock_irqrestore(&config_lock, flags); + return ret; +} + static int rtas_pci_read_config(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val) @@ -105,13 +122,11 @@ static int rtas_pci_read_config(struct p return PCIBIOS_DEVICE_NOT_FOUND; } -static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +static int __rtas_write_config(struct device_node *dn, int where, int size, u32 val) { unsigned long buid, addr; int ret; - if (!dn) - return PCIBIOS_DEVICE_NOT_FOUND; if (where & (size - 1)) return PCIBIOS_BAD_REGISTER_NUMBER; @@ -129,6 +144,21 @@ static int rtas_write_config(struct devi return PCIBIOS_SUCCESSFUL; } +static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +{ + unsigned long flags; + int ret = 0; + + if (!dn) + return PCIBIOS_DEVICE_NOT_FOUND; + + spin_lock_irqsave(&config_lock, flags); + if (!OF_IS_CFGIO_BLOCKED(dn)) + ret = __rtas_write_config(dn, where, size, val); + spin_unlock_irqrestore(&config_lock, flags); + return ret; +} + static int rtas_pci_write_config(struct pci_bus *bus, unsigned int devfn, int 
where, int size, u32 val) @@ -152,6 +182,67 @@ struct pci_ops rtas_pci_ops = { rtas_pci_write_config }; +/** + * pci_block_config_io - Block PCI config reads/writes + * @pdev: pci device struct + * + * This function blocks any PCI config accesses from occurring. + * Device drivers may call this prior to running BIST if the + * adapter cannot handle PCI config reads or writes when + * running BIST. When blocked, any writes will be ignored and + * treated as successful and any reads will return all 1's data. + * + * Return value: + * nothing + **/ +void pci_block_config_io(struct pci_dev *pdev) +{ + struct device_node *dn = pci_device_to_OF_node(pdev); + unsigned long flags; + + spin_lock_irqsave(&config_lock, flags); + OF_BLOCK_CFGIO(dn); + spin_unlock_irqrestore(&config_lock, flags); +} +EXPORT_SYMBOL(pci_block_config_io); + +/** + * pci_unblock_config_io - Unblock PCI config reads/writes + * @pdev: pci device struct + * + * This function allows PCI config accesses to resume. + * + * Return value: + * nothing + **/ +void pci_unblock_config_io(struct pci_dev *pdev) +{ + struct device_node *dn = pci_device_to_OF_node(pdev); + unsigned long flags; + + spin_lock_irqsave(&config_lock, flags); + OF_UNBLOCK_CFGIO(dn); + spin_unlock_irqrestore(&config_lock, flags); +} +EXPORT_SYMBOL(pci_unblock_config_io); + +/** + * pci_start_bist - Start BIST on a PCI device + * @pdev: pci device struct + * + * This function allows a device driver to start BIST + * when PCI config accesses are disabled. 
+ * + * Return value: + * nothing + **/ +int pci_start_bist(struct pci_dev *pdev) +{ + struct device_node *dn = pci_device_to_OF_node(pdev); + return __rtas_write_config(dn, PCI_BIST, 1, PCI_BIST_START); +} +EXPORT_SYMBOL(pci_start_bist); + static void python_countermeasures(unsigned long addr) { void __iomem *chip_regs; diff -puN include/asm-ppc64/pci.h~ppc64_block_cfg_io_during_bist include/asm-ppc64/pci.h --- linux-2.6.10-rc1-bk13/include/asm-ppc64/pci.h~ppc64_block_cfg_io_during_bist 2004-11-03 08:52:08.000000000 -0600 +++ linux-2.6.10-rc1-bk13-bjking1/include/asm-ppc64/pci.h 2004-11-03 08:52:08.000000000 -0600 @@ -244,6 +244,12 @@ extern int pci_read_irq_line(struct pci_ extern void pcibios_add_platform_entries(struct pci_dev *dev); +extern void pci_block_config_io(struct pci_dev *dev); + +extern void pci_unblock_config_io(struct pci_dev *dev); + +extern int pci_start_bist(struct pci_dev *dev); + #endif /* __KERNEL__ */ #endif /* __PPC64_PCI_H */ diff -puN include/asm-ppc64/iSeries/iSeries_pci.h~ppc64_block_cfg_io_during_bist include/asm-ppc64/iSeries/iSeries_pci.h --- linux-2.6.10-rc1-bk13/include/asm-ppc64/iSeries/iSeries_pci.h~ppc64_block_cfg_io_during_bist 2004-11-03 08:52:08.000000000 -0600 +++ linux-2.6.10-rc1-bk13-bjking1/include/asm-ppc64/iSeries/iSeries_pci.h 2004-11-03 08:52:08.000000000 -0600 @@ -91,6 +91,7 @@ struct iSeries_Device_Node { int ReturnCode; /* Return Code Holder */ int IoRetry; /* Current Retry Count */ int Flags; /* Possible flags(disable/bist)*/ +#define ISERIES_CFGIO_BLOCKED 1 u16 Vendor; /* Vendor ID */ u8 LogicalSlot; /* Hv Slot Index for Tces */ struct iommu_table* iommu_table;/* Device TCE Table */ diff -puN arch/ppc64/kernel/iSeries_pci.c~ppc64_block_cfg_io_during_bist arch/ppc64/kernel/iSeries_pci.c --- linux-2.6.10-rc1-bk13/arch/ppc64/kernel/iSeries_pci.c~ppc64_block_cfg_io_during_bist 2004-11-03 08:52:08.000000000 -0600 +++ linux-2.6.10-rc1-bk13-bjking1/arch/ppc64/kernel/iSeries_pci.c 2004-11-03 08:52:08.000000000 -0600 @@ 
-28,6 +28,7 @@ #include #include #include +#include #include #include @@ -77,6 +78,7 @@ static int Pci_Retry_Max = 3; /* Only re static int Pci_Error_Flag = 1; /* Set Retry Error on. */ static struct pci_ops iSeries_pci_ops; +static spinlock_t config_lock = SPIN_LOCK_UNLOCKED; /* * Table defines @@ -603,16 +605,12 @@ static u64 hv_cfg_write_func[4] = { /* * Read PCI config space */ -static int iSeries_pci_read_config(struct pci_bus *bus, unsigned int devfn, +static int __iSeries_pci_read_config(struct iSeries_Device_Node *node, int offset, int size, u32 *val) { - struct iSeries_Device_Node *node = find_Device_Node(bus->number, devfn); u64 fn; struct HvCallPci_LoadReturn ret; - if (node == NULL) - return PCIBIOS_DEVICE_NOT_FOUND; - fn = hv_cfg_read_func[(size - 1) & 3]; HvCall3Ret16(fn, &ret, node->DsaAddr.DsaAddr, offset, 0); @@ -625,20 +623,36 @@ static int iSeries_pci_read_config(struc return 0; } +static int iSeries_pci_read_config(struct pci_bus *bus, unsigned int devfn, + int offset, int size, u32 *val) +{ + struct iSeries_Device_Node *node = find_Device_Node(bus->number, devfn); + int ret = PCIBIOS_DEVICE_NOT_FOUND; + unsigned long flags; + + if (node) { + ret = 0; + spin_lock_irqsave(&config_lock, flags); + if (node->Flags & ISERIES_CFGIO_BLOCKED) + *val = -1; + else + ret = __iSeries_pci_read_config(node, offset, size, val); + spin_unlock_irqrestore(&config_lock, flags); + } + + return ret; +} + /* * Write PCI config space */ -static int iSeries_pci_write_config(struct pci_bus *bus, unsigned int devfn, +static int __iSeries_pci_write_config(struct iSeries_Device_Node *node, int offset, int size, u32 val) { - struct iSeries_Device_Node *node = find_Device_Node(bus->number, devfn); u64 fn; u64 ret; - if (node == NULL) - return PCIBIOS_DEVICE_NOT_FOUND; - fn = hv_cfg_write_func[(size - 1) & 3]; ret = HvCall4(fn, node->DsaAddr.DsaAddr, offset, val, 0); @@ -648,6 +662,23 @@ static int iSeries_pci_write_config(stru return 0; } +static int 
iSeries_pci_write_config(struct pci_bus *bus, unsigned int devfn, + int offset, int size, u32 val) +{ + struct iSeries_Device_Node *node = find_Device_Node(bus->number, devfn); + int ret = PCIBIOS_DEVICE_NOT_FOUND; + unsigned long flags; + + if (node) { + spin_lock_irqsave(&config_lock, flags); + if (!(node->Flags & ISERIES_CFGIO_BLOCKED)) + ret = __iSeries_pci_write_config(node, offset, size, val); + spin_unlock_irqrestore(&config_lock, flags); + } + + return ret; +} + static struct pci_ops iSeries_pci_ops = { .read = iSeries_pci_read_config, .write = iSeries_pci_write_config @@ -906,3 +937,80 @@ void iSeries_Write_Long(u32 data, volati } while (CheckReturnCode("WWL", DevNode, rc) != 0); } EXPORT_SYMBOL(iSeries_Write_Long); + +/** + * pci_block_config_io - Block PCI config reads/writes + * @pdev: pci device struct + * + * This function blocks any PCI config accesses from occurring. + * Device drivers may call this prior to running BIST if the + * adapter cannot handle PCI config reads or writes when + * running BIST. When blocked, any writes will be ignored and + * treated as successful and any reads will return all 1's data. + * + * Return value: + * nothing + **/ +void pci_block_config_io(struct pci_dev *pdev) +{ + struct iSeries_Device_Node *node; + unsigned long flags; + + node = find_Device_Node(pdev->bus->number, pdev->devfn); + + if (node == NULL) + return; + + spin_lock_irqsave(&config_lock, flags); + node->Flags |= ISERIES_CFGIO_BLOCKED; + spin_unlock_irqrestore(&config_lock, flags); +} +EXPORT_SYMBOL(pci_block_config_io); + +/** + * pci_unblock_config_io - Unblock PCI config reads/writes + * @pdev: pci device struct + * + * This function allows PCI config accesses to resume. 
+ * + * Return value: + * nothing + **/ +void pci_unblock_config_io(struct pci_dev *pdev) +{ + struct iSeries_Device_Node *node; + unsigned long flags; + + node = find_Device_Node(pdev->bus->number, pdev->devfn); + + if (node == NULL) + return; + + spin_lock_irqsave(&config_lock, flags); + node->Flags &= ~ISERIES_CFGIO_BLOCKED; + spin_unlock_irqrestore(&config_lock, flags); +} +EXPORT_SYMBOL(pci_unblock_config_io); + +/** + * pci_start_bist - Start BIST on a PCI device + * @pdev: pci device struct + * + * This function allows a device driver to start BIST + * when PCI config accesses are disabled. + * + * Return value: + * PCIBIOS_DEVICE_NOT_FOUND if no device node is found, else the result of the config write + **/ +int pci_start_bist(struct pci_dev *pdev) +{ + struct iSeries_Device_Node *node; + + node = find_Device_Node(pdev->bus->number, pdev->devfn); + + if (node == NULL) + return PCIBIOS_DEVICE_NOT_FOUND; + + return __iSeries_pci_write_config(node, PCI_BIST, 1, PCI_BIST_START); +} +EXPORT_SYMBOL(pci_start_bist); _ From brking at us.ibm.com Thu Nov 4 02:10:26 2004 From: brking at us.ibm.com (brking at us.ibm.com) Date: Wed, 03 Nov 2004 09:10:26 -0600 Subject: [PATCH 2/2] ipr_block_config_io_during_bist (resend) Message-ID: <200411031510.iA3FAR2u010150@d03av02.boulder.ibm.com> Change ipr to use new ppc64 pci APIs to block PCI config space accesses when running BIST to prevent PCI master aborts.
Signed-off-by: Brian King --- linux-2.6.10-rc1-bk13-bjking1/drivers/scsi/ipr.c | 5 ++++- linux-2.6.10-rc1-bk13-bjking1/drivers/scsi/ipr.h | 7 +++++++ 2 files changed, 11 insertions(+), 1 deletion(-) diff -puN drivers/scsi/ipr.c~ipr_block_config_io_during_bist drivers/scsi/ipr.c --- linux-2.6.10-rc1-bk13/drivers/scsi/ipr.c~ipr_block_config_io_during_bist 2004-11-03 09:08:27.000000000 -0600 +++ linux-2.6.10-rc1-bk13-bjking1/drivers/scsi/ipr.c 2004-11-03 09:08:27.000000000 -0600 @@ -4935,6 +4935,7 @@ static int ipr_reset_restore_cfg_space(s int rc; ENTER; + pci_unblock_config_io(ioa_cfg->pdev); rc = pci_restore_state(ioa_cfg->pdev); if (rc != PCIBIOS_SUCCESSFUL) { @@ -4989,9 +4990,11 @@ static int ipr_reset_start_bist(struct i int rc; ENTER; - rc = pci_write_config_byte(ioa_cfg->pdev, PCI_BIST, PCI_BIST_START); + pci_block_config_io(ioa_cfg->pdev); + rc = pci_start_bist(ioa_cfg->pdev); if (rc != PCIBIOS_SUCCESSFUL) { + pci_unblock_config_io(ioa_cfg->pdev); ipr_cmd->ioasa.ioasc = cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR); rc = IPR_RC_JOB_CONTINUE; } else { diff -puN drivers/scsi/ipr.h~ipr_block_config_io_during_bist drivers/scsi/ipr.h --- linux-2.6.10-rc1-bk13/drivers/scsi/ipr.h~ipr_block_config_io_during_bist 2004-11-03 09:08:27.000000000 -0600 +++ linux-2.6.10-rc1-bk13-bjking1/drivers/scsi/ipr.h 2004-11-03 09:08:27.000000000 -0600 @@ -1112,6 +1112,13 @@ __FUNCTION__, __LINE__, ioa_cfg #define ipr_remove_dump_file(kobj, attr) do { } while(0) #endif +#if !defined(CONFIG_PPC_PSERIES) && !defined(CONFIG_PPC_ISERIES) +#define pci_block_config_io(dev) do { } while(0) +#define pci_unblock_config_io(dev) do { } while(0) +#define pci_start_bist(dev) \ + pci_write_config_byte(dev, PCI_BIST, PCI_BIST_START) +#endif + /* * Error logging macros */ _ From johnrose at austin.ibm.com Thu Nov 4 02:50:20 2004 From: johnrose at austin.ibm.com (John Rose) Date: Wed, 03 Nov 2004 09:50:20 -0600 Subject: [PATCH] iommu fixes, round 3 In-Reply-To: <1099455531.31630.35.camel@gaston> References: 
<1098775712.6897.17.camel@gaston> <1098808895.32293.23.camel@sinatra.austin.ibm.com> <1098813781.32293.40.camel@sinatra.austin.ibm.com> <16768.10849.741580.850491@cargo.ozlabs.ibm.com> <1098998916.692.20.camel@sinatra.austin.ibm.com> <1099455531.31630.35.camel@gaston> Message-ID: <1099497020.21421.0.camel@sinatra.austin.ibm.com> > > - Creates of_cleanup_node(), which will directly precede the dynamic removal of > > any device node > > Hrm... one thing I'm still annoyed with is that you are still calling > of_cleanup_node() from within of_remove_node(). That call should be > moved to the caller. :) Respectfully, I still disagree. The caller is a procfs-specific function related to an interface that we're hoping to deprecate soon. We want this to happen any time a node is removed, not anytime a node is removed using interface so-and-so. To me, it makes sense to put this here since of_add_node() calls of_finish_node_dynamic(), which creates the table. John From olof at austin.ibm.com Thu Nov 4 04:17:30 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 3 Nov 2004 11:17:30 -0600 Subject: [PATCH] PPC64 VIO iommu table property parsing wrong Message-ID: <20041103171730.GA31267@4> Andrew, please apply: With current firmware, the ibm,my-dma-window property now contains two panes for VSCSI server nodes. This breaks the current tests in the setup code. There's a bunch of references to pre-GA firmware bugs. That's a while ago, so we can remove the workarounds without breaking anyone. 
Signed-off-by: Olof Johansson --- linux-2.5-olof/arch/ppc64/kernel/vio.c | 19 +------------------ 1 files changed, 1 insertion(+), 18 deletions(-) diff -puN arch/ppc64/kernel/vio.c~vio-iommu arch/ppc64/kernel/vio.c --- linux-2.5/arch/ppc64/kernel/vio.c~vio-iommu 2004-11-03 09:50:29.829990236 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/vio.c 2004-11-03 10:12:07.313786376 -0600 @@ -521,24 +521,7 @@ static struct iommu_table * vio_build_io newTceTable = (struct iommu_table *) kmalloc(sizeof(struct iommu_table), GFP_KERNEL); - /* RPA docs say that #address-cells is always 1 for virtual - devices, but some older boxes' OF returns 2. This should - be removed by GA, unless there is legacy OFs that still - have 2 for #address-cells */ - size = ((dma_window[1+vio_num_address_cells] >> PAGE_SHIFT) << 3) - >> PAGE_SHIFT; - - /* This is just an ugly kludge. Remove as soon as the OF for all - machines actually follow the spec and encodes the offset field - as phys-encode (that is, #address-cells wide)*/ - if (dma_window_property_size == 12) { - size = ((dma_window[1] >> PAGE_SHIFT) << 3) >> PAGE_SHIFT; - } else if (dma_window_property_size == 20) { - size = ((dma_window[4] >> PAGE_SHIFT) << 3) >> PAGE_SHIFT; - } else { - printk(KERN_WARNING "vio_build_iommu_table: Invalid size of ibm,my-dma-window=%i, using 0x80 for size\n", dma_window_property_size); - size = 0x80; - } + size = ((dma_window[4] >> PAGE_SHIFT) << 3) >> PAGE_SHIFT; /* There should be some code to extract the phys-encoded offset using prom_n_addr_cells(). 
However, according to a comment _ From benh at kernel.crashing.org Thu Nov 4 09:15:00 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 04 Nov 2004 09:15:00 +1100 Subject: [PATCH] iommu fixes, round 3 In-Reply-To: <1099497020.21421.0.camel@sinatra.austin.ibm.com> References: <1098775712.6897.17.camel@gaston> <1098808895.32293.23.camel@sinatra.austin.ibm.com> <1098813781.32293.40.camel@sinatra.austin.ibm.com> <16768.10849.741580.850491@cargo.ozlabs.ibm.com> <1098998916.692.20.camel@sinatra.austin.ibm.com> <1099455531.31630.35.camel@gaston> <1099497020.21421.0.camel@sinatra.austin.ibm.com> Message-ID: <1099520100.31629.52.camel@gaston> > :) Respectfully, I still disagree. The caller is a procfs-specific function > related to an interface that we're hoping to deprecate soon. We want this to > happen any time a node is removed, not anytime a node is removed using > interface so-and-so. > > To me, it makes sense to put this here since of_add_node() calls > of_finish_node_dynamic(), which creates the table. I hate that interface... but I suppose we can merge the patch for now. I think this should be changed tho. It's no business of the low level device-tree manipulation functions to know about such things as iommu tables. And what will happen the day I remove the iommu table pointer from the struct device-node anyway ? If your interface to userland relies on that, then it's broken and will have to be reworked :( Maybe we can get away by creating a notifier mechanism for something in the kernel to get called back after nodes are being added and before they are being removed, that would be ok I suppose, but the low level tree manipulation has to stay separate. I do intend, in the long run, to remove all those additional fields we put in struct device-tree... Ben.
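A minimal sketch of the notifier idea suggested above, written as plain userspace C rather than kernel code. Every name in it (sketch_node, sketch_notifier_register, sketch_iommu_notifier and so on) is hypothetical and does not exist in the kernel tree; the fixed-size callback array and the placeholder table object are assumptions made only to keep the example self-contained. It shows the tree code calling back registered listeners after an add and before a remove, so that the iommu handling lives outside the low level tree manipulation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node; not the kernel structure. */
struct sketch_node {
	const char *name;
	void *iommu_table;	/* owned by the iommu code, not the tree code */
};

/* Hypothetical event codes for the two callback points discussed above. */
enum sketch_event {
	SKETCH_NODE_ADDED,	/* sent after a node has been added */
	SKETCH_NODE_REMOVING	/* sent before a node is removed */
};

typedef void (*sketch_notifier_fn)(enum sketch_event ev,
				   struct sketch_node *np);

#define SKETCH_MAX_NOTIFIERS 8
static sketch_notifier_fn sketch_notifiers[SKETCH_MAX_NOTIFIERS];
static size_t sketch_nr_notifiers;

/* Register a callback; 0 on success, -1 if fn is NULL or the array is full. */
int sketch_notifier_register(sketch_notifier_fn fn)
{
	if (fn == NULL || sketch_nr_notifiers == SKETCH_MAX_NOTIFIERS)
		return -1;
	sketch_notifiers[sketch_nr_notifiers++] = fn;
	return 0;
}

/* Call every registered listener in registration order. */
static void sketch_notify(enum sketch_event ev, struct sketch_node *np)
{
	size_t i;

	for (i = 0; i < sketch_nr_notifiers; i++)
		sketch_notifiers[i](ev, np);
}

/* Low level tree manipulation: knows nothing about iommu tables. */
void sketch_add_node(struct sketch_node *np)
{
	assert(np != NULL);
	/* ... link np into the tree here ... */
	sketch_notify(SKETCH_NODE_ADDED, np);
}

void sketch_remove_node(struct sketch_node *np)
{
	assert(np != NULL);
	sketch_notify(SKETCH_NODE_REMOVING, np);
	/* ... unlink np from the tree here ... */
}

/* iommu side: a placeholder object stands in for a real TCE table. */
static int sketch_dummy_table;

void sketch_iommu_notifier(enum sketch_event ev, struct sketch_node *np)
{
	if (ev == SKETCH_NODE_ADDED)
		np->iommu_table = &sketch_dummy_table;
	else if (ev == SKETCH_NODE_REMOVING)
		np->iommu_table = NULL;	/* a real handler would free the table */
}
```

With this shape, the iommu code would register sketch_iommu_notifier once at setup time, and sketch_add_node() / sketch_remove_node() would never need to mention iommu tables. Locking and unregistration are left out of the sketch.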
From johnrose at austin.ibm.com Thu Nov 4 09:50:50 2004 From: johnrose at austin.ibm.com (John Rose) Date: Wed, 03 Nov 2004 16:50:50 -0600 Subject: [PATCH] iommu fixes, round 3 In-Reply-To: <1099520100.31629.52.camel@gaston> References: <1098775712.6897.17.camel@gaston> <1098808895.32293.23.camel@sinatra.austin.ibm.com> <1098813781.32293.40.camel@sinatra.austin.ibm.com> <16768.10849.741580.850491@cargo.ozlabs.ibm.com> <1098998916.692.20.camel@sinatra.austin.ibm.com> <1099455531.31630.35.camel@gaston> <1099497020.21421.0.camel@sinatra.austin.ibm.com> <1099520100.31629.52.camel@gaston> Message-ID: <1099522250.21421.22.camel@sinatra.austin.ibm.com> On Wed, 2004-11-03 at 16:15, Benjamin Herrenschmidt wrote: > And what will happen the day I remove the iommu table pointer > from the struct device-node anyway ? This would break the current table creation and management scheme, so some reworking would have to be done anyway. As for cleaning up struct device_node, you're preaching to the choir. How will the tables be associated with devices in the new case? > If your interface to userland relies on that, then it's broken and will > have to be reworked :( User-space DLPAR stuff doesn't care about these tables, or at what point they're freed, if that's what you mean. 
Thanks for looking at the patch, I'll take reluctant acceptance over nothing :) John From benh at kernel.crashing.org Thu Nov 4 09:51:47 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 04 Nov 2004 09:51:47 +1100 Subject: [PATCH] iommu fixes, round 3 In-Reply-To: <1099522250.21421.22.camel@sinatra.austin.ibm.com> References: <1098775712.6897.17.camel@gaston> <1098808895.32293.23.camel@sinatra.austin.ibm.com> <1098813781.32293.40.camel@sinatra.austin.ibm.com> <16768.10849.741580.850491@cargo.ozlabs.ibm.com> <1098998916.692.20.camel@sinatra.austin.ibm.com> <1099455531.31630.35.camel@gaston> <1099497020.21421.0.camel@sinatra.austin.ibm.com> <1099520100.31629.52.camel@gaston> <1099522250.21421.22.camel@sinatra.austin.ibm.com> Message-ID: <1099522307.31629.82.camel@gaston> On Wed, 2004-11-03 at 16:50 -0600, John Rose wrote: > On Wed, 2004-11-03 at 16:15, Benjamin Herrenschmidt wrote: > > > And what will happen the day I remove the iommu table pointer > > from the struct device-node anyway ? > > This would break the current table creation and management scheme, so > some reworking would have to be done anyway. As for cleaning up struct > device_node, you're preaching to the choir. How will the tables be > associated with devices in the new case? Some structure attached to the device, but not the device-node. But it's not there yet anyway, it's a long term goal. > > If your interface to userland relies on that, then it's broken and will > > have to be reworked :( > > User-space DLPAR stuff doesn't care about these tables, or at what point > they're freed, if that's what you mean. Thanks for looking at the > patch, I'll take reluctant acceptance over nothing :) Hehe, well, we need to fix the problem for now anyway. Ben. 
From anton at samba.org Thu Nov 4 19:10:04 2004 From: anton at samba.org (Anton Blanchard) Date: Thu, 4 Nov 2004 19:10:04 +1100 Subject: [PATCH] ppc64: Add option for oprofile to backtrace through spinlocks Message-ID: <20041104081003.GB5357@krispykreme.ozlabs.ibm.com> Hi, Now that spinlocks are always out of line, oprofile needs to backtrace through them. The following patch adds this but also adds the ability to turn it off (via the backtrace_spinlocks option in oprofilefs). The backout option is included because the backtracing here is best effort. On ppc64 the performance monitor exception is not an NMI, we get them only when interrupts are enabled. This means we can receive a profile hit that is inside a spinlock when our PC is somewhere completely different. In this patch we check to make sure the PC of the performance monitor exception as well as the current PC is inside the spinlock region. If so then we find the callers PC. If this is not true we play it safe and leave the tick inside the lock region. Also, now that we execute the SLB handler in real mode we have to adjust the address range that we consider as valid real mode addresses. Otherwise the SLB miss handler will end up as unknown kernel profile hits. Signed-off-by: Anton Blanchard diff -puN arch/ppc64/oprofile/op_model_power4.c~oprofile_backtrace arch/ppc64/oprofile/op_model_power4.c --- gr_work/arch/ppc64/oprofile/op_model_power4.c~oprofile_backtrace 2004-09-14 04:04:47.995524298 -0500 +++ gr_work-anton/arch/ppc64/oprofile/op_model_power4.c 2004-09-14 04:37:43.108261897 -0500 @@ -32,6 +32,13 @@ static u32 mmcr0_val; static u64 mmcr1_val; static u32 mmcra_val; +/* + * Since we do not have an NMI, backtracing through spinlocks is + * only a best guess. In light of this, allow it to be disabled at + * runtime. 
+ */ +static int backtrace_spinlocks; + static void power4_reg_setup(struct op_counter_config *ctr, struct op_system_config *sys, int num_ctrs) @@ -59,6 +66,8 @@ static void power4_reg_setup(struct op_c mmcr1_val = sys->mmcr1; mmcra_val = sys->mmcra; + backtrace_spinlocks = sys->backtrace_spinlocks; + for (i = 0; i < num_counters; ++i) reset_value[i] = 0x80000000UL - ctr[i].count; @@ -170,19 +179,38 @@ static void __attribute_used__ kernel_un { } +static unsigned long check_spinlock_pc(struct pt_regs *regs, + unsigned long profile_pc) +{ + unsigned long pc = instruction_pointer(regs); + + /* + * If both the SIAR (sampled instruction) and the perfmon exception + * occurred in a spinlock region then we account the sample to the + * calling function. This isnt 100% correct, we really need soft + * IRQ disable so we always get the perfmon exception at the + * point at which the SIAR is set. + */ + if (backtrace_spinlocks && in_lock_functions(pc) && + in_lock_functions(profile_pc)) + return regs->link; + else + return profile_pc; +} + /* * On GQ and newer the MMCRA stores the HV and PR bits at the time * the SIAR was sampled. We use that to work out if the SIAR was sampled in * the hypervisor, our exception vectors or RTAS. */ -static unsigned long get_pc(void) +static unsigned long get_pc(struct pt_regs *regs) { unsigned long pc = mfspr(SPRN_SIAR); unsigned long mmcra; /* Cant do much about it */ if (!mmcra_has_sihv) - return pc; + return check_spinlock_pc(regs, pc); mmcra = mfspr(SPRN_MMCRA); @@ -196,10 +224,6 @@ static unsigned long get_pc(void) if (mmcra & MMCRA_SIPR) return pc; - /* Were we in our exception vectors? */ - if (pc < 0x4000UL) - return (unsigned long)__va(pc); - #ifdef CONFIG_PPC_PSERIES /* Were we in RTAS? */ if (pc >= rtas.base && pc < (rtas.base + rtas.size)) @@ -207,12 +231,16 @@ static unsigned long get_pc(void) return *((unsigned long *)rtas_bucket); #endif + /* Were we in our exception vectors or SLB real mode miss handler? 
*/ + if (pc < 0x1000000UL) + return (unsigned long)__va(pc); + /* Not sure where we were */ if (pc < KERNELBASE) /* function descriptor madness */ return *((unsigned long *)kernel_unknown_bucket); - return pc; + return check_spinlock_pc(regs, pc); } static int get_kernel(unsigned long pc) @@ -239,7 +267,7 @@ static void power4_handle_interrupt(stru unsigned int cpu = smp_processor_id(); unsigned int mmcr0; - pc = get_pc(); + pc = get_pc(regs); is_kernel = get_kernel(pc); /* set the PMM bit (see comment below) */ diff -L op_model_power4.c -puN /dev/null /dev/null diff -puN arch/ppc64/oprofile/common.c~oprofile_backtrace arch/ppc64/oprofile/common.c --- gr_work/arch/ppc64/oprofile/common.c~oprofile_backtrace 2004-09-14 04:38:28.408023510 -0500 +++ gr_work-anton/arch/ppc64/oprofile/common.c 2004-09-14 04:40:18.825344482 -0500 @@ -112,11 +112,16 @@ static int op_ppc64_create_files(struct oprofilefs_create_ulong(sb, root, "enable_kernel", &sys.enable_kernel); oprofilefs_create_ulong(sb, root, "enable_user", &sys.enable_user); + oprofilefs_create_ulong(sb, root, "backtrace_spinlocks", + &sys.backtrace_spinlocks); /* Default to tracing both kernel and user */ sys.enable_kernel = 1; sys.enable_user = 1; + /* Turn on backtracing through spinlocks by default */ + sys.backtrace_spinlocks = 1; + return 0; } diff -puN arch/ppc64/oprofile/op_impl.h~oprofile_backtrace arch/ppc64/oprofile/op_impl.h --- gr_work/arch/ppc64/oprofile/op_impl.h~oprofile_backtrace 2004-09-14 04:38:59.694872442 -0500 +++ gr_work-anton/arch/ppc64/oprofile/op_impl.h 2004-09-14 04:39:17.624700077 -0500 @@ -71,6 +71,7 @@ struct op_system_config { unsigned long mmcra; unsigned long enable_kernel; unsigned long enable_user; + unsigned long backtrace_spinlocks; }; /* Per-arch configuration */ _ From anton at samba.org Fri Nov 5 02:50:32 2004 From: anton at samba.org (Anton Blanchard) Date: Fri, 5 Nov 2004 02:50:32 +1100 Subject: RTAS error log sequence numbers Message-ID: 
<20041104155032.GB1268@krispykreme.ozlabs.ibm.com> Hi, We can end up reusing RTAS error log sequence numbers - by calling log_error out of rtas_call before we have done nvram_init. eg on a p630 with a graphics card it doesnt like: RTAS: event: 1, Type: Internal Device Failure, Severity: 5 ... PCI: Probing PCI hardware RTAS: event: 2, Type: Internal Device Failure, Severity: 5 RTAS: event: 3, Type: Internal Device Failure, Severity: 5 RTAS: event: 4, Type: Internal Device Failure, Severity: 5 RTAS: event: 5, Type: Internal Device Failure, Severity: 5 RTAS: event: 6, Type: Internal Device Failure, Severity: 5 RTAS: event: 7, Type: Internal Device Failure, Severity: 5 RTAS: event: 8, Type: Internal Device Failure, Severity: 5 RTAS: event: 9, Type: Internal Device Failure, Severity: 5 RTAS: event: 10, Type: Internal Device Failure, Severity: 5 RTAS: event: 11, Type: Internal Device Failure, Severity: 5 RTAS: event: 12, Type: Internal Device Failure, Severity: 5 RTAS: event: 13, Type: Internal Device Failure, Severity: 5 RTAS: event: 14, Type: Internal Device Failure, Severity: 5 RTAS: event: 15, Type: Internal Device Failure, Severity: 5 RTAS: event: 16, Type: Internal Device Failure, Severity: 5 RTAS: event: 17, Type: Internal Device Failure, Severity: 5 RTAS: event: 18, Type: Internal Device Failure, Severity: 5 RTAS: event: 19, Type: Internal Device Failure, Severity: 5 RTAS: event: 20, Type: Internal Device Failure, Severity: 5 RTAS: event: 21, Type: Internal Device Failure, Severity: 5 RTAS: event: 22, Type: Internal Device Failure, Severity: 5 RTAS: event: 23, Type: Internal Device Failure, Severity: 5 RTAS: event: 24, Type: Internal Device Failure, Severity: 5 RTAS: event: 25, Type: Internal Device Failure, Severity: 5 RTAS: event: 26, Type: Internal Device Failure, Severity: 5 RTAS: event: 27, Type: Internal Device Failure, Severity: 5 RTAS: event: 28, Type: Internal Device Failure, Severity: 5 RTAS: event: 29, Type: Internal Device Failure, Severity: 5 RTAS: 
event: 30, Type: Internal Device Failure, Severity: 5 RTAS: event: 31, Type: Internal Device Failure, Severity: 5 RTAS: event: 32, Type: Internal Device Failure, Severity: 5 RTAS: event: 33, Type: Internal Device Failure, Severity: 5 RTAS: event: 34, Type: Internal Device Failure, Severity: 5 RTAS: event: 35, Type: Internal Device Failure, Severity: 5 RTAS: event: 36, Type: Internal Device Failure, Severity: 5 RTAS: event: 37, Type: Internal Device Failure, Severity: 5 ... RTAS daemon started RTAS: event: 42, Type: Unknown, Severity: 2 On reboot we get the same 1-37 error logs then the last one at 43. Maybe we don't care about persistent error log numbers but I thought I'd check that the tools handle it OK. Anton From linas at austin.ibm.com Fri Nov 5 03:38:00 2004 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 4 Nov 2004 10:38:00 -0600 Subject: RTAS error log sequence numbers In-Reply-To: <20041104155032.GB1268@krispykreme.ozlabs.ibm.com> References: <20041104155032.GB1268@krispykreme.ozlabs.ibm.com> Message-ID: <20041104163759.GR10026@austin.ibm.com> On Fri, Nov 05, 2004 at 02:50:32AM +1100, Anton Blanchard was heard to remark: > > Hi, > > We can end up reusing RTAS error log sequence numbers - by calling > log_error out of rtas_call before we have done nvram_init. eg on a p630 Curiously, nvram_init happens late in the boot sequence. I'm not sure why, other than the application of the principle "move things as late into the boot sequence as possible." > On reboot we get the same 1-37 error logs then the last one at 43. Maybe > we don't care about persistent error log numbers but I thought I'd check > that the tools handle it OK. I assume you mean "unique error log numbers" that are monotonically increasing across boots. This would require moving nvram_init to very early in the boot sequence, since rtas errors can occur very early. I'll volunteer to do this shuffle, as long as there is no objection in principle.
I don't have much of a feel for the pros and cons of this. --linas From moilanen at austin.ibm.com Fri Nov 5 04:04:52 2004 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Thu, 4 Nov 2004 11:04:52 -0600 Subject: RTAS error log sequence numbers In-Reply-To: <20041104163759.GR10026@austin.ibm.com> References: <20041104155032.GB1268@krispykreme.ozlabs.ibm.com> <20041104163759.GR10026@austin.ibm.com> Message-ID: <20041104110452.38f2152e@localhost> > > On reboot we get the same 1-37 error logs then the last one at 43. Maybe > > we don't care about persistent error log numbers but I thought I'd check > > that the tools handle it OK. > > I assume you mean "unique error log numbers" that are monotonically > increasing across boots. This would require moving nvram_init to > very early in the boot sequence, since rtas errors can occur very early. > I'll volunteer to do this shuffle, as long as there is no objection > in principle. > > I don't have much of a feel for the pros and cons of this. As long as nvram_init is called after pSeries/pmac_nvram_init, there should not be an issue. In fact you could just as easily call nvram_init() at the end of pSeries/pmac_nvram_init(). You also need to add code to set error_log_cnt from nvram (and remove it from nvram_read_error_log). Currently we set the error_log_count when rtasd starts up, which may be after the first log_error. The user-level daemons (ELA and rtas_errd) were supposed to be able to handle duplicate sequence numbers since there are situations where we cannot guarantee a unique sequence number.
Jake From nfont at austin.ibm.com Fri Nov 5 04:08:14 2004 From: nfont at austin.ibm.com (Nathan Fontenot) Date: Thu, 04 Nov 2004 11:08:14 -0600 Subject: RTAS error log sequence numbers In-Reply-To: <20041104163759.GR10026@austin.ibm.com> References: <20041104155032.GB1268@krispykreme.ozlabs.ibm.com> <20041104163759.GR10026@austin.ibm.com> Message-ID: <418A61FE.7030500@austin.ibm.com> Linas Vepstas wrote: > I assume you mean "unique error log numbers" that are monotonically > increasing across boots. This would require moving nvram_init to > very early in the boot sequence, since rtas errors can occur very early. > I'll volunteer to do this shuffle, as long as there is no objection > in principle. > > I don't have much of a feel for the pros and cons of this. You would also need to initialize the error log count by either reading the last RTAS event stored in nvram or starting the rtasd kernel daemon, before anyone calls log_error(). Moving this to earlier in the boot sequence would be nice but I'm not sure it's worth the effort. Is there any way to guarantee that this is done before anyone calls log_error()? -Nathan F. From nfont at austin.ibm.com Fri Nov 5 03:34:38 2004 From: nfont at austin.ibm.com (Nathan Fontenot) Date: Thu, 04 Nov 2004 10:34:38 -0600 Subject: RTAS error log sequence numbers In-Reply-To: <20041104155032.GB1268@krispykreme.ozlabs.ibm.com> References: <20041104155032.GB1268@krispykreme.ozlabs.ibm.com> Message-ID: <418A5A1E.5050102@austin.ibm.com> Anton Blanchard wrote: > On reboot we get the same 1-37 error logs then the last one at 43. Maybe > we don't care about persistent error log numbers but I thought I'd check > that the tools handle it OK. Yes, the tools (namely rtas_errd) handle this just fine. The tools aren't really concerned about the log number; it's more for end users to track RTAS events. The error log count isn't initialized until the rtasd kernel daemon starts and reads the last event stored in nvram.
This is why the count starts to look sane after rtasd starts. We could put code in to initialize the error log count earlier if people really want it, but I don't think it's really necessary. > > Anton -- Nathan Fontenot Power Linux Platform Serviceability Home: IBM Austin 908/1E-036 Phone: 512.838.3377 (T/L 678.3377) Email: nfont at austin.ibm.com From johnrose at austin.ibm.com Fri Nov 5 08:29:30 2004 From: johnrose at austin.ibm.com (John Rose) Date: Thu, 04 Nov 2004 15:29:30 -0600 Subject: [PATCH] PPC64 pSeries iommu cleanups Message-ID: <1099603770.30815.4.camel@sinatra.austin.ibm.com> Hi Paul- Here's a resend of the last iommu patch I sent, re-based against current linus bk. This patch changes the following iommu-related things: - Renames the [i,p]series versions of iommu_devnode_init(), to keep things logically separate where possible. - Moves iommu_free_table() to generic iommu.c - Creates of_cleanup_node(), which will directly precede the dynamic removal of any device node Comments welcome.
Thanks- John Signed-off-by: John Rose diff -puN arch/ppc64/kernel/iSeries_iommu.c~iommu_free_table_fix4 arch/ppc64/kernel/iSeries_iommu.c --- 2_6_ketchup/arch/ppc64/kernel/iSeries_iommu.c~iommu_free_table_fix4 2004-11-04 15:22:10.000000000 -0600 +++ 2_6_ketchup-johnrose/arch/ppc64/kernel/iSeries_iommu.c 2004-11-04 15:22:10.000000000 -0600 @@ -171,7 +171,7 @@ static void iommu_table_getparms(struct } -void iommu_devnode_init(struct iSeries_Device_Node *dn) { +void iommu_devnode_init_iSeries(struct iSeries_Device_Node *dn) { struct iommu_table *tbl; tbl = (struct iommu_table *)kmalloc(sizeof(struct iommu_table), GFP_KERNEL); diff -puN arch/ppc64/kernel/iSeries_pci.c~iommu_free_table_fix4 arch/ppc64/kernel/iSeries_pci.c --- 2_6_ketchup/arch/ppc64/kernel/iSeries_pci.c~iommu_free_table_fix4 2004-11-04 15:22:10.000000000 -0600 +++ 2_6_ketchup-johnrose/arch/ppc64/kernel/iSeries_pci.c 2004-11-04 15:22:10.000000000 -0600 @@ -329,7 +329,7 @@ void __init iSeries_pci_final_fixup(void iSeries_Device_Information(pdev, Buffer, sizeof(Buffer)); printk("%d. 
%s\n", DeviceCount, Buffer); - iommu_devnode_init(node); + iommu_devnode_init_iSeries(node); } else printk("PCI: Device Tree not found for 0x%016lX\n", (unsigned long)pdev); diff -puN arch/ppc64/kernel/iommu.c~iommu_free_table_fix4 arch/ppc64/kernel/iommu.c --- 2_6_ketchup/arch/ppc64/kernel/iommu.c~iommu_free_table_fix4 2004-11-04 15:22:10.000000000 -0600 +++ 2_6_ketchup-johnrose/arch/ppc64/kernel/iommu.c 2004-11-04 15:22:10.000000000 -0600 @@ -425,6 +425,39 @@ struct iommu_table *iommu_init_table(str return tbl; } +void iommu_free_table(struct device_node *dn) +{ + struct iommu_table *tbl = dn->iommu_table; + unsigned long bitmap_sz, i; + unsigned int order; + + if (!tbl || !tbl->it_map) { + printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, + dn->full_name); + return; + } + + /* verify that table contains no entries */ + /* it_mapsize is in entries, and we're examining 64 at a time */ + for (i = 0; i < (tbl->it_mapsize/64); i++) { + if (tbl->it_map[i] != 0) { + printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", + __FUNCTION__, dn->full_name); + break; + } + } + + /* calculate bitmap size in bytes */ + bitmap_sz = (tbl->it_mapsize + 7) / 8; + + /* free bitmap */ + order = get_order(bitmap_sz); + free_pages((unsigned long) tbl->it_map, order); + + /* free table */ + kfree(tbl); +} + /* Creates TCEs for a user provided buffer. The user buffer must be * contiguous real kernel storage (not vmalloc). The address of the buffer * passed here is the kernel (virtual) address of the buffer. 
The buffer diff -puN arch/ppc64/kernel/pSeries_iommu.c~iommu_free_table_fix4 arch/ppc64/kernel/pSeries_iommu.c --- 2_6_ketchup/arch/ppc64/kernel/pSeries_iommu.c~iommu_free_table_fix4 2004-11-04 15:22:10.000000000 -0600 +++ 2_6_ketchup-johnrose/arch/ppc64/kernel/pSeries_iommu.c 2004-11-04 15:22:10.000000000 -0600 @@ -276,7 +276,7 @@ static void iommu_buses_init(void) first_phb = 0; for (dn = first_dn; dn != NULL; dn = dn->sibling) - iommu_devnode_init(dn); + iommu_devnode_init_pSeries(dn); } } @@ -298,7 +298,7 @@ static void iommu_buses_init_lpar(struct * Do it now because iommu_table_setparms_lpar needs it. */ busdn->bussubno = bus->number; - iommu_devnode_init(busdn); + iommu_devnode_init_pSeries(busdn); } /* look for a window on a bridge even if the PHB had one */ @@ -397,7 +397,7 @@ static void iommu_table_setparms_lpar(st } -void iommu_devnode_init(struct device_node *dn) +void iommu_devnode_init_pSeries(struct device_node *dn) { struct iommu_table *tbl; @@ -412,39 +412,6 @@ void iommu_devnode_init(struct device_no dn->iommu_table = iommu_init_table(tbl); } -void iommu_free_table(struct device_node *dn) -{ - struct iommu_table *tbl = dn->iommu_table; - unsigned long bitmap_sz, i; - unsigned int order; - - if (!tbl || !tbl->it_map) { - printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, - dn->full_name); - return; - } - - /* verify that table contains no entries */ - /* it_mapsize is in entries, and we're examining 64 at a time */ - for (i = 0; i < (tbl->it_mapsize/64); i++) { - if (tbl->it_map[i] != 0) { - printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", - __FUNCTION__, dn->full_name); - break; - } - } - - /* calculate bitmap size in bytes */ - bitmap_sz = (tbl->it_mapsize + 7) / 8; - - /* free bitmap */ - order = get_order(bitmap_sz); - free_pages((unsigned long) tbl->it_map, order); - - /* free table */ - kfree(tbl); -} - void iommu_setup_pSeries(void) { struct pci_dev *dev = NULL; @@ -469,7 +436,6 @@ void iommu_setup_pSeries(void) } } - /* 
These are called very early. */ void tce_init_pSeries(void) { diff -puN arch/ppc64/kernel/prom.c~iommu_free_table_fix4 arch/ppc64/kernel/prom.c --- 2_6_ketchup/arch/ppc64/kernel/prom.c~iommu_free_table_fix4 2004-11-04 15:22:10.000000000 -0600 +++ 2_6_ketchup-johnrose/arch/ppc64/kernel/prom.c 2004-11-04 15:22:10.000000000 -0600 @@ -1740,7 +1740,7 @@ static int of_finish_dynamic_node(struct if (strcmp(node->name, "pci") == 0 && get_property(node, "ibm,dma-window", NULL)) { node->bussubno = node->busno; - iommu_devnode_init(node); + iommu_devnode_init_pSeries(node); } else node->iommu_table = parent->iommu_table; #endif /* CONFIG_PPC_PSERIES */ @@ -1802,6 +1802,15 @@ int of_add_node(const char *path, struct } /* + * Prepare an OF node for removal from system + */ +static void of_cleanup_node(struct device_node *np) +{ + if (np->iommu_table && get_property(np, "ibm,dma-window", NULL)) + iommu_free_table(np); +} + +/* * Remove an OF device node from the system. * Caller should have already "gotten" np. */ @@ -1818,13 +1827,7 @@ int of_remove_node(struct device_node *n return -EBUSY; } - /* XXX This is a layering violation, should be moved to the caller - * --BenH. - */ -#ifdef CONFIG_PPC_PSERIES - if (np->iommu_table) - iommu_free_table(np); -#endif /* CONFIG_PPC_PSERIES */ + of_cleanup_node(np); write_lock(&devtree_lock); OF_MARK_STALE(np); diff -puN include/asm-ppc64/iommu.h~iommu_free_table_fix4 include/asm-ppc64/iommu.h --- 2_6_ketchup/include/asm-ppc64/iommu.h~iommu_free_table_fix4 2004-11-04 15:22:10.000000000 -0600 +++ 2_6_ketchup-johnrose/include/asm-ppc64/iommu.h 2004-11-04 15:22:10.000000000 -0600 @@ -110,22 +110,18 @@ struct scatterlist; extern void iommu_setup_pSeries(void); extern void iommu_setup_u3(void); -/* Creates table for an individual device node */ -/* XXX: This isn't generic, please name it accordingly or add - * some ppc_md. hooks for iommu implementations to do what they - * need to do. --BenH. 
- */ -extern void iommu_devnode_init(struct device_node *dn); - /* Frees table for an individual device node */ -/* XXX: This isn't generic, please name it accordingly or add - * some ppc_md. hooks for iommu implementations to do what they - * need to do. --BenH. - */ extern void iommu_free_table(struct device_node *dn); #endif /* CONFIG_PPC_MULTIPLATFORM */ +#ifdef CONFIG_PPC_PSERIES + +/* Creates table for an individual device node */ +extern void iommu_devnode_init_pSeries(struct device_node *dn); + +#endif /* CONFIG_PPC_PSERIES */ + #ifdef CONFIG_PPC_ISERIES /* Walks all buses and creates iommu tables */ @@ -136,7 +132,7 @@ extern void __init iommu_vio_init(void); struct iSeries_Device_Node; /* Creates table for an individual device node */ -extern void iommu_devnode_init(struct iSeries_Device_Node *dn); +extern void iommu_devnode_init_iSeries(struct iSeries_Device_Node *dn); #endif /* CONFIG_PPC_ISERIES */ _ From anton at samba.org Fri Nov 5 16:09:33 2004 From: anton at samba.org (Anton Blanchard) Date: Fri, 5 Nov 2004 16:09:33 +1100 Subject: RTAS error log sequence numbers In-Reply-To: <418A61FE.7030500@austin.ibm.com> References: <20041104155032.GB1268@krispykreme.ozlabs.ibm.com> <20041104163759.GR10026@austin.ibm.com> <418A61FE.7030500@austin.ibm.com> Message-ID: <20041105050933.GC8470@krispykreme.ozlabs.ibm.com> > Moving this to earlier in the boot sequence would be nice but I'm not > sure it's worth the effort. Is there any way to guarantee that this is > done before anyone calls log_error()? Since the userspace tools can handle it, I'm OK to ignore the issue. Anton From l_indien at magic.fr Fri Nov 5 23:14:03 2004 From: l_indien at magic.fr (J. Mayer) Date: Fri, 05 Nov 2004 13:14:03 +0100 Subject: Booting Imac G5 Message-ID: <1099656843.8346.7.camel@rapid> Hi, I have a new Imac G5 and I made Linux boot on it. Here's a patch proposal to get the Sungem ethernet device, the firewire and the IDE controller recognized.
There still are major issues: - serial ATA freezes during disc probe - the RTC isn't recognized - of course, there is no power / fan management. My patch is a very minimal one which made me able to boot from CDROM and firewire disk drive, that's a start ;-) Note that this patch was originally done against the gentoo version of linux-2.6.8 but applies well against kernel.org 2.6.9. I'll try to take a look and solve the SATA issue during this weekend. Regards. -- J. Mayer Never organized -------------- next part -------------- A non-text attachment was scrubbed... Name: linux-2.6.8-gentoo.diff Type: text/x-patch Size: 2150 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041105/3350b11d/attachment.bin From olh at suse.de Sat Nov 6 09:03:41 2004 From: olh at suse.de (Olaf Hering) Date: Fri, 5 Nov 2004 23:03:41 +0100 Subject: [PATCH] call ibm,os-term only if its available Message-ID: <20041105220341.GA28064@suse.de> The rtas property 'ibm,os-term' is not available on JS20, so a panic will print: unable to mount root filesystem on /dev/hda Kernel panic - not syncing: Attempted to kill init! <0>ibm,os-term call failed -1 Rebooting in 42 seconds.. Signed-off-by: Olaf Hering diff -purN linux-2.6.10-rc1-bk15.orig/arch/ppc64/kernel/rtas.c linux-2.6.10-rc1-bk15.ibm,os-term/arch/ppc64/kernel/rtas.c --- linux-2.6.10-rc1-bk15.orig/arch/ppc64/kernel/rtas.c 2004-11-05 14:52:14.747905961 +0100 +++ linux-2.6.10-rc1-bk15.ibm,os-term/arch/ppc64/kernel/rtas.c 2004-11-05 23:00:10.581515367 +0100 @@ -439,6 +439,9 @@ void rtas_os_term(char *str) { int status; + if (RTAS_UNKNOWN_SERVICE == rtas_token("ibm,os-term")) + return; + snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str); do { -- USB is for mice, FireWire is for men!
sUse lINUX ag, nÜRNBERG From benh at kernel.crashing.org Sat Nov 6 11:56:05 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 06 Nov 2004 11:56:05 +1100 Subject: Booting Imac G5 In-Reply-To: <1099656843.8346.7.camel@rapid> References: <1099656843.8346.7.camel@rapid> Message-ID: <1099702566.3946.49.camel@gaston> On Fri, 2004-11-05 at 13:14 +0100, J. Mayer wrote: > Hi, > > I have a new Imac G5 and I made Linux boot on it. Here's a patch > proposal to get the Sungem ethernet device, the firewire and the IDE > controller recognized. > There still are major issues: > - serial ATA freezes during disc probe > - the RTC isn't recognized > - of course, there is no power / fan management. > My patch is a very minimal one which made me able to boot from CDROM and > firewire disk drive, that's a start ;-) > Note that this patch was originally done against the gentoo version of > linux-2.6.8 but applies well against kernel.org 2.6.9. > I'll try to take a look and solve the SATA issue during this weekend. Nice ! Did you submit the new PCI IDs to the online database too ? http://pciids.sourceforge.net/ Ben. From l_indien at magic.fr Sat Nov 6 22:28:56 2004 From: l_indien at magic.fr (J. Mayer) Date: Sat, 06 Nov 2004 12:28:56 +0100 Subject: Booting Imac G5 In-Reply-To: <1099702566.3946.49.camel@gaston> References: <1099656843.8346.7.camel@rapid> <1099702566.3946.49.camel@gaston> Message-ID: <1099740535.8346.33.camel@rapid> On Sat, 2004-11-06 at 01:56, Benjamin Herrenschmidt wrote: > On Fri, 2004-11-05 at 13:14 +0100, J. Mayer wrote: > > Hi, > > > > I have a new Imac G5 and I made Linux boot on it. Here's a patch > > proposal to get the Sungem ethernet device, the firewire and the IDE > > controller recognized. > > There still are major issues: > > - serial ATA freezes during disc probe > > - the RTC isn't recognized > > - of course, there is no power / fan management.
> > My patch is a very minimal one which made me able to boot from CDROM and > > firewire disk drive, that's a start ;-) > > Note that this patch was originally done against the gentoo version of > > linux-2.6.8 but applies well against kernel.org 2.6.9. > > I'll try to take a look and solve the SATA issue during this weekend. > > Nice ! Did you submit the new PCI IDs to the online database too ? I just did, now that you reminded me ;-) I also added the two following devices that I just identified, using /proc/device-tree to locate them: 004f Shasta Mac I/O 0058 U3 AGP bridge Regards. -- J. Mayer Never organized From l_indien at magic.fr Sun Nov 7 06:25:23 2004 From: l_indien at magic.fr (J. Mayer) Date: Sat, 06 Nov 2004 20:25:23 +0100 Subject: Booting Imac G5 In-Reply-To: <1099702566.3946.49.camel@gaston> References: <1099656843.8346.7.camel@rapid> <1099702566.3946.49.camel@gaston> Message-ID: <1099769123.8346.41.camel@rapid> On Sat, 2004-11-06 at 01:56, Benjamin Herrenschmidt wrote: > On Fri, 2004-11-05 at 13:14 +0100, J. Mayer wrote: > > Hi, > > > > I have a new Imac G5 and I made Linux boot on it. Here's a patch > > proposal to get the Sungem ethernet device, the firewire and the IDE > > controller recognized. > > There still are major issues: > > - serial ATA freezes during disc probe > > - the RTC isn't recognized > > - of course, there is no power / fan management. > > My patch is a very minimal one which made me able to boot from CDROM and > > firewire disk drive, that's a start ;-) > > Note that this patch was originally done against the gentoo version of > > linux-2.6.8 but applies well against kernel.org 2.6.9. > > I'll try to take a look and solve the SATA issue during this weekend. > Hi again, as I can see you wrote the SATA driver for Pmac, you may have an idea of what is going wrong on the Imac. I did activate DPRINTK and VPRINTK in libata and added a few messages. It seems that the SET_FEATURES command never completes.
So the insmod stays blocked but the machine is still fully usable from another shell. I attach here the complete dmesg I got when booting. Please note that the message: "ata_dev_set_xfermode: qc_issue xfer_mode=12" used to be "... xfer=70" (note the printk I added is decimal) but I tried to force it to the xfer_mode I saw from ata_host_set_pio trace, and it changed nothing. Regards. -- J. Mayer Never organized -------------- next part -------------- Found initrd at 0xc000000001b00000:0xc000000001b76aad trying to initialize btext ... Starting Linux PPC64 2.6.9 ----------------------------------------------------- naca = 0xc000000000004000 naca->pftSize = 0x17 naca->debug_switch = 0x0 naca->interrupt_controller = 0x1 systemcfg = 0xc000000000005000 systemcfg->processorCount = 0x0 systemcfg->physicalMemorySize = 0x20000000 systemcfg->dCacheL1LineSize = 0x80 systemcfg->iCacheL1LineSize = 0x80 htab_data.htab = 0xc00000001f800000 htab_data.num_ptegs = 0x10000 ----------------------------------------------------- [boot]0100 MM Init [boot]0100 MM Init Done Linux version 2.6.9 (root at imac) (gcc version 3.4.1 20040803 (Gentoo Linux 3.4.1-r3, ssp-3.4-2, pie-8.7.6.5)) #6 Sat Nov 6 19:21:12 CET 2004 [boot]0012 Setup Arch Using native/NAP idle loop Found U3 memory controller & host bridge, revision: 57 Mapped at 0xe000000080152000 Found a K2 mac-io controller, rev: 0, mapped at 0xe000000080193000 PowerMac motherboard: IMac G5 nvram: Checking bank 0... nvram: gen0=118, gen1=117 nvram: Active bank is: 0 Adding PCI host bridge /pci at 0,f0000000 Found U3-AGP PCI host bridge. Firmware bus number: 240->255 Adding PCI host bridge /ht at 0,f2000000 Can't get bus-range for /ht at 0,f2000000, assume bus 0 U3/HT: hole, 0 end at 8fffffff, 1 start at b0000000 Found U3-HT PCI host bridge. 
Firmware bus number: 0->239 Can't get bus-range for /ht at 0,f2000000 PCI Host 0, io start: fffffffffd800000; io end: fffffffffdffffff PCI Host 1, io start: 0; io end: 3fffff Top of RAM: 0x20000000, Total RAM: 0x20000000 Memory hole size: 0MB On node 0 totalpages: 131072 DMA zone: 131072 pages, LIFO batch:16 Normal zone: 0 pages, LIFO batch:1 HighMem zone: 0 pages, LIFO batch:1 [boot]0015 Setup Done Built 1 zonelists Kernel command line: root=/dev/ram rw ramdisk_size=11000 init=/linuxrc devfs real_root=/dev/scsi/host0/bus0/target0/lun0/part14 devf real_root=/dev/scsi/host0/bus0/target0/lun0/part14 PowerMac using OpenPIC irq controller at 0x80040000 [boot]0020 OpenPic Init OpenPIC Version 1.2 (4 CPUs and 124 IRQ sources) at e000000082e1c000 [boot]0025 OpenPic Done Slave OpenPIC at 0xf8040000 hooked on IRQ 96 [boot]0020 OpenPic U3 Init OpenPIC (U3) Version 1.2 [boot]0025 OpenPic U3 Done PID hash table entries: 4096 (order: 12, 131072 bytes) time_init: decrementer frequency = 33.333333 MHz Console: colour dummy device 80x25 Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes) Inode-cache hash table entries: 65536 (order: 7, 524288 bytes) Memory: 500480k/524288k available (3244k kernel code, 23472k reserved, 1484k data, 322k bss, 164k init) Calibrating delay loop... 66.56 BogoMIPS (lpj=33280) Mount-cache hash table entries: 256 (order: 0, 4096 bytes) checking if image is initramfs...it isn't (no cpio magic); looks like an initrd Freeing initrd memory: 474k freed NET: Registered protocol family 16 PCI: Probing PCI hardware U3-DART: table not allocated, using direct DMA PCI: Probing PCI hardware done SCSI subsystem initialized usbcore: registered new driver usbfs usbcore: registered new driver hub nvram_init: Could not find nvram partition for nvram buffered error logging. 
devfs: 2004-01-31 Richard Gooch (rgooch at atnf.csiro.au) devfs: boot_options: 0x1 Initializing Cryptographic API Using unsupported 1440x900 NVDA,Display-A at a0008000, depth=8, pitch=1536 Console: switching to colour frame buffer device 180x56 fb0: Open Firmware frame buffer device on /pci at 0,f0000000/NVDA,Parent at 10/NVDA,Display-A at 0 RAMDISK driver initialized: 16 RAM disks of 11000K size 1024 blocksize loop: loaded (max 8 devices) sungem.c:v0.98 8/24/03 David S. Miller (davem at redhat.com) eth0: Sun GEM (PCI) 10/100/1000BaseT Ethernet 00:0d:93:57:f6:f6 PHY ID: 4061e4, addr: 0 eth0: Found BCM5221 PHY MacIO PCI driver attached to K2 chipset Warning: no ADB interface detected Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx PCI: Enabling device: (0001:02:0d.0), cmd 2 ide0: Found Apple OHare ATA controller, bus ID 3, irq 38 Probing IDE interface ide0... hda: MATSHITADVD-R UJ-825, ATAPI CD/DVD-ROM drive hda: MDMA, cycleTime: 150, accessTime: 75, recTime: 75 hda: Set MDMA timing for mode 2, reg: 0x00221526 hda: Enabling MultiWord DMA 2 Using anticipatory io scheduler ide0 at 0xe0000000831f0000-0xe0000000831f0007,0xe0000000831f0160 on irq 38 hda: ATAPI 24X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, (U)DMA Uniform CD-ROM driver Revision: 3.20 ieee1394: Initialized config rom entry `ip1394' ohci1394: $Rev: 1223 $ Ben Collins PCI: Enabling device: (0001:02:0e.0), cmd 2 ohci1394: fw-host0: Unexpected PCI resource length of 1000! 
ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[39] MMIO=[80100000-801007ff] Max Packet=[2048] sbp2: $Rev: 1219 $ Ben Collins ohci_hcd: 2004 Feb 02 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI) PCI: Enabling device: (0001:01:0b.0), cmd 2 ohci_hcd 0001:01:0b.0: NEC Corporation USB ohci_hcd 0001:01:0b.0: irq 70, pci mem e0000000831f3000 ohci_hcd 0001:01:0b.0: new USB bus registered, assigned bus number 1 hub 1-0:1.0: USB hub found hub 1-0:1.0: 3 ports detected PCI: Enabling device: (0001:01:0b.1), cmd 2 ohci_hcd 0001:01:0b.1: NEC Corporation USB (#2) ohci_hcd 0001:01:0b.1: irq 70, pci mem e0000000831f4000 ohci_hcd 0001:01:0b.1: new USB bus registered, assigned bus number 2 hub 2-0:1.0: USB hub found hub 2-0:1.0: 2 ports detected usbcore: registered new driver hiddev usbcore: registered new driver usbhid drivers/usb/input/hid-core.c: v2.0:USB HID core driver mice: PS/2 mouse device common for all mice i2c /dev entries driver Found KeyWest i2c on "u3", 2 channels, stepping: 4 bits Found KeyWest i2c on "mac-io", 1 channel, stepping: 4 bits NET: Registered protocol family 26 NET: Registered protocol family 2 IP: routing cache hash table of 4096 buckets, 32Kbytes TCP: Hash tables configured (established 131072 bind 65536) NET: Registered protocol family 1 NET: Registered protocol family 17 RAMDISK: Compressed image found at block 0 EXT2-fs warning: checktime reached, running e2fsck is recommended VFS: Mounted root (ext2 filesystem). 
Mounted devfs on /dev Freeing unused kernel memory: 164k freed usb 1-1: new full speed USB device using address 2 hub 1-1:1.0: USB hub found hub 1-1:1.0: 3 ports detected usb 1-2: new low speed USB device using address 3 input: USB HID v1.10 Mouse [Logitech Trackball] on usb-0001:01:0b.0-2 usb 1-3: new full speed USB device using address 4 ieee1394: Node added: ID:BUS[0-00:1023] GUID[0030e000e0000e1c] ieee1394: Host added: ID:BUS[0-01:1023] GUID[000d93fffe57f6f6] scsi0 : SCSI emulation for IEEE-1394 SBP-2 Devices input: USB HID v1.11 Keyboard [05ac:1000] on usb-0001:01:0b.0-3 input: USB HID v1.11 Mouse [05ac:1000] on usb-0001:01:0b.0-3 usb 2-1: new low speed USB device using address 2 input: USB HID v1.10 Keyboard [CHICONY USB Keyboard] on usb-0001:01:0b.1-1 input,hiddev0: USB HID v1.10 Device [CHICONY USB Keyboard] on usb-0001:01:0b.1-1 usb 1-1.3: new full speed USB device using address 5 input: USB HID v1.10 Keyboard [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:01:0b.0-1.3 input: USB HID v1.10 Device [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:01:0b.0-1.3 ieee1394: sbp2: Logged into SBP-2 device ieee1394: Node 0-00:1023: Max speed [S400] - Max payload [2048] Vendor: IBM-DTLA Model: -307030 Rev: Type: Direct-Access ANSI SCSI revision: 06 SCSI device sda: 60036480 512-byte hdwr sectors (30739 MB) sda: asking for cache data failed sda: assuming drive cache: write through /dev/scsi/host0/bus0/target0/lun0: [mac] p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 Attached scsi disk sda at scsi0, channel 0, id 0, lun 0 Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0 EXT2-fs warning: checktime reached, running e2fsck is recommended ieee1394: unsolicited response packet received - no tlabel match EXT2-fs warning: checktime reached, running e2fsck is recommended EXT2-fs warning: checktime reached, running e2fsck is recommended PCI: Enabling device: (0001:01:0b.2), cmd 6 ehci_hcd 0001:01:0b.2: NEC Corporation USB 2.0 ehci_hcd 
0001:01:0b.2: irq 70, pci mem e0000000831f8000 ehci_hcd 0001:01:0b.2: new USB bus registered, assigned bus number 3 ehci_hcd 0001:01:0b.2: USB 2.0 enabled, EHCI 1.00, driver 2004-May-10 usb 2-1: USB disconnect, address 2 hub 3-0:1.0: USB hub found hub 3-0:1.0: 5 ports detected drivers/usb/input/hid-core.c: can't resubmit intr, 0001:01:0b.1-1/input1, status -19 usb 1-1: USB disconnect, address 2 usb 1-1.3: USB disconnect, address 5 usb 1-2: USB disconnect, address 3 usb 1-3: USB disconnect, address 4 usb 1-1: new full speed USB device using address 6 hub 1-1:1.0: USB hub found hub 1-1:1.0: 3 ports detected usb 1-2: new low speed USB device using address 7 input: USB HID v1.10 Mouse [Logitech Trackball] on usb-0001:01:0b.0-2 usb 1-3: new full speed USB device using address 8 input: USB HID v1.11 Keyboard [05ac:1000] on usb-0001:01:0b.0-3 input: USB HID v1.11 Mouse [05ac:1000] on usb-0001:01:0b.0-3 usb 2-1: new low speed USB device using address 3 input: USB HID v1.10 Keyboard [CHICONY USB Keyboard] on usb-0001:01:0b.1-1 input,hiddev0: USB HID v1.10 Device [CHICONY USB Keyboard] on usb-0001:01:0b.1-1 usb 1-1.3: new full speed USB device using address 9 input: USB HID v1.10 Keyboard [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:01:0b.0-1.3 input: USB HID v1.10 Device [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:01:0b.0-1.3 PHY ID: 4061e4, addr: 0 NET: Registered protocol family 10 Disabled Privacy Extensions on device c00000000048a7a8(lo) IPv6 over IPv4 tunneling driver eth0: Link is up at 100 Mbps, full-duplex. eth0: Pause is disabled hda: MDMA, cycleTime: 150, accessTime: 75, recTime: 75 hda: Set MDMA timing for mode 2, reg: 0x00221526 hda: Enabling MultiWord DMA 2 libata version 1.02 loaded. 
sata_svw version 1.04 ata_device_add: ENTER ata_host_add: ENTER ata_port_start: prd alloc, virt c000000012a7f000, dma 12a7f000 ata1: SATA max UDMA/133 cmd 0xE0000000831F9000 ctl 0xE0000000831F9020 bmdma 0xE0000000831F9030 irq 0 ata_host_add: ENTER ata_port_start: prd alloc, virt c000000012a73000, dma 12a73000 ata2: SATA max UDMA/133 cmd 0xE0000000831F9100 ctl 0xE0000000831F9120 bmdma 0xE0000000831F9130 irq 0 ata_host_add: ENTER ata_port_start: prd alloc, virt c00000001291b000, dma 1291b000 ata3: SATA max UDMA/133 cmd 0xE0000000831F9200 ctl 0xE0000000831F9220 bmdma 0xE0000000831F9230 irq 0 ata_host_add: ENTER ata_port_start: prd alloc, virt c000000011430000, dma 11430000 ata4: SATA max UDMA/133 cmd 0xE0000000831F9300 ctl 0xE0000000831F9320 bmdma 0xE0000000831F9330 irq 0 ata_device_add: probe begin ata_device_add: ata1: probe begin ata_bus_reset: ENTER, host 1, port 0 ata_dev_classify: found ATA device by sig ata_bus_reset: EXIT ata_dev_identify: ENTER, host 1, dev 0 ata_dev_select: ENTER, ata1: device 0, wait 1 ata_dev_identify: do ATA identify ata_sg_setup_one: mapped buffer of 512 bytes for read ata_fill_sg: PRD[0] = (0x12A7B3C0, 0x200) ata_dev_select: ENTER, ata1: device 0, wait 1 ata_exec_command_mmio: ata1: cmd 0xEC ata_pio_sector: data read ata_sg_clean: unmapping 1 sg elements ata_qc_complete: EXIT ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003 88:007f ata_dump_id: 49==0x2f00 53==0x0007 63==0x0407 64==0x0003 75==0x0000 ata_dump_id: 80==0x007e 81==0x001b 82==0x346b 83==0x7d01 84==0x4003 ata_dump_id: 88==0x007f 93==0x0000 ata1: dev 0 ATA, max UDMA/133, 156301488 sectors: lba48 ata_dev_identify: EXIT, drv_stat = 0x50 ata_dev_identify: ENTER/EXIT (host 1, dev 1) -- nodev ata_host_set_pio: base 0x8 xfer_mode 0xc mask 0x1f x 4 ata_dev_set_xfermode: set features - xfer mode ata_dev_set_xfermode: qc_issue xfer_mode=12 ata_dev_select: ENTER, ata1: device 0, wait 1 ata_exec_command_mmio: ata1: cmd 0xEF ata_dev_set_xfermode: wait for completion 
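The two xfer_mode values mentioned in the message above (12 and 70, both decimal) can be decoded by hand. What follows is only a minimal sketch, assuming the standard ATA SET FEATURES transfer-mode encoding (transfer type in the upper bits, mode number in the low three bits). xfer_mode_name() is a hypothetical helper written for illustration; it is not a libata function.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Decode an ATA SET FEATURES transfer-mode byte into a readable name.
 * Assumed encoding (standard ATA transfer-mode values):
 *   0x08 | n : PIO flow-control mode n
 *   0x20 | n : Multiword DMA mode n
 *   0x40 | n : Ultra DMA mode n
 * Hypothetical helper for illustration, not part of libata.
 */
static const char *xfer_mode_name(unsigned int mode, char *buf, size_t len)
{
	unsigned int type = mode & 0xf8;	/* transfer type bits */
	unsigned int n = mode & 0x07;		/* mode number */

	if (type == 0x40)
		snprintf(buf, len, "UDMA%u", n);
	else if (type == 0x20)
		snprintf(buf, len, "MWDMA%u", n);
	else if (type == 0x08)
		snprintf(buf, len, "PIO%u", n);
	else
		snprintf(buf, len, "unknown(0x%02x)", mode);
	return buf;
}
```

Under that assumption, 12 (0x0c) decodes to PIO4, which matches the "xfer_mode 0xc" printed by the ata_host_set_pio trace, and 70 (0x46) decodes to UDMA6, i.e. the UDMA/133 setting the port advertises.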
From benh at kernel.crashing.org Sun Nov 7 07:50:42 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 07 Nov 2004 07:50:42 +1100 Subject: Booting Imac G5 In-Reply-To: <1099769123.8346.41.camel@rapid> References: <1099656843.8346.7.camel@rapid> <1099702566.3946.49.camel@gaston> <1099769123.8346.41.camel@rapid> Message-ID: <1099774242.10262.99.camel@gaston> > as I can see you wrote the SATA driver for Pmac, you may have an idea of > what is going wrong on the Imac. > I did activate DPRINTK and VPRINTK in libata and added a few messages. > It seems that the SET_FEATURES command never completes. So the insmod > stays blocked but the machine is still fully usable from another shell. > I attach here the complete dmesg I got when booting. > Please note that the message: > "ata_dev_set_xfermode: qc_issue xfer_mode=12" used to be "... xfer=70" > (note the printk I added is decimal) but I tried to force it to the > xfer_mode I saw from ata_host_set_pio trace, and it changed nothing. Difficult to say at this point... you can try not resetting the PHY for now (remove ATA_FLAG_SATA_RESET from host_flags). Did you look at Darwin code for anything that may have changed ? Also, I think you can try lowering the max DMA speed by changing: probe_ent->udma_mask = 0x7f; to probe_ent->udma_mask = 0x3f; Ben. From benh at kernel.crashing.org Sun Nov 7 10:51:21 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 07 Nov 2004 10:51:21 +1100 Subject: Booting Imac G5 In-Reply-To: <1099774242.10262.99.camel@gaston> References: <1099656843.8346.7.camel@rapid> <1099702566.3946.49.camel@gaston> <1099769123.8346.41.camel@rapid> <1099774242.10262.99.camel@gaston> Message-ID: <1099785081.5295.114.camel@gaston> Ok, a new Darwin is out and the driver there has some additional bits, related to the SATA cell. I'm hacking together a patch. Ben.
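The udma_mask change suggested in the earlier reply (0x7f to 0x3f) can be read as dropping the top UDMA mode from a bitmask. The following is a minimal sketch, assuming that bit n of udma_mask enables UDMA mode n; both helpers are hypothetical and exist only for illustration, they are not taken from libata.

```c
#include <stddef.h>

/*
 * Return the highest UDMA mode enabled in a udma_mask, or -1 if none.
 * Assumption: bit n set means UDMA mode n is allowed.
 * Hypothetical helper for illustration only.
 */
static int highest_udma_mode(unsigned int udma_mask)
{
	int mode = -1;
	int bit;

	for (bit = 0; bit < 8; bit++)
		if (udma_mask & (1u << bit))
			mode = bit;
	return mode;
}

/* Map a UDMA mode number (0..6) to its conventional speed name. */
static const char *udma_speed_name(int mode)
{
	static const char *const names[] = {
		"UDMA/16", "UDMA/25", "UDMA/33", "UDMA/44",
		"UDMA/66", "UDMA/100", "UDMA/133",
	};

	if (mode < 0 || mode >= (int)(sizeof(names) / sizeof(names[0])))
		return "none/unknown";
	return names[mode];
}
```

Read this way, 0x7f allows modes 0 through 6 (up to UDMA/133, as in the "SATA max UDMA/133" lines of the dmesg), while 0x3f stops at mode 5 (UDMA/100).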
From benh at kernel.crashing.org Sun Nov 7 11:32:48 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 07 Nov 2004 11:32:48 +1100 Subject: Booting Imac G5 In-Reply-To: <1099785081.5295.114.camel@gaston> References: <1099656843.8346.7.camel@rapid> <1099702566.3946.49.camel@gaston> <1099769123.8346.41.camel@rapid> <1099774242.10262.99.camel@gaston> <1099785081.5295.114.camel@gaston> Message-ID: <1099787569.3946.116.camel@gaston> On Sun, 2004-11-07 at 10:51 +1100, Benjamin Herrenschmidt wrote: > Ok, a new Darwin is out and the driver there has some additional bits, > related to the SATA cell. I'm hacking together a patch. Index: linux-work/drivers/scsi/sata_svw.c =================================================================== --- linux-work.orig/drivers/scsi/sata_svw.c 2004-10-13 09:02:05.000000000 +1000 +++ linux-work/drivers/scsi/sata_svw.c 2004-11-07 11:23:41.945588808 +1100 @@ -49,7 +49,7 @@ #endif /* CONFIG_PPC_OF */ #define DRV_NAME "sata_svw" -#define DRV_VERSION "1.04" +#define DRV_VERSION "1.05" /* Taskfile registers offsets */ #define K2_SATA_TF_CMD_OFFSET 0x00 @@ -75,10 +75,19 @@ #define K2_SATA_SICR1_OFFSET 0x80 #define K2_SATA_SICR2_OFFSET 0x84 #define K2_SATA_SIM_OFFSET 0x88 +#define K2_SATA_MDIO_ACCESS 0x8c /* Port stride */ #define K2_SATA_PORT_OFFSET 0x100 +/* Private structure */ +struct k2_sata_priv +{ +#ifdef CONFIG_PPC_OF + struct device_node *of_node; +#endif + int need_mdio_phy_reset; +}; static u32 k2_sata_scr_read (struct ata_port *ap, unsigned int sc_reg) { @@ -96,6 +105,42 @@ writel(val, (void *) ap->ioaddr.scr_addr + (sc_reg * 4)); } +static u16 k2_sata_mdio_read(struct ata_host_set *host, int reg) +{ + u16 val; + int timeout; + + writel(host_set->mmio_base + K2_SATA_MDIO_ACCESS, + (reg & 0x1f) | 0x4000); + for(timeout = 10000; timeout > 0; timeout++) { + val = readl(host_set->mmio_base + K2_SATA_MDIO_ACCESS); + if (val & 0x8000) + break; + udelay(100); + } + if (timeout <= 0) { + printk(KERN_WARNING "sata_svw: 
timeout reading MDIO reg %d\n", reg); + return 0xffff; + } + return val >> 16; +} + +static void k2_sata_mdio_write(struct ata_host_set *host, int reg, u16 val) +{ + u16 val; + int timeout; + + writel(host_set->mmio_base + K2_SATA_MDIO_ACCESS, + (reg & 0x1f) | (((u32)val) << 16) | 0x2000); + for(timeout = 10000; timeout > 0; timeout++) { + val = readl(host_set->mmio_base + K2_SATA_MDIO_ACCESS); + if (val & 0x8000) + break; + udelay(100); + } + if (timeout <= 0) + printk(KERN_WARNING "sata_svw: timeout writing MDIO reg %d\n", reg); +} static void k2_sata_tf_load(struct ata_port *ap, struct ata_taskfile *tf) { @@ -220,6 +265,31 @@ return readl((void *) ap->ioaddr.status_addr); } +static void k2_sata_mdio_phy_reset(struct ata_host_set *host_set); +{ + u16 reg; + + reg = k2_sata_mdio_read(host_set, 4); + k2_sata_mdio_write(host_set, 4, reg | 0x0008); + udelay(200); + k2_sata_mdio_write(host_set, 4, reg); + udelay(250); +} + +static void k2_sata_host_start(struct ata_host_set *host_set) +{ + struct k2_sata_priv *pp; + + pp = host_set->private_data; + + /* Some cell revs need a HW reset of the PHY layer at this point, and + * on wakeup from power management + */ + if (pp->need_mdio_phy_reset) + k2_sata_mdio_phy_reset(host_set); +} + + #ifdef CONFIG_PPC_OF /* * k2_sata_proc_info @@ -237,15 +307,15 @@ { struct ata_port *ap; struct device_node *np; + struct k2_sata_priv *pp; int len, index; /* Find the ata_port */ ap = (struct ata_port *) &shost->hostdata[0]; if (ap == NULL) return 0; - - /* Find the OF node for the PCI device proper */ - np = pci_device_to_OF_node(ap->host_set->pdev); + pp = ap->host_set->private_data; + np = pp->of_node; if (np == NULL) return 0; @@ -310,6 +380,7 @@ .scr_write = k2_sata_scr_write, .port_start = ata_port_start, .port_stop = ata_port_stop, + .host_start = k2_sata_host_start, }; static void k2_sata_setup_port(struct ata_ioports *port, unsigned long base) @@ -338,6 +409,7 @@ struct ata_probe_ent *probe_ent = NULL; unsigned long base; void 
*mmio_base; + struct k2_sata_priv *pp = NULL; int rc; if (!printed_version++) @@ -374,10 +446,31 @@ rc = -ENOMEM; goto err_out_regions; } - memset(probe_ent, 0, sizeof(*probe_ent)); + + pp = (struct k2_sata_priv *)kmalloc(sizeof(struct k2_sata_priv), GFP_KERNEL); + if (pp == NULL) { + rc = -ENOMEM; + goto err_out_free_ent; + } + memset(pp, 0, sizeof(struct k2_sata_priv)); + probe_ent->pdev = pdev; INIT_LIST_HEAD(&probe_ent->node); + probe_ent->private_data = pdev; + +#ifdef CONFIG_PPC_OF + /* Find the OF node for the PCI device proper */ + pp->of_node = pci_device_to_OF_node(ap->host_set->pdev); + + /* Check for revision 1 */ + if (pp->of_node) { + u32 *rev; + rev = (u32 *)get_property(pp->of_node, "cell-revision", NULL); + if (rev && (*rev) > 0) + pp->need_mdio_phy_reset = 1; + } +#endif /* CONFIG_PPC_OF */ mmio_base = ioremap(pci_resource_start(pdev, 5), pci_resource_len(pdev, 5)); @@ -429,7 +522,10 @@ return 0; err_out_free_ent: - kfree(probe_ent); + if (pp) + kfree(pp); + if (probe_ent) + kfree(probe_ent); err_out_regions: pci_release_regions(pdev); err_out: Index: linux-work/drivers/scsi/libata-core.c =================================================================== --- linux-work.orig/drivers/scsi/libata-core.c 2004-11-07 11:24:09.617382056 +1100 +++ linux-work/drivers/scsi/libata-core.c 2004-11-07 11:24:40.491688448 +1100 @@ -3271,6 +3271,9 @@ host_set->private_data = ent->private_data; host_set->ops = ent->port_ops; + if (host_set->ops->host_start) + host_set->ops->host_start(host_set); + /* register each port bound to this device */ for (i = 0; i < ent->n_ports; i++) { struct ata_port *ap; Index: linux-work/include/linux/libata.h =================================================================== --- linux-work.orig/include/linux/libata.h 2004-11-07 11:23:56.598361248 +1100 +++ linux-work/include/linux/libata.h 2004-11-07 11:24:54.242597992 +1100 @@ -349,6 +349,7 @@ int (*port_start) (struct ata_port *ap); void (*port_stop) (struct ata_port *ap); + void 
(*host_start) (struct ata_host_set *host_set); void (*host_stop) (struct ata_host_set *host_set); }; From benh at kernel.crashing.org Sun Nov 7 11:33:49 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 07 Nov 2004 11:33:49 +1100 Subject: Booting Imac G5 (Wrong patch !) In-Reply-To: <1099787569.3946.116.camel@gaston> References: <1099656843.8346.7.camel@rapid> <1099702566.3946.49.camel@gaston> <1099769123.8346.41.camel@rapid> <1099774242.10262.99.camel@gaston> <1099785081.5295.114.camel@gaston> <1099787569.3946.116.camel@gaston> Message-ID: <1099787630.3884.118.camel@gaston> Oops, sent the wrong patch, here it is: Index: linux-work/drivers/scsi/sata_svw.c =================================================================== --- linux-work.orig/drivers/scsi/sata_svw.c 2004-10-13 09:02:05.000000000 +1000 +++ linux-work/drivers/scsi/sata_svw.c 2004-11-07 11:31:52.229054392 +1100 @@ -49,7 +49,7 @@ #endif /* CONFIG_PPC_OF */ #define DRV_NAME "sata_svw" -#define DRV_VERSION "1.04" +#define DRV_VERSION "1.05" /* Taskfile registers offsets */ #define K2_SATA_TF_CMD_OFFSET 0x00 @@ -75,10 +75,19 @@ #define K2_SATA_SICR1_OFFSET 0x80 #define K2_SATA_SICR2_OFFSET 0x84 #define K2_SATA_SIM_OFFSET 0x88 +#define K2_SATA_MDIO_ACCESS 0x8c /* Port stride */ #define K2_SATA_PORT_OFFSET 0x100 +/* Private structure */ +struct k2_sata_priv +{ +#ifdef CONFIG_PPC_OF + struct device_node *of_node; +#endif + int need_mdio_phy_reset; +}; static u32 k2_sata_scr_read (struct ata_port *ap, unsigned int sc_reg) { @@ -96,6 +105,41 @@ writel(val, (void *) ap->ioaddr.scr_addr + (sc_reg * 4)); } +static u16 k2_sata_mdio_read(struct ata_host_set *host_set, int reg) +{ + u16 val; + int timeout; + + writel((reg & 0x1f) | 0x4000, + host_set->mmio_base + K2_SATA_MDIO_ACCESS); + for(timeout = 10000; timeout > 0; timeout++) { + val = readl(host_set->mmio_base + K2_SATA_MDIO_ACCESS); + if (val & 0x8000) + break; + udelay(100); + } + if (timeout <= 0) { + printk(KERN_WARNING "sata_svw: 
timeout reading MDIO reg %d\n", reg); + return 0xffff; + } + return val >> 16; +} + +static void k2_sata_mdio_write(struct ata_host_set *host_set, int reg, u16 val) +{ + int timeout; + + writel((reg & 0x1f) | (((u32)val) << 16) | 0x2000, + host_set->mmio_base + K2_SATA_MDIO_ACCESS); + for(timeout = 10000; timeout > 0; timeout++) { + val = readl(host_set->mmio_base + K2_SATA_MDIO_ACCESS); + if (val & 0x8000) + break; + udelay(100); + } + if (timeout <= 0) + printk(KERN_WARNING "sata_svw: timeout writing MDIO reg %d\n", reg); +} static void k2_sata_tf_load(struct ata_port *ap, struct ata_taskfile *tf) { @@ -220,6 +264,31 @@ return readl((void *) ap->ioaddr.status_addr); } +static void k2_sata_mdio_phy_reset(struct ata_host_set *host_set) +{ + u16 reg; + + reg = k2_sata_mdio_read(host_set, 4); + k2_sata_mdio_write(host_set, 4, reg | 0x0008); + udelay(200); + k2_sata_mdio_write(host_set, 4, reg); + udelay(250); +} + +static void k2_sata_host_start(struct ata_host_set *host_set) +{ + struct k2_sata_priv *pp; + + pp = host_set->private_data; + + /* Some cell revs need a HW reset of the PHY layer at this point, and + * on wakeup from power management + */ + if (pp->need_mdio_phy_reset) + k2_sata_mdio_phy_reset(host_set); +} + + #ifdef CONFIG_PPC_OF /* * k2_sata_proc_info @@ -237,15 +306,15 @@ { struct ata_port *ap; struct device_node *np; + struct k2_sata_priv *pp; int len, index; /* Find the ata_port */ ap = (struct ata_port *) &shost->hostdata[0]; if (ap == NULL) return 0; - - /* Find the OF node for the PCI device proper */ - np = pci_device_to_OF_node(ap->host_set->pdev); + pp = ap->host_set->private_data; + np = pp->of_node; if (np == NULL) return 0; @@ -310,6 +379,7 @@ .scr_write = k2_sata_scr_write, .port_start = ata_port_start, .port_stop = ata_port_stop, + .host_start = k2_sata_host_start, }; static void k2_sata_setup_port(struct ata_ioports *port, unsigned long base) @@ -338,6 +408,7 @@ struct ata_probe_ent *probe_ent = NULL; unsigned long base; void *mmio_base; 
+ struct k2_sata_priv *pp = NULL; int rc; if (!printed_version++) @@ -374,10 +445,31 @@ rc = -ENOMEM; goto err_out_regions; } - memset(probe_ent, 0, sizeof(*probe_ent)); + + pp = (struct k2_sata_priv *)kmalloc(sizeof(struct k2_sata_priv), GFP_KERNEL); + if (pp == NULL) { + rc = -ENOMEM; + goto err_out_free_ent; + } + memset(pp, 0, sizeof(struct k2_sata_priv)); + probe_ent->pdev = pdev; INIT_LIST_HEAD(&probe_ent->node); + probe_ent->private_data = pdev; + +#ifdef CONFIG_PPC_OF + /* Find the OF node for the PCI device proper */ + pp->of_node = pci_device_to_OF_node(pdev); + + /* Check for revision 1 */ + if (pp->of_node) { + u32 *rev; + rev = (u32 *)get_property(pp->of_node, "cell-revision", NULL); + if (rev && (*rev) > 0) + pp->need_mdio_phy_reset = 1; + } +#endif /* CONFIG_PPC_OF */ mmio_base = ioremap(pci_resource_start(pdev, 5), pci_resource_len(pdev, 5)); @@ -429,7 +521,10 @@ return 0; err_out_free_ent: - kfree(probe_ent); + if (pp) + kfree(pp); + if (probe_ent) + kfree(probe_ent); err_out_regions: pci_release_regions(pdev); err_out: Index: linux-work/drivers/scsi/libata-core.c =================================================================== --- linux-work.orig/drivers/scsi/libata-core.c 2004-11-07 11:24:09.617382056 +1100 +++ linux-work/drivers/scsi/libata-core.c 2004-11-07 11:24:40.491688448 +1100 @@ -3271,6 +3271,9 @@ host_set->private_data = ent->private_data; host_set->ops = ent->port_ops; + if (host_set->ops->host_start) + host_set->ops->host_start(host_set); + /* register each port bound to this device */ for (i = 0; i < ent->n_ports; i++) { struct ata_port *ap; Index: linux-work/include/linux/libata.h =================================================================== --- linux-work.orig/include/linux/libata.h 2004-11-07 11:23:56.598361248 +1100 +++ linux-work/include/linux/libata.h 2004-11-07 11:24:54.242597992 +1100 @@ -349,6 +349,7 @@ int (*port_start) (struct ata_port *ap); void (*port_stop) (struct ata_port *ap); + void (*host_start) (struct 
ata_host_set *host_set); void (*host_stop) (struct ata_host_set *host_set); }; From anton at samba.org Mon Nov 8 04:20:30 2004 From: anton at samba.org (Anton Blanchard) Date: Mon, 8 Nov 2004 04:20:30 +1100 Subject: [RFC] Consolidate lots of hugepage code In-Reply-To: <20041029034817.GY12934@holomorphy.com> References: <20041029033708.GF12247@zax> <20041029034817.GY12934@holomorphy.com> Message-ID: <20041107172030.GA16976@krispykreme.ozlabs.ibm.com> Hi, > Further consolidation is premature given that outstanding hugetlb bugs > have the implication that architectures' needs are not being served by > the current arch/core split. I have at least two relatively major hugetlb > bugs outstanding, the lack of a flush_dcache_page() analogue first, and > another (soon to be a reported to affected distros) less well-understood. > Unless they're directly toward the end of restoring hugetlb to a sound > state, they're counterproductive to merge before patches doing so. Could you point me at a summary of these 2 issues? Anton From wli at holomorphy.com Mon Nov 8 06:20:24 2004 From: wli at holomorphy.com (William Lee Irwin III) Date: Sun, 7 Nov 2004 11:20:24 -0800 Subject: [RFC] Consolidate lots of hugepage code In-Reply-To: <20041107172030.GA16976@krispykreme.ozlabs.ibm.com> References: <20041029033708.GF12247@zax> <20041029034817.GY12934@holomorphy.com> <20041107172030.GA16976@krispykreme.ozlabs.ibm.com> Message-ID: <20041107192024.GM2890@holomorphy.com> At some point in the past, I wrote: >> Further consolidation is premature given that outstanding hugetlb bugs >> have the implication that architectures' needs are not being served by >> the current arch/core split. I have at least two relatively major hugetlb >> bugs outstanding, the lack of a flush_dcache_page() analogue first, and >> another (soon to be a reported to affected distros) less well-understood. 
>> Unless they're directly toward the end of restoring hugetlb to a sound >> state, they're counterproductive to merge before patches doing so. On Mon, Nov 08, 2004 at 04:20:30AM +1100, Anton Blanchard wrote: > Could you point me at a summary of these 2 issues? It's all pretty obvious. The first is checking page size vs. cache size and whether it's VI or does anything unusual; thus far things look hopeful that flush_dcache_page() analogues are unnecessary. More information about Super-H is needed to wrap up what will probably be no more than an audit. The second is a triplefault on x86-64 under some condition involving a long-running database regression test. There has obviously been considerably less progress there in no small part due to the amount of time required to reproduce the issue. -- wli From anton at samba.org Mon Nov 8 06:30:07 2004 From: anton at samba.org (Anton Blanchard) Date: Mon, 8 Nov 2004 06:30:07 +1100 Subject: [RFC] Consolidate lots of hugepage code In-Reply-To: <20041107192024.GM2890@holomorphy.com> References: <20041029033708.GF12247@zax> <20041029034817.GY12934@holomorphy.com> <20041107172030.GA16976@krispykreme.ozlabs.ibm.com> <20041107192024.GM2890@holomorphy.com> Message-ID: <20041107193007.GC16976@krispykreme.ozlabs.ibm.com> Hi, > It's all pretty obvious. The first is checking page size vs. cache size > and whether it's VI or does anything unusual; thus far things look > hopeful that flush_dcache_page() analogues are unnecessary. More > information about Super-H is needed to wrap up what will probably be no > more than an audit. Good to hear. > The second is a triplefault on x86-64 under some > condition involving a long-running database regression test. There has > obviously been considerably less progress there in no small part due to > the amount of time required to reproduce the issue. OK. We have not seen a similar issue on ppc64 even with extensive testing (although with HPC apps). 
The question is how long we should hold off on further hugetlb development waiting for this one bug report on a single architecture to be chased. Anton From wli at holomorphy.com Mon Nov 8 08:09:43 2004 From: wli at holomorphy.com (William Lee Irwin III) Date: Sun, 7 Nov 2004 13:09:43 -0800 Subject: [RFC] Consolidate lots of hugepage code In-Reply-To: <20041107193007.GC16976@krispykreme.ozlabs.ibm.com> References: <20041029033708.GF12247@zax> <20041029034817.GY12934@holomorphy.com> <20041107172030.GA16976@krispykreme.ozlabs.ibm.com> <20041107192024.GM2890@holomorphy.com> <20041107193007.GC16976@krispykreme.ozlabs.ibm.com> Message-ID: <20041107210943.GN2890@holomorphy.com> At some point in the past, I wrote: >> The second is a triplefault on x86-64 under some >> condition involving a long-running database regression test. There has >> obviously been considerably less progress there in no small part due to >> the amount of time required to reproduce the issue. On Mon, Nov 08, 2004 at 06:30:07AM +1100, Anton Blanchard wrote: > OK. We have not seen a similar issue on ppc64 even with extensive > testing (although with HPC apps). The question is how long we should > hold off on further hugetlb development waiting for this one bug report > on a single architecture to be chased. Until it's fixed. Until then I'm considering it a byproduct of that same development. And with your report, that makes it two architectures, not one. The concepts of the features etc. are all generally okay, though very buzzword-centric. In general the audits and sweeps have been lacking thoroughness in the architecture-specific areas. I expect that particular issue to have been the cause of these two bugreports. 
-- wli From anton at samba.org Mon Nov 8 08:22:12 2004 From: anton at samba.org (Anton Blanchard) Date: Mon, 8 Nov 2004 08:22:12 +1100 Subject: [RFC] Consolidate lots of hugepage code In-Reply-To: <20041107210943.GN2890@holomorphy.com> References: <20041029033708.GF12247@zax> <20041029034817.GY12934@holomorphy.com> <20041107172030.GA16976@krispykreme.ozlabs.ibm.com> <20041107192024.GM2890@holomorphy.com> <20041107193007.GC16976@krispykreme.ozlabs.ibm.com> <20041107210943.GN2890@holomorphy.com> Message-ID: <20041107212212.GD16976@krispykreme.ozlabs.ibm.com> > On Mon, Nov 08, 2004 at 06:30:07AM +1100, Anton Blanchard wrote: > > OK. We have not seen a similar issue on ppc64 even with extensive > > testing (although with HPC apps). The question is how long we should > > hold off on further hugetlb development waiting for this one bug report > > on a single architecture to be chased. > > Until it's fixed. Until then I'm considering it a byproduct of that same > development. And with your report, that makes it two architectures, not > one. We _arent_ seeing it on ppc64. Can we at least have a complete bug report if we are to halt all hugetlb development? At the moment we dont have much information to go on at all. Anton From wli at holomorphy.com Mon Nov 8 09:49:48 2004 From: wli at holomorphy.com (William Lee Irwin III) Date: Sun, 7 Nov 2004 14:49:48 -0800 Subject: [RFC] Consolidate lots of hugepage code In-Reply-To: <20041107212212.GD16976@krispykreme.ozlabs.ibm.com> References: <20041029033708.GF12247@zax> <20041029034817.GY12934@holomorphy.com> <20041107172030.GA16976@krispykreme.ozlabs.ibm.com> <20041107192024.GM2890@holomorphy.com> <20041107193007.GC16976@krispykreme.ozlabs.ibm.com> <20041107210943.GN2890@holomorphy.com> <20041107212212.GD16976@krispykreme.ozlabs.ibm.com> Message-ID: <20041107224948.GO2890@holomorphy.com> At some point in the past, I wrote: >> Until it's fixed. Until then I'm considering it a byproduct of that same >> development. 
And with your report, that makes it two architectures, not >> one. On Mon, Nov 08, 2004 at 08:22:12AM +1100, Anton Blanchard wrote: > We _arent_ seeing it on ppc64. Can we at least have a complete bug > report if we are to halt all hugetlb development? At the moment we dont > have much information to go on at all. Sorry, I don't get complete bugreports myself. If you care to try to actually fix something (it's doubtful you yourself are the culprit) I'm still trying to reproduce it myself with long-running database tests. It's reliably reproducible on the reporters' machines. The particular bug is only one piece of evidence. Just asking basic questions about what was done for architecture code reveals that all this "development" is not paying proper attention to architecture code. I merely insist that development toward the end of stabilization occur prior to that for large feature work. And frankly, I'm rather unimpressed with the gravity of the proposed featurework, particularly in comparison to the stability requirements of users on typical production systems. Nor am I impressed with the quality. The patch presentations have been messy, the audits (as mentioned above) incomplete, the benefits not clearly demonstrated, and the code itself not so pretty. Just respinning the patches so they're properly incremental and the code somewhat cleaner (e.g. some recent one nested tabs 5 deep or so) would already remedy a large number of the issues with the featurework. Once arranged that way the audits' incompleteness can be dealt with by those with the fortitude to thoroughly audit and/or prior architecture knowledge to correct the patches for arches they don't deal with properly. 
-- wli From segher at kernel.crashing.org Mon Nov 8 20:27:15 2004 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Mon, 8 Nov 2004 10:27:15 +0100 Subject: Booting Imac G5 In-Reply-To: <1099702566.3946.49.camel@gaston> References: <1099656843.8346.7.camel@rapid> <1099702566.3946.49.camel@gaston> Message-ID: <60960DF4-3168-11D9-A1A1-000A95A4DC02@kernel.crashing.org> > Nice ! Did you submit the new PCI IDs to the online database too ? > > http://pciids.sourceforge.net/ And please :-) Segher From segher at kernel.crashing.org Mon Nov 8 20:28:28 2004 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Mon, 8 Nov 2004 10:28:28 +0100 Subject: Booting Imac G5 In-Reply-To: <1099740535.8346.33.camel@rapid> References: <1099656843.8346.7.camel@rapid> <1099702566.3946.49.camel@gaston> <1099740535.8346.33.camel@rapid> Message-ID: <8C3E72E1-3168-11D9-A1A1-000A95A4DC02@kernel.crashing.org> > I also added the two following devices that I just identified, using > /proc/device-tree to locate them: > 004f Shasta Mac I/O > 0058 U3 AGP bridge It's U3L, instead. Segher From l_indien at magic.fr Mon Nov 8 22:14:28 2004 From: l_indien at magic.fr (J. Mayer) Date: Mon, 08 Nov 2004 12:14:28 +0100 Subject: Booting Imac G5 In-Reply-To: <60960DF4-3168-11D9-A1A1-000A95A4DC02@kernel.crashing.org> References: <1099656843.8346.7.camel@rapid> <1099702566.3946.49.camel@gaston> <60960DF4-3168-11D9-A1A1-000A95A4DC02@kernel.crashing.org> Message-ID: <1099912468.8346.1121.camel@rapid> On Mon, 2004-11-08 at 10:27, Segher Boessenkool wrote: > > Nice ! Did you submit the new PCI IDs to the online database too ? > > > > http://pciids.sourceforge.net/ > > And please :-) OK, done. -- J. Mayer Never organized From l_indien at magic.fr Tue Nov 9 00:30:12 2004 From: l_indien at magic.fr (J. Mayer) Date: Mon, 08 Nov 2004 14:30:12 +0100 Subject: Booting Imac G5 (Wrong patch !) 
In-Reply-To: <1099787630.3884.118.camel@gaston> References: <1099656843.8346.7.camel@rapid> <1099702566.3946.49.camel@gaston> <1099769123.8346.41.camel@rapid> <1099774242.10262.99.camel@gaston> <1099785081.5295.114.camel@gaston> <1099787569.3946.116.camel@gaston> <1099787630.3884.118.camel@gaston> Message-ID: <1099920612.8346.1460.camel@rapid> On Sun, 2004-11-07 at 01:33, Benjamin Herrenschmidt wrote: > Oops, sent the wrong patch, here it is: [...] Made some tries, this does not help. I'm not sure, but I feel like we miss an IRQ... I'll do more testing when I have more time. -- J. Mayer Never organized From brking at us.ibm.com Tue Nov 9 03:19:34 2004 From: brking at us.ibm.com (brking at us.ibm.com) Date: Mon, 08 Nov 2004 10:19:34 -0600 Subject: [PATCH 1/2] ppc64: Block config accesses during BIST #3 Message-ID: <200411081619.iA8GJabM014634@d03av02.boulder.ibm.com> Below is a revised patch in an attempt at sharing more code between iSeries and pSeries and also getting full ppc64 support of the new APIs essentially for free. Some PCI adapters on pSeries and iSeries hardware (ipr SCSI adapters) have an exposure today in that they issue BIST to the adapter to reset the card. If, during the time it takes to complete BIST, userspace attempts to access PCI config space, the host bus bridge will master abort the access since the ipr adapter does not respond on the PCI bus for a brief period of time when running BIST. This master abort results in the host PCI bridge isolating that PCI device from the rest of the system, making the device unusable until Linux is rebooted. This patch is an attempt to close that exposure by introducing some blocking code in the arch-specific PCI code. The intent is to have the ipr device driver invoke these routines to prevent userspace PCI accesses from occurring during this window. It has been tested by running BIST on an ipr adapter while running a script which looped reading the config space of that adapter through sysfs.
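The blocking scheme described in this patch announcement can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the kernel code from the patch: `struct fake_hose` and every `fake_*` function are hypothetical stand-ins, only the `block_cfg_io_mask` field name is taken from the patch, the config space is reduced to one pretend register per slot, and the spinlock that the real code takes around each access is left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for the host bridge state; not the kernel's
 * struct pci_controller. One mask bit per PCI slot (0..31). */
struct fake_hose {
    uint32_t block_cfg_io_mask;  /* bit n set => config I/O to slot n is blocked */
    uint32_t cfg_space[32];      /* one pretend config register per slot */
};

/* Mark a slot as blocked (slot must be in 0..31). */
static void fake_block_config_io(struct fake_hose *hose, unsigned int slot)
{
    hose->block_cfg_io_mask |= (UINT32_C(1) << slot);
}

/* Allow config accesses to the slot again (slot must be in 0..31). */
static void fake_unblock_config_io(struct fake_hose *hose, unsigned int slot)
{
    hose->block_cfg_io_mask &= ~(UINT32_C(1) << slot);
}

static int fake_is_blocked(const struct fake_hose *hose, unsigned int slot)
{
    return (hose->block_cfg_io_mask >> slot) & 1u;
}

/* A blocked read reports success but hands back all 1's. */
static int fake_read_config(const struct fake_hose *hose, unsigned int slot,
                            uint32_t *val)
{
    if (fake_is_blocked(hose, slot))
        *val = UINT32_MAX;
    else
        *val = hose->cfg_space[slot];
    return 0;
}

/* A blocked write is silently dropped and still reports success. */
static int fake_write_config(struct fake_hose *hose, unsigned int slot,
                             uint32_t val)
{
    if (!fake_is_blocked(hose, slot))
        hose->cfg_space[slot] = val;
    return 0;
}
```

A driver in this model would call `fake_block_config_io()` just before starting BIST and `fake_unblock_config_io()` once the device responds again; in between, readers see all 1's instead of triggering a master abort.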
Without the patch, an EEH error occurs. With the patch there is no EEH error. Tested on Power 5. Signed-off-by: Brian King --- linux-2.6.10-rc1-bk18-bjking1/arch/ppc64/kernel/pSeries_pci.c | 2 linux-2.6.10-rc1-bk18-bjking1/arch/ppc64/kernel/pci.c | 112 +++++++++- linux-2.6.10-rc1-bk18-bjking1/arch/ppc64/kernel/pci.h | 1 linux-2.6.10-rc1-bk18-bjking1/include/asm-ppc64/pci-bridge.h | 4 linux-2.6.10-rc1-bk18-bjking1/include/asm-ppc64/pci.h | 13 + 5 files changed, 129 insertions(+), 3 deletions(-) diff -puN include/asm-ppc64/pci.h~ppc64_block_cfg_io_during_bist_revised include/asm-ppc64/pci.h --- linux-2.6.10-rc1-bk18/include/asm-ppc64/pci.h~ppc64_block_cfg_io_during_bist_revised 2004-11-08 09:32:48.000000000 -0600 +++ linux-2.6.10-rc1-bk18-bjking1/include/asm-ppc64/pci.h 2004-11-08 09:32:48.000000000 -0600 @@ -85,6 +85,7 @@ struct pci_dma_ops { }; extern struct pci_dma_ops pci_dma_ops; +extern struct pci_ops pci_ops; static inline void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size, dma_addr_t *dma_handle) @@ -244,6 +245,18 @@ extern int pci_read_irq_line(struct pci_ extern void pcibios_add_platform_entries(struct pci_dev *dev); +extern void pci_block_config_io(struct pci_dev *dev); + +extern void pci_unblock_config_io(struct pci_dev *dev); + +extern int pci_start_bist(struct pci_dev *dev); + +extern int pcibios_read_config(struct pci_bus *bus, unsigned int devfn, + int where, int size, u32 *val); + +extern int pcibios_write_config(struct pci_bus *bus, unsigned int devfn, + int where, int size, u32 val); + #endif /* __KERNEL__ */ #endif /* __PPC64_PCI_H */ diff -puN arch/ppc64/kernel/pci.c~ppc64_block_cfg_io_during_bist_revised arch/ppc64/kernel/pci.c --- linux-2.6.10-rc1-bk18/arch/ppc64/kernel/pci.c~ppc64_block_cfg_io_during_bist_revised 2004-11-08 09:32:48.000000000 -0600 +++ linux-2.6.10-rc1-bk18-bjking1/arch/ppc64/kernel/pci.c 2004-11-08 09:32:48.000000000 -0600 @@ -321,7 +321,7 @@ static int __init pcibios_init(void) /* Scan all of the recorded PCI
controllers. */ list_for_each_entry_safe(hose, tmp, &hose_list, list_node) { hose->last_busno = 0xff; - bus = pci_scan_bus(hose->first_busno, hose->ops, + bus = pci_scan_bus(hose->first_busno, &pci_ops, hose->arch_data); hose->bus = bus; hose->last_busno = bus->subordinate; @@ -547,6 +547,104 @@ int pci_mmap_page_range(struct pci_dev * return ret; } +static spinlock_t config_lock = SPIN_LOCK_UNLOCKED; + +int pcibios_read_config(struct pci_bus *bus, unsigned int devfn, + int where, int size, u32 *val) +{ + struct pci_controller *hose = pci_bus_to_host(bus); + unsigned long flags; + int rc = 0; + + spin_lock_irqsave(&config_lock, flags); + if (hose && !(hose->block_cfg_io_mask & (1 << PCI_SLOT(devfn)))) + rc = hose->ops->read(bus, devfn, where, size, val); + else + *val = -1; + spin_unlock_irqrestore(&config_lock, flags); + return rc; +} +EXPORT_SYMBOL(pcibios_read_config); + +int pcibios_write_config(struct pci_bus *bus, unsigned int devfn, + int where, int size, u32 val) +{ + struct pci_controller *hose = pci_bus_to_host(bus); + unsigned long flags; + int rc = 0; + + spin_lock_irqsave(&config_lock, flags); + if (hose && !(hose->block_cfg_io_mask & (1 << PCI_SLOT(devfn)))) + rc = hose->ops->write(bus, devfn, where, size, val); + spin_unlock_irqrestore(&config_lock, flags); + return rc; +} +EXPORT_SYMBOL(pcibios_write_config); + +struct pci_ops pci_ops = { + pcibios_read_config, + pcibios_write_config +}; + +/** + * pci_block_config_io - Block PCI config reads/writes + * @pdev: pci device struct + * + * This function blocks any PCI config accesses from occurring. + * When blocked, any writes will be ignored and treated as + * successful and any reads will return all 1's data. 
+ * + * Return value: + * nothing + **/ +void pci_block_config_io(struct pci_dev *pdev) +{ + struct pci_controller *hose = PCI_GET_PHB_PTR(pdev); + unsigned long flags; + + spin_lock_irqsave(&config_lock, flags); + hose->block_cfg_io_mask |= (1 << PCI_SLOT(pdev->devfn)); + spin_unlock_irqrestore(&config_lock, flags); +} +EXPORT_SYMBOL(pci_block_config_io); + +/** + * pci_unblock_config_io - Unblock PCI config reads/writes + * @pdev: pci device struct + * + * This function allows PCI config accesses to resume. + * + * Return value: + * nothing + **/ +void pci_unblock_config_io(struct pci_dev *pdev) +{ + struct pci_controller *hose = PCI_GET_PHB_PTR(pdev); + unsigned long flags; + + spin_lock_irqsave(&config_lock, flags); + hose->block_cfg_io_mask &= ~(1 << PCI_SLOT(pdev->devfn)); + spin_unlock_irqrestore(&config_lock, flags); +} +EXPORT_SYMBOL(pci_unblock_config_io); + +/** + * pci_start_bist - Start BIST on a PCI device + * @pdev: pci device struct + * + * This function allows a device driver to start BIST + * when PCI config accesses are disabled. + * + * Return value: + * nothing + **/ +int pci_start_bist(struct pci_dev *pdev) +{ + struct pci_controller *hose = pci_bus_to_host(pdev->bus); + return hose->ops->write(pdev->bus, pdev->devfn, PCI_BIST, 1, PCI_BIST_START); +} +EXPORT_SYMBOL(pci_start_bist); + #ifdef CONFIG_PPC_MULTIPLATFORM static ssize_t pci_show_devspec(struct device *dev, char *buf) { @@ -852,6 +950,18 @@ struct pci_controller* pci_find_hose_for return NULL; } +struct pci_controller* pci_find_hose_for_bus(struct pci_bus *bus) +{ + while (bus) { + struct pci_controller *hose, *tmp; + list_for_each_entry_safe(hose, tmp, &hose_list, list_node) + if (hose->bus == bus) + return hose; + bus=bus->parent; + } + return NULL; +} + /* * ppc64 can have multifunction devices that do not respond to function 0. * In this case we must scan all functions. 
diff -puN include/asm-ppc64/pci-bridge.h~ppc64_block_cfg_io_during_bist_revised include/asm-ppc64/pci-bridge.h --- linux-2.6.10-rc1-bk18/include/asm-ppc64/pci-bridge.h~ppc64_block_cfg_io_during_bist_revised 2004-11-08 09:32:48.000000000 -0600 +++ linux-2.6.10-rc1-bk18-bjking1/include/asm-ppc64/pci-bridge.h 2004-11-08 09:32:48.000000000 -0600 @@ -65,6 +65,7 @@ struct pci_controller { unsigned long buid; unsigned long dma_window_base_cur; unsigned long dma_window_size; + unsigned int block_cfg_io_mask; }; /* @@ -100,6 +101,7 @@ extern int pcibios_remove_root_bus(struc #define PCI_GET_DN(dev) ((struct device_node *)((dev)->sysdata)) extern void phbs_remap_io(void); +extern struct pci_controller* pci_find_hose_for_bus(struct pci_bus *bus); static inline struct pci_controller *pci_bus_to_host(struct pci_bus *bus) { @@ -113,7 +115,7 @@ static inline struct pci_controller *pci busdn = b->sysdata; } if (busdn == NULL) - return NULL; + return pci_find_hose_for_bus(bus); return busdn->phb; } diff -puN arch/ppc64/kernel/pci.h~ppc64_block_cfg_io_during_bist_revised arch/ppc64/kernel/pci.h --- linux-2.6.10-rc1-bk18/arch/ppc64/kernel/pci.h~ppc64_block_cfg_io_during_bist_revised 2004-11-08 09:32:48.000000000 -0600 +++ linux-2.6.10-rc1-bk18-bjking1/arch/ppc64/kernel/pci.h 2004-11-08 09:32:48.000000000 -0600 @@ -19,6 +19,7 @@ extern struct pci_controller* pci_alloc_ extern void pci_setup_phb_io(struct pci_controller *hose, int primary); extern struct pci_controller* pci_find_hose_for_OF_device(struct device_node* node); +extern struct pci_controller* pci_find_hose_for_bus(struct pci_bus *bus); extern void pci_setup_phb_io_dynamic(struct pci_controller *hose); diff -puN arch/ppc64/kernel/pSeries_pci.c~ppc64_block_cfg_io_during_bist_revised arch/ppc64/kernel/pSeries_pci.c --- linux-2.6.10-rc1-bk18/arch/ppc64/kernel/pSeries_pci.c~ppc64_block_cfg_io_during_bist_revised 2004-11-08 09:32:48.000000000 -0600 +++ linux-2.6.10-rc1-bk18-bjking1/arch/ppc64/kernel/pSeries_pci.c 2004-11-08 
09:32:48.000000000 -0600 @@ -434,7 +434,7 @@ struct pci_controller * __devinit init_p pci_devs_phb_init_dynamic(phb); phb->last_busno = 0xff; - bus = pci_scan_bus(phb->first_busno, phb->ops, phb->arch_data); + bus = pci_scan_bus(phb->first_busno, &pci_ops, phb->arch_data); phb->bus = bus; phb->last_busno = bus->subordinate; _ From brking at us.ibm.com Tue Nov 9 03:19:42 2004 From: brking at us.ibm.com (brking at us.ibm.com) Date: Mon, 08 Nov 2004 10:19:42 -0600 Subject: [PATCH 2/2] ipr: Block config IO during BIST (#3) Message-ID: <200411081619.iA8GJgFS000609@d03av01.boulder.ibm.com> Change ipr to use new ppc64 pci APIs to block PCI config space accesses when running BIST to prevent PCI master aborts. Signed-off-by: Brian King --- linux-2.6.10-rc1-bk18-bjking1/drivers/scsi/ipr.c | 5 ++++- linux-2.6.10-rc1-bk18-bjking1/drivers/scsi/ipr.h | 7 +++++++ 2 files changed, 11 insertions(+), 1 deletion(-) diff -puN drivers/scsi/ipr.h~ipr_block_config_io_during_bist_revised drivers/scsi/ipr.h --- linux-2.6.10-rc1-bk18/drivers/scsi/ipr.h~ipr_block_config_io_during_bist_revised 2004-11-08 09:32:53.000000000 -0600 +++ linux-2.6.10-rc1-bk18-bjking1/drivers/scsi/ipr.h 2004-11-08 09:32:53.000000000 -0600 @@ -1112,6 +1112,13 @@ __FUNCTION__, __LINE__, ioa_cfg #define ipr_remove_dump_file(kobj, attr) do { } while(0) #endif +#ifndef CONFIG_PPC64 +#define pci_block_config_io(dev) do { } while(0) +#define pci_unblock_config_io(dev) do { } while(0) +#define pci_start_bist(dev) \ + pci_write_config_byte(dev, PCI_BIST, PCI_BIST_START) +#endif + /* * Error logging macros */ diff -puN drivers/scsi/ipr.c~ipr_block_config_io_during_bist_revised drivers/scsi/ipr.c --- linux-2.6.10-rc1-bk18/drivers/scsi/ipr.c~ipr_block_config_io_during_bist_revised 2004-11-08 09:32:53.000000000 -0600 +++ linux-2.6.10-rc1-bk18-bjking1/drivers/scsi/ipr.c 2004-11-08 09:32:53.000000000 -0600 @@ -4935,6 +4935,7 @@ static int ipr_reset_restore_cfg_space(s int rc; ENTER; + pci_unblock_config_io(ioa_cfg->pdev); rc = 
pci_restore_state(ioa_cfg->pdev); if (rc != PCIBIOS_SUCCESSFUL) { @@ -4989,9 +4990,11 @@ static int ipr_reset_start_bist(struct i int rc; ENTER; - rc = pci_write_config_byte(ioa_cfg->pdev, PCI_BIST, PCI_BIST_START); + pci_block_config_io(ioa_cfg->pdev); + rc = pci_start_bist(ioa_cfg->pdev); if (rc != PCIBIOS_SUCCESSFUL) { + pci_unblock_config_io(ioa_cfg->pdev); ipr_cmd->ioasa.ioasc = cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR); rc = IPR_RC_JOB_CONTINUE; } else { _ From ebenoit at hopevale.com Tue Nov 9 04:03:19 2004 From: ebenoit at hopevale.com (ebenoit at hopevale.com) Date: Mon, 8 Nov 2004 12:03:19 -0500 Subject: G5 two SCSI hard drive partitioning Message-ID: <1099933399.418fa6d79514e@www.hopevale.com> I am not sure if I am on the correct list, so excuse me if this does not relate. I am installing Mandrake 9.1 ppc on a G5 with two SCSI drives: the first drive is 36GB and the second is 74GB. Here is my question: How can I partition them so that I can have a /home directory of 90GB? I have tried to use LVM, but have not found enough information to set it up correctly with other partitions. Plus, it fails with a 'pvcreate failed' error message. I thought using linear RAID would do the trick, but again I am a beginner with both of these partitioning schemes. Here is what I have tried to accomplish:
bootstrap | 10MB  | no mount | apple_bootstrap
root      | 2GB   | /        | ext3
swap      | 800MB | swap     | Linux Swap
home      | 90GB  | /home    | LVM
Thank you for your comments and/or suggestions, Eric ------------------------------------------------------------ Hopevale Union Free School District: http://www.hopevale.com From linas at austin.ibm.com Tue Nov 9 05:16:19 2004 From: linas at austin.ibm.com (Linas Vepstas) Date: Mon, 8 Nov 2004 12:16:19 -0600 Subject: [PATCH] PPC64 Poor assembly coding style Message-ID: <20041108181619.GT10026@austin.ibm.com> Hi, Doug Maxey reported a bug with the latest/greatest gas assembler that demonstrates some poor coding style in entry.S and head.S.
The following patch cleans up that style, and also avoids assembler confusion. Basically, in entry.S, cmpldi 0,r0,NR_syscalls should be written as either cmpldi r0,NR_syscalls or as cmpldi cr0,r0,NR_syscalls All three forms are theoretically equivalent; in practice, I find the first alternative the cleanest (and also consistent with usage elsewhere in the files). The new assembler seems to be mistaking NR_syscalls for a register number, which is clearly out of bounds (it's not in 0..31). I think it would be cleaner overall to just drop the superfluous leading cr0. There are two other confusing usages, in head.S: I propose that cmpldi cr0,r5,0 should be cmpldi r5,0 cmpld 0,r6,r5 should be cmpld r6,r5 --linas Signed-off-by: Linas Vepstas -------------- next part -------------- ===== arch/ppc64/kernel/entry.S 1.46 vs edited ===== --- 1.46/arch/ppc64/kernel/entry.S 2004-10-07 16:52:16 -05:00 +++ edited/arch/ppc64/kernel/entry.S 2004-11-08 11:45:59 -06:00 @@ -122,7 +122,7 @@ SystemCall_common: andi.
r11,r10,_TIF_SYSCALL_T_OR_A bne- syscall_dotrace syscall_dotrace_cont: - cmpldi 0,r0,NR_syscalls + cmpldi r0,NR_syscalls bge- syscall_enosys system_call: /* label this so stack traces look sane */ ===== arch/ppc64/kernel/head.S 1.81 vs edited ===== --- 1.81/arch/ppc64/kernel/head.S 2004-10-19 02:18:43 -05:00 +++ edited/arch/ppc64/kernel/head.S 2004-11-08 11:49:04 -06:00 @@ -1303,7 +1303,7 @@ _GLOBAL(__start_initialization_multiplat /* * Are we booted from a PROM Of-type client-interface ? */ - cmpldi cr0,r5,0 + cmpldi r5,0 bne .__boot_from_prom /* yes -> prom */ /* S