From trini at kernel.crashing.org Wed Feb 1 02:08:24 2006
From: trini at kernel.crashing.org (Tom Rini)
Date: Tue, 31 Jan 2006 08:08:24 -0700
Subject: Maple fails to boot current git
In-Reply-To: <1138679592.4934.1.camel@localhost.localdomain>
References: <20060130171759.GE22672@smtp.west.cox.net>
    <20060130231118.GA19671@localhost.localdomain>
    <1138679592.4934.1.camel@localhost.localdomain>
Message-ID: <20060131150824.GO22672@smtp.west.cox.net>

On Tue, Jan 31, 2006 at 02:53:11PM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2006-01-31 at 12:11 +1300, David Gibson wrote:
> > On Mon, Jan 30, 2006 at 10:17:59AM -0700, Tom Rini wrote:
> > > Hello, trying to boot my maple board (ppc64_defconfig +
> > > CONFIG_PPC_EARLY_DEBUG_MAPLE=y) fails as follows (the "dirty" is
> > > #define DEBUG in kernel/prom_parse.c and platforms/maple/time.c):
> > >
> > Crud. Our Maple is stuffed at the moment (doesn't complete the CPU
> > init script, so PIBS never even comes up on the 970), so I can't
> > really investigate.
>
> Well, the RTC problem definitely looks like a bogus or lack of "ranges"
> property or the fact that the parser doesn't recognize "ht" as a PCI
> bus. You may want to try updating prom_parse.c to treat "ht" as a PCI
> bus and see if that helps.

With the following, I get parent bus is pci now, but still:
OF: ** translation for device /ht@0/isa@4/rtc@900 **
OF: bus is isa (na=2, ns=1) on /ht@0/isa@4
OF: translating address: 00000001 00000900
OF: parent bus is pci (na=3, ns=2) on /ht@0
OF: walking ranges...
OF: not found !
Maple: Unable to translate RTC address
Maple: No device node for RTC, assuming legacy address (0x70)

diff --git a/arch/powerpc/kernel/prom_parse.c b/arch/powerpc/kernel/prom_parse.c
index a8099c8..6006201 100644
--- a/arch/powerpc/kernel/prom_parse.c
+++ b/arch/powerpc/kernel/prom_parse.c
@@ -1,4 +1,4 @@
-#undef DEBUG
+#define DEBUG

 #include
 #include
@@ -113,8 +113,10 @@ static unsigned int of_bus_default_get_f

 static int of_bus_pci_match(struct device_node *np)
 {
-	/* "vci" is for the /chaos bridge on 1st-gen PCI powermacs */
-	return !strcmp(np->type, "pci") || !strcmp(np->type, "vci");
+	/* "vci" is for the /chaos bridge on 1st-gen PCI powermacs, "ht"
+	 * is the maple board. */
+	return !strcmp(np->type, "pci") || !strcmp(np->type, "vci") ||
+		!strcmp(np->type, "ht");
 }

 static void of_bus_pci_count_cells(struct device_node *np,
@@ -239,6 +241,16 @@ static struct of_bus of_busses[] = {
 		.translate = of_bus_pci_translate,
 		.get_flags = of_bus_pci_get_flags,
 	},
+	/* HT */
+	{
+		.name = "ht",
+		.addresses = "assigned-addresses",
+		.match = of_bus_pci_match,
+		.count_cells = of_bus_pci_count_cells,
+		.map = of_bus_pci_map,
+		.translate = of_bus_pci_translate,
+		.get_flags = of_bus_pci_get_flags,
+	},
 	/* ISA */
 	{
 		.name = "isa",

-- 
Tom Rini
http://gate.crashing.org/~trini/

From trini at kernel.crashing.org Wed Feb 1 02:11:17 2006
From: trini at kernel.crashing.org (Tom Rini)
Date: Tue, 31 Jan 2006 08:11:17 -0700
Subject: [PATCH 2.6.16-rc1] Fix booting Maple boards (was: Re: LINUXPPC64 Maple fails to boot current git)
In-Reply-To: <1138662630.3417.26.camel@brick.watson.ibm.com>
References: <20060130171759.GE22672@smtp.west.cox.net>
    <1138662630.3417.26.camel@brick.watson.ibm.com>
Message-ID: <20060131151117.GP22672@smtp.west.cox.net>

On Mon, Jan 30, 2006 at 06:10:30PM -0500, Michal Ostrowski wrote:

> I saw something similar on a JS-20 w SLOF.
> The last message you see is
> related to the RTC driver, but the next thing to run after that is
> console_init(), which was where my system was dying.
>
> Dropping the "#ifdef CONFIG_ISA" statements in
> arch/powerpc/kernel/legacy_serial.c appears to fix things, and I've been
> told that a patch to this effect has been posted (though I've yet to see
> it).

The following gets my Maple booting again, and I _think_ is testing what
was intended
---
When looking for legacy serial ports, condition poking of "ISA" areas on
CONFIG_GENERIC_ISA_DMA, rather than CONFIG_ISA as some boards (such as
the Maple) have no ISA slots, but do have ISA serial ports.

Signed-off-by: Tom Rini

 arch/powerpc/kernel/legacy_serial.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/legacy_serial.c b/arch/powerpc/kernel/legacy_serial.c
index f970ace..3dd7b39 100644
--- a/arch/powerpc/kernel/legacy_serial.c
+++ b/arch/powerpc/kernel/legacy_serial.c
@@ -134,7 +134,7 @@ static int __init add_legacy_soc_port(st
 	return add_legacy_port(np, -1, UPIO_MEM, addr, addr, NO_IRQ, flags);
 }

-#ifdef CONFIG_ISA
+#ifdef CONFIG_GENERIC_ISA_DMA
 static int __init add_legacy_isa_port(struct device_node *np,
 				      struct device_node *isa_brg)
 {
@@ -276,7 +276,7 @@ void __init find_legacy_serial_ports(voi
 		of_node_put(soc);
 	}

-#ifdef CONFIG_ISA
+#ifdef CONFIG_GENERIC_ISA_DMA
 	/* First fill our array with ISA ports */
 	for (np = NULL; (np = of_find_node_by_type(np, "serial"));) {
 		struct device_node *isa = of_get_parent(np);

-- 
Tom Rini
http://gate.crashing.org/~trini/

From linas at austin.ibm.com Wed Feb 1 07:22:14 2006
From: linas at austin.ibm.com (linas)
Date: Tue, 31 Jan 2006 14:22:14 -0600
Subject: creating PCI-related sysfs entries
Message-ID: <20060131202214.GZ19465@austin.ibm.com>

Hi,

I want to create some sysfs entries in order to report on the
status of PCI slots. (If you are guessing that this pertains
to the PCI error recovery code, you'd be right).
I'm having trouble figuring out the best way to do this.

There are existing entries at /sys/bus/pci/slots/... but these
are for hotplug slots only; none of the soldered-onto-the-MB
devices show up here. Is this intentional, or is this a bug/
oversight/not-yet-implemented thing?

I also want to report some roll-up system-wide statistics
both /sys/module and /sys/class seem reasonable. My code
does not compile as a module. Suggestions?

Yes, I'm going to RTFM shortly after I hit the send key,
assuming I find the FM.

--linas

From greg at kroah.com Wed Feb 1 07:34:56 2006
From: greg at kroah.com (Greg KH)
Date: Tue, 31 Jan 2006 12:34:56 -0800
Subject: creating PCI-related sysfs entries
In-Reply-To: <20060131202214.GZ19465@austin.ibm.com>
References: <20060131202214.GZ19465@austin.ibm.com>
Message-ID: <20060131203456.GA23819@kroah.com>

On Tue, Jan 31, 2006 at 02:22:14PM -0600, linas wrote:
>
> Hi,
>
> I want to create some sysfs entries in order to report on the
> status of PCI slots. (If you are guessing that this pertains
> to the PCI error recovery code, you'd be right). I'm having
> trouble figuring out the best way to do this.
>
> There are existing entries at /sys/bus/pci/slots/... but these
> are for hotplug slots only; none of the soldered-onto-the-MB
> devices show up here. Is this intentional, or is this a bug/
> oversight/not-yet-implemented thing?

Not implemented, as it's up to a pci hotplug controller driver to
provide those slots. It sounds like your driver needs to be expanded :)

> I also want to report some roll-up system-wide statistics
> both /sys/module and /sys/class seem reasonable. My code
> does not compile as a module. Suggestions?

What kind of statistics? Is this driver related? PCI bus related?
Device related?
thanks,

greg k-h

From linas at austin.ibm.com Wed Feb 1 08:08:05 2006
From: linas at austin.ibm.com (linas)
Date: Tue, 31 Jan 2006 15:08:05 -0600
Subject: creating PCI-related sysfs entries
In-Reply-To: <20060131203456.GA23819@kroah.com>
References: <20060131202214.GZ19465@austin.ibm.com>
    <20060131203456.GA23819@kroah.com>
Message-ID: <20060131210805.GA19465@austin.ibm.com>

On Tue, Jan 31, 2006 at 12:34:56PM -0800, Greg KH was heard to remark:
> On Tue, Jan 31, 2006 at 02:22:14PM -0600, linas wrote:
> >
> > I want to create some sysfs entries in order to report on the
> > status of PCI slots. (If you are guessing that this pertains
> > to the PCI error recovery code, you'd be right). I'm having
> > trouble figuring out the best way to do this.
> >
> > There are existing entries at /sys/bus/pci/slots/... but these
> > are for hotplug slots only; none of the soldered-onto-the-MB
> > devices show up here. Is this intentional, or is this a bug/
> > oversight/not-yet-implemented thing?
>
> Not implemented, as it's up to a pci hotplug controller driver to
> provide those slots. It sounds like your driver needs to be expanded :)

Hmm. But these slots are not hot-pluggable; should the arch
use the hotplug infrastructure even on those slots?

I note that /sys/devices/pciXXXX does have all of the pci
slots listed, so perhaps that is where I can place per-slot data.

> > I also want to report some roll-up system-wide statistics
> > both /sys/module and /sys/class seem reasonable. My code
> > does not compile as a module. Suggestions?
>
> What kind of statistics? Is this driver related? PCI bus related?
> Device related?

Related to the PCI error recovery. I'm not sure how to conceptually
peg this: one could say that it is the driver for a specific type
of pci-host bridge, although the code is not currently structured
as such. Should I try to restructure it as such?
If so, I'm not
clear on how to proceed; I can't say I've clearly seen a kernel
abstraction of a pci-host bridge device onto which to staple myself.

I wanted to report a few read-only statistics, and a few writeable
parameters:

Read-only:
-- total number of PCI device resets due to detected errors
-- total number of "false positives" (probable errors that weren't)
-- some other misc related stats.

Most, but not all, of these statistics could be obtained by
totalling up the per-slot statistics.

Writable:
-- Number of reset tries to perform before concluding that the
   device is hopelessly dead. Resets are disruptive and intensive,
   and I don't want to get stuck in an inf loop on a dead device.

Linas.

From greg at kroah.com Wed Feb 1 08:26:24 2006
From: greg at kroah.com (Greg KH)
Date: Tue, 31 Jan 2006 13:26:24 -0800
Subject: creating PCI-related sysfs entries
In-Reply-To: <20060131210805.GA19465@austin.ibm.com>
References: <20060131202214.GZ19465@austin.ibm.com>
    <20060131203456.GA23819@kroah.com>
    <20060131210805.GA19465@austin.ibm.com>
Message-ID: <20060131212624.GA10513@kroah.com>

On Tue, Jan 31, 2006 at 03:08:05PM -0600, linas wrote:
> On Tue, Jan 31, 2006 at 12:34:56PM -0800, Greg KH was heard to remark:
> > On Tue, Jan 31, 2006 at 02:22:14PM -0600, linas wrote:
> > >
> > > I want to create some sysfs entries in order to report on the
> > > status of PCI slots. (If you are guessing that this pertains
> > > to the PCI error recovery code, you'd be right). I'm having
> > > trouble figuring out the best way to do this.
> > >
> > > There are existing entries at /sys/bus/pci/slots/... but these
> > > are for hotplug slots only; none of the soldered-onto-the-MB
> > > devices show up here. Is this intentional, or is this a bug/
> > > oversight/not-yet-implemented thing?
> >
> > Not implemented, as it's up to a pci hotplug controller driver to
> > provide those slots. It sounds like your driver needs to be expanded :)
>
> Hmm.
> But these slots are not hot-pluggable; should the arch
> use the hotplug infrastructure even on those slots?

Why not? It's a good place to put them, right?

> I note that /sys/devices/pciXXXX does have all of the pci
> slots listed, so perhaps that is where I can place per-slot data.

That's only because your arch might happen to have 1 device per slot,
which is not true for other arches. And I bet it's also not true for
your non-virtual boxes...

> > > I also want to report some roll-up system-wide statistics
> > > both /sys/module and /sys/class seem reasonable. My code
> > > does not compile as a module. Suggestions?
> >
> > What kind of statistics? Is this driver related? PCI bus related?
> > Device related?
>
> Related to the PCI error recovery. I'm not sure how to conceptually
> peg this: one could say that it is the driver for a specific type
> of pci-host bridge, although the code is not currently structured
> as such. Should I try to restructure it as such? If so, I'm not
> clear on how to proceed; I can't say I've clearly seen a kernel
> abstraction of a pci-host bridge device onto which to staple myself.

People have suggested that they create such a driver for a long time.
Why not just do that?

> I wanted to report a few read-only statistics, and a few writeable
> parameters:
>
> Read-only:
> -- total number of PCI device resets due to detected errors
> -- total number of "false positives" (probable errors that weren't)
> -- some other misc related stats.

These are all "per slot" right?

> Most, but not all, of these statistics could be obtained by
> totalling up the per-slot statistics.
>
> Writable:
> -- Number of reset tries to perform before concluding that the
>    device is hopelessly dead. Resets are disruptive and intensive,
>    and I don't want to get stuck in an inf loop on a dead device.

Why would you want to change this value? Just pick one at build time.
thanks,

greg k-h

From benh at kernel.crashing.org Wed Feb 1 08:31:34 2006
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 01 Feb 2006 08:31:34 +1100
Subject: Maple fails to boot current git
In-Reply-To: <20060131150824.GO22672@smtp.west.cox.net>
References: <20060130171759.GE22672@smtp.west.cox.net>
    <20060130231118.GA19671@localhost.localdomain>
    <1138679592.4934.1.camel@localhost.localdomain>
    <20060131150824.GO22672@smtp.west.cox.net>
Message-ID: <1138743094.4934.11.camel@localhost.localdomain>

On Tue, 2006-01-31 at 08:08 -0700, Tom Rini wrote:
> On Tue, Jan 31, 2006 at 02:53:11PM +1100, Benjamin Herrenschmidt wrote:
> > On Tue, 2006-01-31 at 12:11 +1300, David Gibson wrote:
> > > On Mon, Jan 30, 2006 at 10:17:59AM -0700, Tom Rini wrote:
> > > > Hello, trying to boot my maple board (ppc64_defconfig +
> > > > CONFIG_PPC_EARLY_DEBUG_MAPLE=y) fails as follows (the "dirty" is
> > > > #define DEBUG in kernel/prom_parse.c and platforms/maple/time.c):
> > > >
> > > Crud. Our Maple is stuffed at the moment (doesn't complete the CPU
> > > init script, so PIBS never even comes up on the 970), so I can't
> > > really investigate.
> >
> > Well, the RTC problem definitely looks like a bogus or lack of "ranges"
> > property or the fact that the parser doesn't recognize "ht" as a PCI
> > bus. You may want to try updating prom_parse.c to treat "ht" as a PCI
> > bus and see if that helps.
>
> With the following, I get parent bus is pci now, but still:
> OF: ** translation for device /ht@0/isa@4/rtc@900 **
> OF: bus is isa (na=2, ns=1) on /ht@0/isa@4
> OF: translating address: 00000001 00000900
> OF: parent bus is pci (na=3, ns=2) on /ht@0
> OF: walking ranges...
> OF: not found !
> Maple: Unable to translate RTC address
> Maple: No device node for RTC, assuming legacy address (0x70)

Can you send me the device-tree dump ?

Ben.
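
For readers following the "walking ranges... not found" failure quoted
above, here is a minimal user-space sketch of what a "ranges" walk does
for an ISA child bus. It is not the kernel's prom_parse.c code. The names
struct isa_range, isa_translate and XLATE_FAIL are hypothetical, the
parent address is flattened to one 64-bit value instead of the na=3 PCI
cells in the log, and any parent base used with it is an invented
example value. The child address from the log, 00000001 00000900, is
read here as (space = 1 for I/O, offset = 0x900).

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded "ranges" entry (hypothetical, simplified layout). */
struct isa_range {
    uint32_t child_space;  /* 1 = I/O space, 0 = memory space */
    uint32_t child_base;   /* start of the window on the ISA side */
    uint64_t parent_base;  /* where that window appears on the parent bus */
    uint32_t size;         /* window length in bytes */
};

#define XLATE_FAIL UINT64_MAX

/*
 * Walk the ranges and translate (space, addr) to a parent address.
 * Returns XLATE_FAIL when no entry covers the address, which is the
 * "not found" case: an empty or missing "ranges" always ends up here.
 */
static uint64_t isa_translate(const struct isa_range *ranges, size_t n,
                              uint32_t space, uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        const struct isa_range *r = &ranges[i];

        if (r->child_space != space)
            continue;               /* wrong address space */
        if (addr < r->child_base)
            continue;               /* below this window */
        if (addr - r->child_base >= r->size)
            continue;               /* above this window */
        return r->parent_base + (addr - r->child_base);
    }
    return XLATE_FAIL;
}
```

With a single I/O window of 64KB starting at child offset 0, the RTC
offset 0x900 translates to parent_base + 0x900. With zero entries the
walk falls through and the caller would have to fall back to a legacy
address, as the Maple code does in the log.
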
From markh at osdl.org Wed Feb 1 08:33:00 2006
From: markh at osdl.org (Mark Haverkamp)
Date: Tue, 31 Jan 2006 13:33:00 -0800
Subject: iommu_alloc failure and panic
In-Reply-To: <43DF691E.1020008@emulex.com>
References: <200601310118.k0V1Il7Z018408@falcon30.maxeymade.com>
    <43DF691E.1020008@emulex.com>
Message-ID: <1138743180.15732.15.camel@markh3.pdx.osdl.net>

On Tue, 2006-01-31 at 08:41 -0500, James Smart wrote:
> >> 2) The emulex driver has been prone to problems in the past where it's
> >> been very aggressive at starting DMA operations, and I think it can
> >> be avoided with tuning. What I don't know is if it's because of this,
> >> or simply because of the large number of targets you have. Cc:ing James
> >> Smart.
>
> I don't have data points for the 2.6 kernel, but I can comment on what I
> have seen on the 2.4 kernel.
>
> The issue that I saw on the 2.4 kernel was that the pci dma alloc routine
> was inappropriately allocating from the dma s/g maps. On systems with less
> than 4Gig of memory, or on those with no iommu (emt64), the checks around
> adapter-supported dma masks were off (I'm going to be loose in terms to not
> describe it in detail). The result was, although the adapter could support
> a fully 64bit address and/or although the physical dma address would be under
> 32-bits, the logic forced allocation from the mapped dma pool. On some
> systems, this pool was originally only 16MB. Around 2.4.30, the swiotlb was
> introduced, which reduced issue, but unfortunately, still never solved the
> allocation logic. It fails less as the swiotlb simply had more space.
> As far as I know, this problem doesn't exist in the 2.6 kernel. I'd have to
> go look at the dma map functions to make sure.
>
> Why was the lpfc driver prone to the dma map exhaustion failures ?
> Due to the
> default # of commands per lun and max sg segments reported by the driver to
> the scsi midlayer, the scsi mid-layer's preallocation of dma maps for commands
> for each lun, and the fact that our FC configs were usually large, had lots
> of luns, and replicated the resources for each path to the same storage.
>
> Ultimately, what I think is the real issue here is the way the scsi mid-layer
> is preallocating dma maps for the luns. 16000 luns is a huge number.
> Multiply this by a max sg segment count of 64 by the driver, and a number
> between 3 and 30 commands per lun, and you can see the numbers. Scsi does do
> some interesting allocation algorithms once it hits an allocation failure.
> One side effect of this is that it is fairly efficient at allocating the
> bulk of the dma pool.

James,

Thanks for the information. I tried loading the lpfc driver with
lpfc_lun_queue_depth=1 and haven't seen iommu_alloc failures. I'm still
curious why the alloc failures lead to a panic though.

Mark.

>
> -- james s

-- 
Mark Haverkamp

From grundler at parisc-linux.org Wed Feb 1 09:48:52 2006
From: grundler at parisc-linux.org (Grant Grundler)
Date: Tue, 31 Jan 2006 15:48:52 -0700
Subject: creating PCI-related sysfs entries
In-Reply-To: <20060131210805.GA19465@austin.ibm.com>
References: <20060131202214.GZ19465@austin.ibm.com>
    <20060131203456.GA23819@kroah.com>
    <20060131210805.GA19465@austin.ibm.com>
Message-ID: <20060131224852.GA25579@colo.lackof.org>

On Tue, Jan 31, 2006 at 03:08:05PM -0600, linas wrote:
> Related to the PCI error recovery. I'm not sure how to conceptually
> peg this: one could say that it is the driver for a specific type
> of pci-host bridge, although the code is not currently structured
> as such. Should I try to restructure it as such? If so, I'm not
> clear on how to proceed; I can't say I've clearly seen a kernel
> abstraction of a pci-host bridge device onto which to staple myself.

AFAIK, no pci-host device abstraction exists.
Each arch deals with pci-host bridges as it sees fit.

But access methods to some PCI features are abstracted:
o method access to CFG space
o method to register IRQs
o advertise MMIO/IO Port routing.

Sounds like you want to add another method for error recovery
stats/control.

grant

From James.Smart at Emulex.Com Wed Feb 1 00:41:50 2006
From: James.Smart at Emulex.Com (James Smart)
Date: Tue, 31 Jan 2006 08:41:50 -0500
Subject: iommu_alloc failure and panic
In-Reply-To: <200601310118.k0V1Il7Z018408@falcon30.maxeymade.com>
References: <200601310118.k0V1Il7Z018408@falcon30.maxeymade.com>
Message-ID: <43DF691E.1020008@emulex.com>

>> 2) The emulex driver has been prone to problems in the past where it's
>> been very aggressive at starting DMA operations, and I think it can
>> be avoided with tuning. What I don't know is if it's because of this,
>> or simply because of the large number of targets you have. Cc:ing James
>> Smart.

I don't have data points for the 2.6 kernel, but I can comment on what I
have seen on the 2.4 kernel.

The issue that I saw on the 2.4 kernel was that the pci dma alloc routine
was inappropriately allocating from the dma s/g maps. On systems with less
than 4Gig of memory, or on those with no iommu (emt64), the checks around
adapter-supported dma masks were off (I'm going to be loose in terms to not
describe it in detail). The result was, although the adapter could support
a fully 64bit address and/or although the physical dma address would be under
32-bits, the logic forced allocation from the mapped dma pool. On some
systems, this pool was originally only 16MB. Around 2.4.30, the swiotlb was
introduced, which reduced issue, but unfortunately, still never solved the
allocation logic. It fails less as the swiotlb simply had more space.
As far as I know, this problem doesn't exist in the 2.6 kernel. I'd have to
go look at the dma map functions to make sure.

Why was the lpfc driver prone to the dma map exhaustion failures ?
Due to the
default # of commands per lun and max sg segments reported by the driver to
the scsi midlayer, the scsi mid-layer's preallocation of dma maps for commands
for each lun, and the fact that our FC configs were usually large, had lots
of luns, and replicated the resources for each path to the same storage.

Ultimately, what I think is the real issue here is the way the scsi mid-layer
is preallocating dma maps for the luns. 16000 luns is a huge number.
Multiply this by a max sg segment count of 64 by the driver, and a number
between 3 and 30 commands per lun, and you can see the numbers. Scsi does do
some interesting allocation algorithms once it hits an allocation failure.
One side effect of this is that it is fairly efficient at allocating the
bulk of the dma pool.

-- james s

From olh at suse.de Wed Feb 1 19:26:21 2006
From: olh at suse.de (Olaf Hering)
Date: Wed, 1 Feb 2006 09:26:21 +0100
Subject: [PATCH] ppc64: per cpu data optimisations
In-Reply-To: <20060111021644.GC4767@krispykreme>
References: <20060111021644.GC4767@krispykreme>
Message-ID: <20060201082621.GA29274@suse.de>

On Wed, Jan 11, Anton Blanchard wrote:

Anton,

this causes trouble if you have sles10 installed and if runlevel 6 is
your default runlevel (aka reboot in a loop). What's wrong with the patch?
See https://bugzilla.novell.com/show_bug.cgi?id=145459 for details.
There are 2 other bugs which are seen also on other archs, will start
looking at them now.
-- 
short story of a lazy sysadmin:
 alias appserv=wotan

From linas at austin.ibm.com Thu Feb 2 08:30:18 2006
From: linas at austin.ibm.com (linas)
Date: Wed, 1 Feb 2006 15:30:18 -0600
Subject: creating PCI-related sysfs entries
In-Reply-To: <20060131212624.GA10513@kroah.com>
References: <20060131202214.GZ19465@austin.ibm.com>
    <20060131203456.GA23819@kroah.com>
    <20060131210805.GA19465@austin.ibm.com>
    <20060131212624.GA10513@kroah.com>
Message-ID: <20060201213018.GG14705@austin.ibm.com>

On Tue, Jan 31, 2006 at 01:26:24PM -0800, Greg KH was heard to remark:
> On Tue, Jan 31, 2006 at 03:08:05PM -0600, linas wrote:
> >
> > ... the PCI error recovery. I'm not sure how to conceptually
> > peg this: one could say that it is the driver for a specific type
> > of pci-host bridge, although the code is not currently structured
> > as such. Should I try to restructure it as such? If so, I'm not
> > clear on how to proceed; I can't say I've clearly seen a kernel
> > abstraction of a pci-host bridge device onto which to staple myself.
>
> People have suggested that they create such a driver for a long time.
> Why not just do that?

OK. Let me get this straight, then. Create a generic struct
pci_host_bridge, which encapsulates some (all?) of the functions
that Grant Grundler mentions in his email:

Grant Grundler :
<> Each arch deals with pci-host bridges as it sees fit.
<>
<>But access methods to some PCI features are abstracted:
<>o method access to CFG space
<>o method to register IRQs
<>o advertise MMIO/IO Port routing.

At the risk of over-engineering, maybe there should be a struct
bus_host_bridge, and struct pci_host_bridge would derive from that?

--linas

p.s. rest of message:

> > I wanted to report a few read-only statistics, and a few writeable
> > parameters:
> >
> > Read-only:
> > -- total number of PCI device resets due to detected errors
> > -- total number of "false positives" (probable errors that weren't)
> > -- some other misc related stats.
>
> These are all "per slot" right?

Right. I'll keep them that way.

> > Writable:
> > -- Number of reset tries to perform before concluding that the
> >    device is hopelessly dead. Resets are disruptive and intensive,
> >    and I don't want to get stuck in an inf loop on a dead device.
>
> Why would you want to change this value? Just pick one at build time.

OK.

--linas

From linas at austin.ibm.com Thu Feb 2 08:35:46 2006
From: linas at austin.ibm.com (linas)
Date: Wed, 1 Feb 2006 15:35:46 -0600
Subject: creating PCI-related sysfs entries
In-Reply-To: <20060131224852.GA25579@colo.lackof.org>
References: <20060131202214.GZ19465@austin.ibm.com>
    <20060131203456.GA23819@kroah.com>
    <20060131210805.GA19465@austin.ibm.com>
    <20060131224852.GA25579@colo.lackof.org>
Message-ID: <20060201213546.GH14705@austin.ibm.com>

On Tue, Jan 31, 2006 at 03:48:52PM -0700, Grant Grundler was heard to remark:
> On Tue, Jan 31, 2006 at 03:08:05PM -0600, linas wrote:
> > Related to the PCI error recovery. I'm not sure how to conceptually
> > peg this: one could say that it is the driver for a specific type
> > of pci-host bridge, although the code is not currently structured
> > as such. Should I try to restructure it as such? If so, I'm not
> > clear on how to proceed; I can't say I've clearly seen a kernel
> > abstraction of a pci-host bridge device onto which to staple myself.
>
> AFAIK, no pci-host device abstraction exists.
> Each arch deals with pci-host bridges as it sees fit.
>
> But access methods to some PCI features are abstracted:
> o method access to CFG space
> o method to register IRQs
> o advertise MMIO/IO Port routing.
>
> Sounds like you want to add another method for error recovery
> stats/control.

Actually, the "recovery" part is already (mostly) in mainline,
See Documentation/pci-error-recovery.txt

What's hanging out are patches to specific device drivers,
which have been submitted, but haven't been accepted.
Another issue is that there's no implementation at this time for
any arch other than powerpc,
although the latest pci express bridges support this function in principle.

--linas

From linas at austin.ibm.com Thu Feb 2 11:19:06 2006
From: linas at austin.ibm.com (Linas Vepstas)
Date: Wed, 1 Feb 2006 18:19:06 -0600
Subject: [PATCH 1/2]: PowerPC/PCI Hotplug build break
In-Reply-To: <1138833335.6933.5.camel@sinatra.austin.ibm.com>
References: <1138833335.6933.5.camel@sinatra.austin.ibm.com>
Message-ID: <20060202001906.GA24916@austin.ibm.com>

Please apply ASAP:

Build break: Building PCI hotplug on PowerPC results in a build break,
due to failure to export symbols.

Reported today by Dave Jones :
drivers/pci/hotplug/rpaphp.ko needs unknown symbol pcibios_add_pci_devices

This patch fixes the break in the arch/powerpc tree.
Next patch fixes same problem in drivers/pci tree

Signed-off-by: Linas Vepstas

---
 pci_dlpar.c |    3 +++
 1 files changed, 3 insertions(+)

Index: linux-2.6.16-rc1-git5/arch/powerpc/platforms/pseries/pci_dlpar.c
===================================================================
--- linux-2.6.16-rc1-git5.orig/arch/powerpc/platforms/pseries/pci_dlpar.c	2006-02-01 18:06:12.380829512 -0600
+++ linux-2.6.16-rc1-git5/arch/powerpc/platforms/pseries/pci_dlpar.c	2006-02-01 18:11:41.040673750 -0600
@@ -58,6 +58,7 @@

 	return find_bus_among_children(pdn->phb->bus, dn);
 }
+EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);

 /**
  * pcibios_remove_pci_devices - remove all devices under this bus
@@ -106,6 +107,7 @@
 		}
 	}
 }
+EXPORT_SYMBOL_GPL(pcibios_fixup_new_pci_devices);

 static int
 pcibios_pci_config_bridge(struct pci_dev *dev)
@@ -172,3 +174,4 @@
 		pcibios_pci_config_bridge(dev);
 	}
 }
+EXPORT_SYMBOL_GPL(pcibios_add_pci_devices);

From linas at austin.ibm.com Thu Feb 2 11:21:09 2006
From: linas at austin.ibm.com (Linas Vepstas)
Date: Wed, 1 Feb 2006 18:21:09 -0600
Subject: [PATCH 2/2]: PowerPC/PCI Hotplug build break
Message-ID: <20060202002109.GB24916@austin.ibm.com>

Please apply ASAP:

Build break: Building PCI hotplug on PowerPC results in a build break,
due to failure to export symbols.

Reported today by Dave Jones :
drivers/pci/hotplug/rpaphp.ko needs unknown symbol pcibios_add_pci_devices

This patch fixes same problem in drivers/pci tree
Previous patch fixes the break in the arch/powerpc tree.

Signed-off-by: Linas Vepstas

---
 rpaphp_slot.c |    1 +
 1 files changed, 1 insertion(+)

Index: linux-2.6.16-rc1-git5/drivers/pci/hotplug/rpaphp_slot.c
===================================================================
--- linux-2.6.16-rc1-git5.orig/drivers/pci/hotplug/rpaphp_slot.c	2006-02-01 18:06:06.022722369 -0600
+++ linux-2.6.16-rc1-git5/drivers/pci/hotplug/rpaphp_slot.c	2006-02-01 18:11:46.049970222 -0600
@@ -159,6 +159,7 @@
 	dbg("%s - Exit: rc[%d]\n", __FUNCTION__, retval);
 	return retval;
 }
+EXPORT_SYMBOL_GPL(rpaphp_deregister_slot);

 int rpaphp_register_slot(struct slot *slot)
 {

From grundler at parisc-linux.org Thu Feb 2 16:52:43 2006
From: grundler at parisc-linux.org (Grant Grundler)
Date: Wed, 1 Feb 2006 22:52:43 -0700
Subject: creating PCI-related sysfs entries
In-Reply-To: <20060201213546.GH14705@austin.ibm.com>
References: <20060131202214.GZ19465@austin.ibm.com>
    <20060131203456.GA23819@kroah.com>
    <20060131210805.GA19465@austin.ibm.com>
    <20060131224852.GA25579@colo.lackof.org>
    <20060201213546.GH14705@austin.ibm.com>
Message-ID: <20060202055243.GA12588@colo.lackof.org>

On Wed, Feb 01, 2006 at 03:35:46PM -0600, linas wrote:
> > Sounds like you want to add another method for error recovery
> > stats/control.
>
> Actually, the "recovery" part is already (mostly) in mainline,
> See Documentation/pci-error-recovery.txt

Yes - I've reviewed a few of the times you submitted it.

What I meant was, you want to formalize error recovery methods
and make it a peer to the other resources access methods I listed.

...
> Another issue is that there's no implementation at this time for
> any arch other than powerpc,

Well, some ia64 chipsets have some limited support but it's really
up to the respective companies to drive that.

> although the latest pci express bridges support this function in principle.

"Nguyen, Tom L" has proposed patches
to support PCI-e AER (Advanced Error Reporting):
    http://lkml.org/lkml/2005/3/11/269

I've cc'd him in case he has an interest in resurrecting those
patches and adapting them to the current framework (and vice versa).

grant

From linas at austin.ibm.com Fri Feb 3 03:36:36 2006
From: linas at austin.ibm.com (Linas Vepstas)
Date: Thu, 2 Feb 2006 10:36:36 -0600
Subject: creating PCI-related sysfs entries
In-Reply-To: <20060202055243.GA12588@colo.lackof.org>
References: <20060131202214.GZ19465@austin.ibm.com>
    <20060131203456.GA23819@kroah.com>
    <20060131210805.GA19465@austin.ibm.com>
    <20060131224852.GA25579@colo.lackof.org>
    <20060201213546.GH14705@austin.ibm.com>
    <20060202055243.GA12588@colo.lackof.org>
Message-ID: <20060202163636.GD24916@austin.ibm.com>

On Wed, Feb 01, 2006 at 10:52:43PM -0700, Grant Grundler was heard to remark:
> On Wed, Feb 01, 2006 at 03:35:46PM -0600, linas wrote:
> > > Sounds like you want to add another method for error recovery
> > > stats/control.
> >
> > Actually, the "recovery" part is already (mostly) in mainline,
> > See Documentation/pci-error-recovery.txt
>
> What I meant was, you want to formalize error recovery methods
> and make it a peer to the other resources access methods I listed.

Hmm. Not sure what you mean by "a peer".

pci config-space i/o is done through callbacks in the pci bus->ops
structure. The PCI error recovery is done via callbacks in the pci dev
structure. Was there something you'd like to see done differently?

Given GregKH's remarks, it sounded like there was some interest in
a "struct bus_host_bridge" abstraction, and I'd be willing
to take a shot at that, provided there is general interest and
general agreement. I'm not quite sure what such a struct might
contain, just yet, I'm just imagining it might be non-empty.

> "Nguyen, Tom L" has proposed patches
> to support PCI-e AER (Advanced Error Reporting):

I kept looking at AER, and could not figure out what to do
with it.

--linas

From jfaslist at yahoo.fr Fri Feb 3 04:03:06 2006
From: jfaslist at yahoo.fr (jfaslist)
Date: Thu, 02 Feb 2006 18:03:06 +0100
Subject: Maple freezing on PCI Target-Abort
Message-ID: <43E23B4A.4020402@yahoo.fr>

Hi,

We have designed our own IBM970fx motherboard which is an (almost) clone
of the IBM Maple reference kit. We are seeing that whenever a PIO read
PCI cycle bound to the PCI bus that is across the AMD8111 is ended w/
a target-abort, the whole system freezes. The device signaling the TA
is a PCI-VME bridge. It does so as the address passed is invalid.

When the system hangs, using the service processor, I can access some
AMD8111, CPC925 registers from which I can draw the following
conclusions:
1- The AMD8111 secondary status tells me the AMD8111 got a TA
2- The CPC925 status/command register (0cf8070010) tells me that the TA
   error was forwarded to the CPC925.
3- The CPC925 APIEXCP register tells me that a DERR exception was
   signaled.

From what I can read on the CPC925 and IBM970 cpu user manual, the DERR
is the bus error that is returned to the CPU by the CPC925 to let him
know that the cycle ended w/ an error.

I have the following questions:
-What exception vector is taking care of a DERR excp? From what I can
 see it seems to be the "machine check" vector. But that seems a bit
 drastic to me. After all this is just a PCI target abort.
-I expect that the normal behavior would be for the kernel to send a termination signal to the user process which caused the PIO READ PCI cycle (from a previously mmap()'ed VMA address). Is it doable on this platform? Since a READ operation is coupled by nature, I think this is the only acceptable way. I have tried to set the MSR[RI] bit before doing the PCI cycle, but it didn't change anything. Also on our design we disconnect the CPC925 checkstop pin from the 970 machine check pin (see page 39 of cpc925 user's manual). So a DERR shouldn't cause a machine check I would think. I realize that these questions are very H/W related but couldn't find the answer in IBM doc. Thanks for the help, -- Best regards, _______________________________________ jean-francois simon - themis computer 5, rue irene joliot curie 38330 eybens - france +33 (0)4 76 14 77 85 From grundler at parisc-linux.org Fri Feb 3 06:39:02 2006 From: grundler at parisc-linux.org (Grant Grundler) Date: Thu, 2 Feb 2006 12:39:02 -0700 Subject: creating PCI-related sysfs entries In-Reply-To: <20060202163636.GD24916@austin.ibm.com> References: <20060131202214.GZ19465@austin.ibm.com> <20060131203456.GA23819@kroah.com> <20060131210805.GA19465@austin.ibm.com> <20060131224852.GA25579@colo.lackof.org> <20060201213546.GH14705@austin.ibm.com> <20060202055243.GA12588@colo.lackof.org> <20060202163636.GD24916@austin.ibm.com> Message-ID: <20060202193902.GA5424@colo.lackof.org> On Thu, Feb 02, 2006 at 10:36:36AM -0600, Linas Vepstas wrote: > Hmm. Not sure what you mean by "a peer". Just at the same level of the architecture - i.e. a server like the others. > pci config-space i/o is done through callbacks in the pci bus->ops > structure.
The PCI error recovery is done via callbacks in the pci dev > structure. Was there something you'd like to see done differently? No. Each set of callbacks serves a different purpose. The services/resources at the bus level are different from those at the device level. My guess is error handling/containment can abstract at the "bus" level since we can't always guarantee "one device per slot" (think multifunction devices). > Given GregKH's remarks, it sounded like there was some interest in > a "struct bus_host_bridge" abstraction, and I'd be willing > to take a shot at that, provided there is general interest and > general agreement. I'm not quite sure what such a struct might > contain, just yet, I'm just imagining it might be non-empty. Yes, I agree; I don't have a better idea other than what I already pointed out. > I kept looking at AER, and could not figure out what to do > with it. I haven't either - other folks in HP "own" that. grant From linas at austin.ibm.com Fri Feb 3 07:46:24 2006 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 2 Feb 2006 14:46:24 -0600 Subject: creating PCI-related sysfs entries In-Reply-To: <20060202193902.GA5424@colo.lackof.org> References: <20060131202214.GZ19465@austin.ibm.com> <20060131203456.GA23819@kroah.com> <20060131210805.GA19465@austin.ibm.com> <20060131224852.GA25579@colo.lackof.org> <20060201213546.GH14705@austin.ibm.com> <20060202055243.GA12588@colo.lackof.org> <20060202163636.GD24916@austin.ibm.com> <20060202193902.GA5424@colo.lackof.org> Message-ID: <20060202204624.GM24916@austin.ibm.com> On Thu, Feb 02, 2006 at 12:39:02PM -0700, Grant Grundler was heard to remark: > > My guess is error handling/containment can abstract at the "bus" level > since we can't always guarantee "one device per slot" (think > multifunction devices). :-) Yes, testing with multi-function cards exposed bugs, but the code should work fine with them.
In particular, the design allows multi-function devices to "vote" how they want to be reset, with the dumbest voter getting their way. The bus disconnect is reported to all functions on all affected cards/slots. This allows all instances of a device driver to react appropriately. However, for card setup/init, typically, you want to have only one driver instance do that. You'll notice in the sym53cxx2 patch I just sent, there's a + if (PCI_FUNC(pdev->devfn) == 0) + sym_reset_scsi_bus(np, 0); so that the other instances don't fall over each other resetting. There's similar code in the e100, e1000 and ixgb drivers; I tested multi-function versions of these. (not sure about ixgb). > Yes, I agree; I don't have a better idea other than what I already > pointed out. Hmm. Well, I may have lost the thread of what that was. --linas From geoffrey.levand at am.sony.com Fri Feb 3 09:47:12 2006 From: geoffrey.levand at am.sony.com (Geoff Levand) Date: Thu, 02 Feb 2006 14:47:12 -0800 Subject: [PATCH] spufs split off platform code In-Reply-To: <200601280457.08170.arnd@arndb.de> References: <200601280457.08170.arnd@arndb.de> Message-ID: <43E28BF0.8060700@am.sony.com> Arnd Bergmann wrote: > I guess that the "spc" device type can be removed now, I don't think > that > any systems are left that have not been converted. > > Do you have "spe" type nodes at all? Is there anything that you need to > do different about them? Yes, spc can be removed. I think we can arrange it so some of the create_spu code can go back to generic code. Still investigating... >>+void spu_free_irqs(struct spu *spu) >>+{ >>+ int irq_base; >>+ >>+ if(!spu->priv_data) { >>+ pr_debug("null priv_data in %p\n", spu); >>+ return; >>+ } > > > It may be just me, but I don't like this bit of coding style: > You are trying to deal with priv_data being either allocated > or not allocated at this point.
Better make sure that you have > freed the structure before returning an error from any function > that would allocate it on success. Then get rid of the check > here. Yes, it really doesn't add any value does it. >>+struct spu_priv_data; >>+struct spu_phys { >>+ unsigned long addr; >>+ unsigned long size; >>+}; >> >> struct spu { >>+ struct spu_priv_data *priv_data; /* opaque */ >> char *name; > > > If you want priv_data to point to different types of data structures > depending on the context, I find it easier to understand if there is > a simple void pointer and the actual struct definitions have different > type names. Yes, a good tip. I'm looking into pushing these differences down into the lower level platform code. Hopefully it will simplify these parts. -Geoff From linas at austin.ibm.com Fri Feb 3 11:06:02 2006 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 2 Feb 2006 18:06:02 -0600 Subject: [PATCH]: Documentation: Updated PCI Error Recovery Message-ID: <20060203000602.GQ24916@austin.ibm.com> I'm not sure who I'm addressing this patch to: Linus, maybe? Please apply. Fingers crossed, I hope this may make it into 2.6.16. --linas This patch is a cleanup/restructuring/clarification of the PCI error handling doc. It should look rather professional at this point.
Signed-off-by: Linas Vepstas -- pci-error-recovery.txt | 462 ++++++++++++++++++++++++++++++++----------------- 1 files changed, 306 insertions(+), 156 deletions(-) Index: linux-2.6.16-rc1-git5/Documentation/pci-error-recovery.txt =================================================================== --- linux-2.6.16-rc1-git5.orig/Documentation/pci-error-recovery.txt 2006-02-01 17:09:01.000000000 -0600 +++ linux-2.6.16-rc1-git5/Documentation/pci-error-recovery.txt 2006-02-02 18:04:57.714942210 -0600 @@ -1,246 +1,396 @@ PCI Error Recovery ------------------ - May 31, 2005 + February 2, 2006 - Current document maintainer: - Linas Vepstas + Current document maintainer: + Linas Vepstas -Some PCI bus controllers are able to detect certain "hard" PCI errors -on the bus, such as parity errors on the data and address busses, as -well as SERR and PERR errors. These chipsets are then able to disable -I/O to/from the affected device, so that, for example, a bad DMA -address doesn't end up corrupting system memory. These same chipsets -are also able to reset the affected PCI device, and return it to -working condition. This document describes a generic API form -performing error recovery. - -The core idea is that after a PCI error has been detected, there must -be a way for the kernel to coordinate with all affected device drivers -so that the pci card can be made operational again, possibly after -performing a full electrical #RST of the PCI card. The API below -provides a generic API for device drivers to be notified of PCI -errors, and to be notified of, and respond to, a reset sequence. - -Preliminary sketch of API, cut-n-pasted-n-modified email from -Ben Herrenschmidt, circa 5 april 2005 +Many PCI bus controllers are able to detect a variety of hardware +PCI errors on the bus, such as parity errors on the data and address +busses, as well as SERR and PERR errors. 
Some of the more advanced +chipsets are able to deal with these errors; these include PCI-E chipsets, +and the PCI-host bridges found on IBM Power4 and Power5-based pSeries +boxes. A typical action taken is to disconnect the affected device, +halting all I/O to it. The goal of a disconnection is to avoid system +corruption; for example, to halt system memory corruption due to DMA's +to "wild" addresses. Typically, a reconnection mechanism is also +offered, so that the affected PCI device(s) are reset and put back +into working condition. The reset phase requires coordination +between the affected device drivers and the PCI controller chip. +This document describes a generic API for notifying device drivers +of a bus disconnection, and then performing error recovery. +This API is currently implemented in the 2.6.16 and later kernels. + +Reporting and recovery is performed in several steps. First, when +a PCI hardware error has resulted in a bus disconnect, that event +is reported as soon as possible to all affected device drivers, +including multiple instances of a device driver on multi-function +cards. This allows device drivers to avoid deadlocking in spinloops, +waiting for some i/o-space register to change, when it never will. +It also gives the drivers a chance to defer incoming I/O as +needed. + +Next, recovery is performed in several stages. Most of the complexity +is forced by the need to handle multi-function devices, that is, +devices that have multiple device drivers associated with them. +In the first stage, each driver is allowed to indicate what type +of reset it desires, the choices being a simple re-enabling of I/O +or requesting a hard reset (a full electrical #RST of the PCI card). +If any driver requests a full reset, that is what will be done. + +After a full reset and/or a re-enabling of I/O, all drivers are +again notified, so that they may then perform any device setup/config +that may be required. 
After these have all completed, a final +"resume normal operations" event is sent out. + +The biggest reason for choosing a kernel-based implementation rather +than a user-space implementation was the need to deal with bus +disconnects of PCI devices attached to storage media, and, in particular, +disconnects from devices holding the root file system. If the root +file system is disconnected, a user-space mechanism would have to go +through a large number of contortions to complete recovery. Almost all +of the current Linux file systems are not tolerant of disconnection +from/reconnection to their underlying block device. By contrast, +bus errors are easy to manage in the device driver. Indeed, most +device drivers already handle very similar recovery procedures; +for example, the SCSI-generic layer already provides significant +mechanisms for dealing with SCSI bus errors and SCSI bus resets. + + +Detailed Design +--------------- +Design and implementation details below, based on a chain of +public email discussions with Ben Herrenschmidt, circa 5 April 2005. The error recovery API support is exposed to the driver in the form of a structure of function pointers pointed to by a new field in struct -pci_driver. The absence of this pointer in pci_driver denotes an -"non-aware" driver, behaviour on these is platform dependant. -Platforms like ppc64 can try to simulate pci hotplug remove/add. - -The definition of "pci_error_token" is not covered here. It is based on -Seto's work on the synchronous error detection. We still need to define -functions for extracting infos out of an opaque error token. This is -separate from this API. +pci_driver. A driver that fails to provide the structure is "non-aware", +and the actual recovery steps taken are platform dependent. The +arch/powerpc implementation will simulate a PCI hotplug remove/add. 
This structure has the form: - struct pci_error_handlers { - int (*error_detected)(struct pci_dev *dev, pci_error_token error); + int (*error_detected)(struct pci_dev *dev, enum pci_channel_state); int (*mmio_enabled)(struct pci_dev *dev); - int (*resume)(struct pci_dev *dev); int (*link_reset)(struct pci_dev *dev); int (*slot_reset)(struct pci_dev *dev); + void (*resume)(struct pci_dev *dev); +}; + +The possible channel states are: +enum pci_channel_state { + pci_channel_io_normal, /* I/O channel is in normal state */ + pci_channel_io_frozen, /* I/O to channel is blocked */ + pci_channel_io_perm_failure, /* PCI card is dead */ +}; + +Possible return values are: +enum pci_ers_result { + PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */ + PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */ + PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */ + PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ + PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ }; -A driver doesn't have to implement all of these callbacks. The -only mandatory one is error_detected(). If a callback is not -implemented, the corresponding feature is considered unsupported. -For example, if mmio_enabled() and resume() aren't there, then the -driver is assumed as not doing any direct recovery and requires +A driver does not have to implement all of these callbacks; however, +if it implements any, it must implement error_detected(). If a callback +is not implemented, the corresponding feature is considered unsupported. +For example, if mmio_enabled() and resume() aren't there, then it +is assumed that the driver is not doing any direct recovery and requires a reset. 
If link_reset() is not implemented, the card is assumed as -not caring about link resets, in which case, if recover is supported, -the core can try recover (but not slot_reset() unless it really did -reset the slot). If slot_reset() is not supported, link_reset() can -be called instead on a slot reset. - -At first, the call will always be : - - 1) error_detected() - - Error detected. This is sent once after an error has been detected. At -this point, the device might not be accessible anymore depending on the -platform (the slot will be isolated on ppc64). The driver may already -have "noticed" the error because of a failing IO, but this is the proper -"synchronisation point", that is, it gives a chance to the driver to -cleanup, waiting for pending stuff (timers, whatever, etc...) to -complete; it can take semaphores, schedule, etc... everything but touch -the device. Within this function and after it returns, the driver +not care about link resets. Typically a driver will want to know about +a slot_reset(). + +The actual steps taken by a platform to recover from a PCI error +event will be platform-dependent, but will follow the general +sequence described below. + +STEP 0: Error Event +------------------- +PCI bus error is detected by the PCI hardware. On powerpc, the slot +is isolated, in that all I/O is blocked: all reads return 0xffffffff, +all writes are ignored. + + +STEP 1: Notification +-------------------- +Platform calls the error_detected() callback on every instance of +every driver affected by the error. + +At this point, the device might not be accessible anymore, depending on +the platform (the slot will be isolated on powerpc). The driver may +already have "noticed" the error because of a failing I/O, but this +is the proper "synchronization point", that is, it gives the driver +a chance to clean up, waiting for pending stuff (timers, whatever, etc...) +to complete; it can take semaphores, schedule, etc... everything but +touch the device.
Within this function and after it returns, the driver shouldn't do any new IOs. Called in task context. This is sort of a "quiesce" point. See note about interrupts at the end of this doc. - Result codes: - - PCIERR_RESULT_CAN_RECOVER: - Driever returns this if it thinks it might be able to recover +All drivers participating in this system must implement this call. +The driver must return one of the following result codes: + - PCI_ERS_RESULT_CAN_RECOVER: + Driver returns this if it thinks it might be able to recover the HW by just banging IOs or if it wants to be given - a chance to extract some diagnostic informations (see - below). - - PCIERR_RESULT_NEED_RESET: - Driver returns this if it thinks it can't recover unless the - slot is reset. - - PCIERR_RESULT_DISCONNECT: - Return this if driver thinks it won't recover at all, - (this will detach the driver ? or just leave it - dangling ? to be decided) - -So at this point, we have called error_detected() for all drivers -on the segment that had the error. On ppc64, the slot is isolated. What -happens now typically depends on the result from the drivers. If all -drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we would -re-enable IOs on the slot (or do nothing special if the platform doesn't -isolate slots) and call 2). If not and we can reset slots, we go to 4), -if neither, we have a dead slot. If it's an hotplug slot, we might -"simulate" reset by triggering HW unplug/replug though. + a chance to extract some diagnostic information (see + mmio_enable, below). + - PCI_ERS_RESULT_NEED_RESET: + Driver returns this if it can't recover without a hard + slot reset. + - PCI_ERS_RESULT_DISCONNECT: + Driver returns this if it doesn't want to recover at all. + +The next step taken will depend on the result codes returned by the +drivers. 
+ +If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER, +then the platform should re-enable IOs on the slot (or do nothing in +particular, if the platform doesn't isolate slots), and recovery +proceeds to STEP 2 (MMIO Enabled). + +If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET), +then recovery proceeds to STEP 4 (Slot Reset). + +If the platform is unable to recover the slot, the next step +is STEP 6 (Permanent Failure). ->>> Current ppc64 implementation assumes that a device driver will ->>> *not* schedule or semaphore in this routine; the current ppc64 +>>> The current powerpc implementation assumes that a device driver will +>>> *not* schedule or semaphore in this routine; the current powerpc >>> implementation uses one kernel thread to notify all devices; ->>> thus, of one device sleeps/schedules, all devices are affected. +>>> thus, if one device sleeps/schedules, all devices are affected. >>> Doing better requires complex multi-threaded logic in the error >>> recovery implementation (e.g. waiting for all notification threads >>> to "join" before proceeding with recovery.) This seems excessively >>> complex and not worth implementing. ->>> The current ppc64 implementation doesn't much care if the device ->>> attempts i/o at this point, or not. I/O's will fail, returning +>>> The current powerpc implementation doesn't much care if the device +>>> attempts I/O at this point, or not. I/O's will fail, returning >>> a value of 0xff on read, and writes will be dropped. If the device >>> driver attempts more than 10K I/O's to a frozen adapter, it will >>> assume that the device driver has gone into an infinite loop, and ->>> it will panic the the kernel. +>>> it will panic the kernel. There doesn't seem to be any other +>>> way of stopping a device driver that insists on spinning on I/O.
- 2) mmio_enabled() +STEP 2: MMIO Enabled +------------------- +The platform re-enables MMIO to the device (but typically not the +DMA), and then calls the mmio_enabled() callback on all affected +device drivers. - This is the "early recovery" call. IOs are allowed again, but DMA is +This is the "early recovery" call. IOs are allowed again, but DMA is not (hrm... to be discussed, I prefer not), with some restrictions. This is NOT a callback for the driver to start operations again, only to peek/poke at the device, extract diagnostic information, if any, and eventually do things like trigger a device local reset or some such, -but not restart operations. This is sent if all drivers on a segment -agree that they can try to recover and no automatic link reset was -performed by the HW. If the platform can't just re-enable IOs without -a slot reset or a link reset, it doesn't call this callback and goes -directly to 3) or 4). All IOs should be done _synchronously_ from -within this callback, errors triggered by them will be returned via -the normal pci_check_whatever() api, no new error_detected() callback -will be issued due to an error happening here. However, such an error -might cause IOs to be re-blocked for the whole segment, and thus -invalidate the recovery that other devices on the same segment might -have done, forcing the whole segment into one of the next states, -that is link reset or slot reset. +but not restart operations. This callback is made if all drivers on +a segment agree that they can try to recover and if no automatic link reset +was performed by the HW.
If the platform can't just re-enable IOs without +a slot reset or a link reset, it won't call this callback, and instead +will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset). + +>>> The following is proposed; no platform implements this yet: +>>> Proposal: All I/O's should be done _synchronously_ from within +>>> this callback, errors triggered by them will be returned via +>>> the normal pci_check_whatever() API, no new error_detected() +>>> callback will be issued due to an error happening here. However, +>>> such an error might cause IOs to be re-blocked for the whole +>>> segment, and thus invalidate the recovery that other devices +>>> on the same segment might have done, forcing the whole segment +>>> into one of the next states, that is, link reset or slot reset. - Result codes: - - PCIERR_RESULT_RECOVERED +The driver should return one of the following result codes: + - PCI_ERS_RESULT_RECOVERED Driver returns this if it thinks the device is fully - functionnal and thinks it is ready to start + functional and thinks it is ready to start normal driver operations again. There is no guarantee that the driver will actually be allowed to proceed, as another driver on the same segment might have failed and thus triggered a slot reset on platforms that support it. - - PCIERR_RESULT_NEED_RESET + - PCI_ERS_RESULT_NEED_RESET Driver returns this if it thinks the device is not recoverable in it's current state and it needs a slot reset to proceed. - - PCIERR_RESULT_DISCONNECT + - PCI_ERS_RESULT_DISCONNECT Same as above. Total failure, no recovery even after reset driver dead. (To be defined more precisely) ->>> The current ppc64 implementation does not implement this callback. - - 3) link_reset() - - This is called after the link has been reset. This is typically -a PCI Express specific state at this point and is done whenever a -non-fatal error has been detected that can be "solved" by resetting -the link.
This call informs the driver of the reset and the driver -should check if the device appears to be in working condition. -This function acts a bit like 2) mmio_enabled(), in that the driver -is not supposed to restart normal driver I/O operations right away. -Instead, it should just "probe" the device to check it's recoverability -status. If all is right, then the core will call resume() once all -drivers have ack'd link_reset(). +The next step taken depends on the results returned by the drivers. +If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform +proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations). + +If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform +proceeds to STEP 4 (Slot Reset). + +>>> The current powerpc implementation does not implement this callback. + + +STEP 3: Link Reset +------------------ +The platform resets the link, and then calls the link_reset() callback +on all affected device drivers. This is a PCI-Express specific state +and is done whenever a non-fatal error has been detected that can be +"solved" by resetting the link. This call informs the driver of the +reset and the driver should check to see if the device appears to be +in working condition. + +The driver is not supposed to restart normal driver I/O operations +at this point. It should limit itself to "probing" the device to +check its recoverability status. If all is right, then the platform +will call resume() once all drivers have ack'd link_reset(). Result codes: - (identical to mmio_enabled) + (identical to STEP 2 (MMIO Enabled)) ->>> The current ppc64 implementation does not implement this callback. +The platform then proceeds to either STEP 4 (Slot Reset) or STEP 5 +(Resume Operations). - 4) slot_reset() +>>> The current powerpc implementation does not implement this callback. - This is called after the slot has been soft or hard reset by the -platform.
A soft reset consists of asserting the adapter #RST line -and then restoring the PCI BARs and PCI configuration header. If the -platform supports PCI hotplug, then it might instead perform a hard -reset by toggling power on the slot off/on. This call gives drivers -the chance to re-initialize the hardware (re-download firmware, etc.), -but drivers shouldn't restart normal I/O processing operations at -this point. (See note about interrupts; interrupts aren't guaranteed -to be delivered until the resume() callback has been called). If all -device drivers report success on this callback, the patform will call -resume() to complete the error handling and let the driver restart -normal I/O processing. + +STEP 4: Slot Reset +------------------ +The platform performs a soft or hard reset of the device, and then +calls the slot_reset() callback. + +A soft reset consists of asserting the adapter #RST line and then +restoring the PCI BARs and PCI configuration header to a state +that is equivalent to what it would be after a fresh system +power-on followed by power-on BIOS/system firmware initialization. +If the platform supports PCI hotplug, then the reset might be +performed by toggling the slot electrical power off/on. + +It is important for the platform to restore the PCI config space +to the "fresh poweron" state, rather than the "last state". After +a slot reset, the device driver will almost always use its standard +device initialization routines, and an unusual config space setup +may result in hung devices, kernel panics, or silent data corruption. + +This call gives drivers the chance to re-initialize the hardware +(re-download firmware, etc.). At this point, the driver may assume +that the card is in a fresh state and is fully functional. In +particular, interrupt generation should work normally. + +Drivers should not yet restart normal I/O processing operations +at this point.
If all device drivers report success on this +callback, the platform will call resume() to complete the sequence, +and let the driver restart normal I/O processing. A driver can still return a critical failure for this function if it can't get the device operational after reset. If the platform -previously tried a soft reset, it migh now try a hard reset (power +previously tried a soft reset, it might now try a hard reset (power cycle) and then call slot_reset() again. It the device still can't be recovered, there is nothing more that can be done; the platform will typically report a "permanent failure" in such a case. The device will be considered "dead" in this case. +Drivers for multi-function cards will need to coordinate among +themselves as to which driver instance will perform any "one-shot" +or global device initialization. For example, the Symbios sym53cxx2 +driver performs device init only from PCI function 0: + ++ if (PCI_FUNC(pdev->devfn) == 0) ++ sym_reset_scsi_bus(np, 0); + Result codes: - - PCIERR_RESULT_DISCONNECT + - PCI_ERS_RESULT_DISCONNECT Same as above. ->>> The current ppc64 implementation does not try a power-cycle reset ->>> if the driver returned PCIERR_RESULT_DISCONNECT. However, it should. - - 5) resume() +Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent +Failure). - This is called if all drivers on the segment have returned -PCIERR_RESULT_RECOVERED from one of the 3 prevous callbacks. -That basically tells the driver to restart activity, tht everything -is back and running. No result code is taken into account here. If -a new error happens, it will restart a new error handling process. - -That's it. I think this covers all the possibilities. The way those -callbacks are called is platform policy. 
A platform with no slot reset -capability for example may want to just "ignore" drivers that can't +>>> The current powerpc implementation does not currently try a +>>> power-cycle reset if the driver returned PCI_ERS_RESULT_DISCONNECT. +>>> However, it probably should. + + +STEP 5: Resume Operations +------------------------- +The platform will call the resume() callback on all affected device +drivers if all drivers on the segment have returned +PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks. +The goal of this callback is to tell the driver to restart activity, +that everything is back and running. This callback does not return +a result code. + +At this point, if a new error happens, the platform will restart +a new error recovery sequence. + +STEP 6: Permanent Failure +------------------------- +A "permanent failure" has occurred, and the platform cannot recover +the device. The platform will call error_detected() with a +pci_channel_state value of pci_channel_io_perm_failure. + +The device driver should, at this point, assume the worst. It should +cancel all pending I/O, refuse all new I/O, returning -EIO to +higher layers. The device driver should then clean up all of its +memory and remove itself from kernel operations, much as it would +during system shutdown. + +The platform will typically notify the system operator of the +permanent failure in some way. If the device is hotplug-capable, +the operator will probably want to remove and replace the device. +Note, however, not all failures are truly "permanent". Some are +caused by over-heating, some by a poorly seated card. Many +PCI error events are caused by software bugs, e.g. DMA's to +wild addresses or bogus split transactions due to programming +errors. See the discussion in powerpc/eeh-pci-error-recovery.txt +for additional detail on real-life experience of the causes of +software errors. 
+ + +Conclusion; General Remarks +--------------------------- +The way those callbacks are called is platform policy. A platform with +no slot reset capability may want to just "ignore" drivers that can't recover (disconnect them) and try to let other cards on the same segment recover. Keep in mind that in most real life cases, though, there will be only one driver per segment. -Now, there is a note about interrupts. If you get an interrupt and your +Now, a note about interrupts. If you get an interrupt and your device is dead or has been isolated, there is a problem :) - -After much thinking, I decided to leave that to the platform. That is, -the recovery API only precies that: +The current policy is to turn this into a platform policy. +That is, the recovery API only requires that: - There is no guarantee that interrupt delivery can proceed from any device on the segment starting from the error detection and until the -restart callback is sent, at which point interrupts are expected to be +resume callback is sent, at which point interrupts are expected to be fully operational. - - There is no guarantee that interrupt delivery is stopped, that is, ad -river that gets an interrupts after detecting an error, or that detects -and error within the interrupt handler such that it prevents proper + - There is no guarantee that interrupt delivery is stopped, that is, +a driver that gets an interrupt after detecting an error, or that detects +an error within the interrupt handler such that it prevents proper ack'ing of the interrupt (and thus removal of the source) should just -return IRQ_NOTHANDLED. It's up to the platform to deal with taht -condition, typically by masking the irq source during the duration of +return IRQ_NOTHANDLED. It's up to the platform to deal with that +condition, typically by masking the IRQ source during the duration of the error handling. 
It is expected that the platform "knows" which interrupts are routed to error-management capable slots and can deal -with temporarily disabling that irq number during error processing (this +with temporarily disabling that IRQ number during error processing (this isn't terribly complex). That means some IRQ latency for other devices sharing the interrupt, but there is simply no other way. High end platforms aren't supposed to share interrupts between many devices anyway :) +>>> Implementation details for the powerpc platform are discussed in +>>> the file Documentation/powerpc/eeh-pci-error-recovery.txt + +>>> As of this writing, there are six device drivers with patches +>>> implementing error recovery. Not all of these patches are in +>>> mainline yet. These may be used as "examples": +>>> +>>> drivers/scsi/ipr.c +>>> drivers/scsi/sym53cxx_2 +>>> drivers/next/e100.c +>>> drivers/net/e1000 +>>> drivers/net/ixgb +>>> drivers/net/s2io.c -Revised: 31 May 2005 Linas Vepstas +The End +------- From benh at kernel.crashing.org Fri Feb 3 12:42:37 2006 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 03 Feb 2006 12:42:37 +1100 Subject: Maple freezing on PCI Target-Abort In-Reply-To: <43E23B4A.4020402@yahoo.fr> References: <43E23B4A.4020402@yahoo.fr> Message-ID: <1138930958.4934.102.camel@localhost.localdomain> > -What exception vector is taking care of a DERR excp? From what I can > see it seems to be the "machine check" vector. But that seems a bit > drastic to me. After all this is just a PCI target abort. I would expect a machine check yes. > -I expect that the normal behavior would be for the kernel to send a > signal termination to the user process which caused the PIO READ PCI > cycle (from a previously mmap()'ed VMA address). Is it doable on this > platform? Since a READ operation is coupled by nature, I think this is > the only acceptable way. It should SIGBUS except if the problem occurred in the kernel. 
I don't know why it's not doing so; maybe you are hitting an issue/erratum or a misconfiguration of the 925? > I have tried to set the MSR[RI] bit before doing the PCI cycle, but it > didn't change anything. Also on our design we disconnect the > CPC925 checkstop pin from the 970 machine check pin (see page 39 of the > CPC925 user's manual). So a DERR shouldn't cause a machine check I would > think. > > I realize that these questions are very H/W related but couldn't find > the answer in IBM doc. From benh at kernel.crashing.org Fri Feb 3 12:45:03 2006 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 03 Feb 2006 12:45:03 +1100 Subject: creating PCI-related sysfs entries In-Reply-To: <20060131210805.GA19465@austin.ibm.com> References: <20060131202214.GZ19465@austin.ibm.com> <20060131203456.GA23819@kroah.com> <20060131210805.GA19465@austin.ibm.com> Message-ID: <1138931103.4934.105.camel@localhost.localdomain> On Tue, 2006-01-31 at 15:08 -0600, linas wrote: > Hmm. But these slots are not hot-pluggable; should the arch > use the hotplug infrastructure even on those slots? If those are EEH slots, they should probably be treated as hotpluggable... after all, didn't we discuss back then that one strategy we could use for recovery is simulating an unplug/replug to the driver along with a slot hard reset?
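For illustration only, the recovery strategy mentioned above (treat the slot as if it were hotpluggable: fake an unplug, hard-reset the slot, then replug) could be sketched roughly as follows. This is a user-space model, not kernel code: struct fake_slot, fake_remove(), slot_hard_reset(), fake_probe() and recover_by_replug() are all made-up names, and the real EEH code may well order or guard these steps differently.

```c
#include <string.h>

/* Hypothetical stand-in for a PCI slot; not a kernel structure. */
struct fake_slot {
	int frozen;	/* 1 while the slot is isolated after an error */
	int bound;	/* 1 while a driver is attached to the device */
	char log[64];	/* records the order of operations */
};

static void log_op(struct fake_slot *s, const char *op)
{
	strncat(s->log, op, sizeof(s->log) - strlen(s->log) - 1);
}

/* Simulated unplug: the driver lets go of the device. */
static void fake_remove(struct fake_slot *s)
{
	s->bound = 0;
	log_op(s, "remove;");
}

/* Simulated slot hard reset: clears the isolated state. */
static void slot_hard_reset(struct fake_slot *s)
{
	s->frozen = 0;
	log_op(s, "reset;");
}

/* Simulated replug: binding only succeeds on a slot that is not frozen. */
static int fake_probe(struct fake_slot *s)
{
	if (s->frozen)
		return -1;
	s->bound = 1;
	log_op(s, "probe;");
	return 0;
}

/* Unplug (if bound), reset the slot, then replug. Returns 0 on success. */
static int recover_by_replug(struct fake_slot *s)
{
	if (s->bound)
		fake_remove(s);
	slot_hard_reset(s);
	return fake_probe(s);
}
```

The point of the ordering is that the driver never touches the device while the slot is frozen: it is detached first and only re-probed after the reset has completed.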
From benh at kernel.crashing.org Fri Feb 3 12:56:01 2006 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 03 Feb 2006 12:56:01 +1100 Subject: [PATCH 2.6.16-rc1] Fix booting Maple boards (was: Re: LINUXPPC64 Maple fails to boot current git) In-Reply-To: <20060131151117.GP22672@smtp.west.cox.net> References: <20060130171759.GE22672@smtp.west.cox.net> <1138662630.3417.26.camel@brick.watson.ibm.com> <20060131151117.GP22672@smtp.west.cox.net> Message-ID: <1138931761.4934.113.camel@localhost.localdomain> > When looking for legacy serial ports, condition poking of "ISA" areas > on CONFIG_GENERIC_ISA_DMA, rather than CONFIG_ISA as some boards (such > as the Maple) have no ISA slots, but do have ISA serial ports. Hrm... not sure ISA_DMA has anything to do with that at all.. in fact it's more like "has legacy devices". I don't remember adding the ifdef CONFIG_ISA in the first place, maybe I did... it's a bit dodgy I'd say. Indeed, lots of machines have ISA devices (a superIO typically) without having ISA slots... Ben.
> Signed-off-by: Tom Rini > > arch/powerpc/kernel/legacy_serial.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/kernel/legacy_serial.c b/arch/powerpc/kernel/legacy_serial.c > index f970ace..3dd7b39 100644 > --- a/arch/powerpc/kernel/legacy_serial.c > +++ b/arch/powerpc/kernel/legacy_serial.c > @@ -134,7 +134,7 @@ static int __init add_legacy_soc_port(st > return add_legacy_port(np, -1, UPIO_MEM, addr, addr, NO_IRQ, flags); > } > > -#ifdef CONFIG_ISA > +#ifdef CONFIG_GENERIC_ISA_DMA > static int __init add_legacy_isa_port(struct device_node *np, > struct device_node *isa_brg) > { > @@ -276,7 +276,7 @@ void __init find_legacy_serial_ports(voi > of_node_put(soc); > } > > -#ifdef CONFIG_ISA > +#ifdef CONFIG_GENERIC_ISA_DMA > /* First fill our array with ISA ports */ > for (np = NULL; (np = of_find_node_by_type(np, "serial"));) { > struct device_node *isa = of_get_parent(np); > From benh at kernel.crashing.org Fri Feb 3 12:53:21 2006 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 03 Feb 2006 12:53:21 +1100 Subject: creating PCI-related sysfs entries In-Reply-To: <20060131212624.GA10513@kroah.com> References: <20060131202214.GZ19465@austin.ibm.com> <20060131203456.GA23819@kroah.com> <20060131210805.GA19465@austin.ibm.com> <20060131212624.GA10513@kroah.com> Message-ID: <1138931602.4934.110.camel@localhost.localdomain> > That's only because your arch might happen to have 1 device per slot, > which is not true for other arches. And I bet it's also not true for > your non-virtual boxes... Even that is not true since we can have multi-function devices or devices with p2p bridges but the basic entity where the error management infos is available to us is indeed the physical slot. > People have suggested that they create such a driver for a long time. > Why not just do that? Depends if he wants per domain statistics or really per slot ... 
we do have per-slot control on most IBM machines, thus I would rather have this info there (though if he also wants consolidated "global" stats, then yes, a host controller driver might be the way to go). From linas at austin.ibm.com Fri Feb 3 13:03:41 2006 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 2 Feb 2006 20:03:41 -0600 Subject: creating PCI-related sysfs entries In-Reply-To: <1138931103.4934.105.camel@localhost.localdomain> References: <20060131202214.GZ19465@austin.ibm.com> <20060131203456.GA23819@kroah.com> <20060131210805.GA19465@austin.ibm.com> <1138931103.4934.105.camel@localhost.localdomain> Message-ID: <20060203020341.GR24916@austin.ibm.com> On Fri, Feb 03, 2006 at 12:45:03PM +1100, Benjamin Herrenschmidt was heard to remark: > On Tue, 2006-01-31 at 15:08 -0600, linas wrote: > > > Hmm. But these slots are not hot-pluggable; should the arch > > use the hotplug infrastructure even on those slots? > > If those are EEH slots, they should probably be treated as hotpluggable... > after all, didn't we discuss back then that one strategy we could use > for recovery is simulating an unplug/replug to the driver along with a slot > hard reset? Yes, and EEH does do that (in mainline, 10K times in a row, last I tried). This email was in reference to the layout of /sys/bus/pci/slots which seems to have only hotplug slots in there; I am not yet sure why. It's possible John Rose can shed some rapid insight? --linas From gregkh at suse.de Fri Feb 3 14:48:41 2006 From: gregkh at suse.de (Greg KH) Date: Thu, 2 Feb 2006 19:48:41 -0800 Subject: [PATCH]: Documentation: Updated PCI Error Recovery In-Reply-To: <20060203000602.GQ24916@austin.ibm.com> References: <20060203000602.GQ24916@austin.ibm.com> Message-ID: <20060203034841.GA14169@suse.de> On Thu, Feb 02, 2006 at 06:06:02PM -0600, Linas Vepstas wrote: > > I'm not sure who I'm addressing this patch to: Linus, maybe?
As it's PCI related, I'll take it, like the other PCI stuff, and put it into my trees, which go into -mm, and then into Linus's tree. I'll add this to my queue. thanks, greg k-h From boutcher at cs.umn.edu Fri Feb 3 18:18:50 2006 From: boutcher at cs.umn.edu (Dave C Boutcher) Date: Fri, 3 Feb 2006 01:18:50 -0600 Subject: [PATCH 0/3] powerpc minor fixes to the rtas_percpu_suspend_me routine Message-ID: <17379.986.599275.637898@hound.rchland.ibm.com> A series of small fixes to the rtas_percpu_suspend_me routine for problems discovered since it was pushed to 2.6.16-rc1. Dave Boutcher From boutcher at cs.umn.edu Fri Feb 3 18:18:36 2006 From: boutcher at cs.umn.edu (Dave C Boutcher) Date: Fri, 3 Feb 2006 01:18:36 -0600 Subject: [PATCH 3/3] powerpc remove useless call to touch_softlockup_watchdog Message-ID: <17379.972.53558.75428@hound.rchland.ibm.com> It turns out that we can't stop the watchdog from triggering here. If we touch the timer (which just uses the current jiffies value) before we enable interrupts, it does nothing because jiffies are not mass-updated until after we enable interrupts. If we touch the timer after we enable interrupts, it's too late because the softlockup watchdog will already have triggered. The touch_softlockup_watchdog call removed below does nothing.
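The reasoning above can be played through with a small user-space model. This is only a sketch under stated assumptions: struct model, touch(), irq_restore(), suspend_then() and THRESHOLD are invented for illustration and do not mirror the real softlockup implementation. The model assumes that touching the watchdog simply stamps the current tick count, and that the whole backlog of ticks is applied (and the check run) the moment interrupts come back on.

```c
#include <stdbool.h>

#define THRESHOLD 10	/* hypothetical: ticks of silence the watchdog tolerates */

struct model {
	unsigned long jiffies;		/* tick counter as currently visible */
	unsigned long pending_ticks;	/* ticks that elapsed with interrupts off */
	unsigned long touch_ts;		/* tick count at the last touch */
	bool fired;			/* set once the watchdog has triggered */
};

/* Models touching the watchdog: it only records the current tick count. */
static void touch(struct model *m)
{
	m->touch_ts = m->jiffies;
}

/*
 * Models re-enabling interrupts: the backlog of ticks is applied in one go
 * and the watchdog check runs before any later statement can execute.
 */
static void irq_restore(struct model *m)
{
	m->jiffies += m->pending_ticks;
	m->pending_ticks = 0;
	if (m->jiffies - m->touch_ts > THRESHOLD)
		m->fired = true;
}

/* Suspend for 'ticks' with interrupts off, touching before or after restore. */
static bool suspend_then(struct model *m, unsigned long ticks, bool touch_before)
{
	m->pending_ticks = ticks;	/* time passes unseen by jiffies */
	if (touch_before)
		touch(m);		/* stamps a stale jiffies value */
	irq_restore(m);
	if (!touch_before)
		touch(m);		/* too late: the check already ran */
	return m->fired;
}
```

In this model a long suspend trips the watchdog whether the touch comes before or after the restore, which is the argument for dropping the call.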
Signed-off-by: Dave Boutcher --- arch/powerpc/kernel/rtas.c | 4 ---- 1 files changed, 0 insertions(+), 4 deletions(-) 14caae1e3b5508ce8798618f9f952f14e7c6d41a diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 4038ac1..1ecfcf8 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -598,10 +598,6 @@ static void rtas_percpu_suspend_me(void } out: - /* before we restore interrupts, make sure we don't - * generate a spurious soft lockup errors - */ - touch_softlockup_watchdog(); local_irq_restore(flags); return; } -- 1.1.4.g7310 From boutcher at cs.umn.edu Fri Feb 3 18:18:39 2006 From: boutcher at cs.umn.edu (Dave C Boutcher) Date: Fri, 3 Feb 2006 01:18:39 -0600 Subject: [PATCH 2/3] powerpc prod all processors after ibm,suspend-me Message-ID: <17379.975.326033.286493@hound.rchland.ibm.com> We need to prod everyone here since this is the only CPU that is guaranteed to be running after the ibm,suspend-me RTAS call returns. Signed-off-by: Dave Boutcher --- arch/powerpc/kernel/rtas.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) 9d615a50c077f82926732c8b9f366bebe50a4660 diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 107bd86..4038ac1 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -565,6 +565,7 @@ static int ibm_suspend_me_token = RTAS_U #ifdef CONFIG_PPC_PSERIES static void rtas_percpu_suspend_me(void *info) { + int i; long rc; long flags; struct rtas_suspend_me_data *data = @@ -589,6 +590,8 @@ static void rtas_percpu_suspend_me(void data->waiting = 0; data->args->args[data->args->nargs] = rtas_call(ibm_suspend_me_token, 0, 1, NULL); + for_each_cpu(i) + plpar_hcall_norets(H_PROD,i); } else { data->waiting = -EBUSY; printk(KERN_ERR "Error on H_Join hypervisor call\n"); -- 1.1.4.g7310 From boutcher at cs.umn.edu Fri Feb 3 18:18:46 2006 From: boutcher at cs.umn.edu (Dave C Boutcher) Date: Fri, 3 Feb 2006 01:18:46 -0600 Subject: [PATCH 1/3] powerpc return correct rtas 
status from ibm,suspend-me Message-ID: <17379.982.159401.407606@hound.rchland.ibm.com> Correctly return the status from the RTAS call. rtas_call returns the status as its return value. Signed-off-by: Dave Boutcher --- arch/powerpc/kernel/rtas.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) a0f3095607ff19d730f2ed5181bd37df231d4015 diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 7fe4a5c..107bd86 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -587,8 +587,8 @@ static void rtas_percpu_suspend_me(void if (rc == H_Continue) { data->waiting = 0; - rtas_call(ibm_suspend_me_token, 0, 1, - data->args->args); + data->args->args[data->args->nargs] = + rtas_call(ibm_suspend_me_token, 0, 1, NULL); } else { data->waiting = -EBUSY; printk(KERN_ERR "Error on H_Join hypervisor call\n"); -- 1.1.4.g7310 From michael at ellerman.id.au Fri Feb 3 19:05:14 2006 From: michael at ellerman.id.au (Michael Ellerman) Date: Fri, 03 Feb 2006 19:05:14 +1100 Subject: [PATCH] powerpc: Don't start secondary CPUs in a UP && KEXEC kernel Message-ID: <20060203080536.DA5AF68A10@ozlabs.org> Because smp_release_cpus() is built for SMP || KEXEC, it's not safe to unconditionally call it from setup_system(). On a UP && KEXEC kernel we'll start up the secondary CPUs which will then go berserk and we die. Simple fix is to conditionally call smp_release_cpus() in setup_system(). With that in place we don't need the dummy definition of smp_release_cpus() because all call sites are #ifdef'ed either SMP or KEXEC.
Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/setup_64.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) Index: kdump/arch/powerpc/kernel/setup_64.c =================================================================== --- kdump.orig/arch/powerpc/kernel/setup_64.c +++ kdump/arch/powerpc/kernel/setup_64.c @@ -311,8 +311,6 @@ void smp_release_cpus(void) DBG(" <- smp_release_cpus()\n"); } -#else -#define smp_release_cpus() #endif /* CONFIG_SMP || CONFIG_KEXEC */ /* @@ -470,10 +468,12 @@ void __init setup_system(void) check_smt_enabled(); smp_setup_cpu_maps(); +#ifdef CONFIG_SMP /* Release secondary cpus out of their spinloops at 0x60 now that * we can map physical -> logical CPU ids */ smp_release_cpus(); +#endif printk("Starting Linux PPC64 %s\n", system_utsname.version); From michael at ellerman.id.au Fri Feb 3 19:05:47 2006 From: michael at ellerman.id.au (Michael Ellerman) Date: Fri, 03 Feb 2006 19:05:47 +1100 Subject: [PATCH] powerpc: Don't overwrite flat device tree with kdump kernel Message-ID: <20060203080609.403CA68A1F@ozlabs.org> It's possible for prom_init to allocate the flat device tree inside the kdump crash kernel region. If this happens, when we load the kdump kernel we overwrite the flattened device tree, which is bad. We could make prom_init try and avoid allocating inside the crash kernel region, but then we run into issues if the crash kernel region uses all the space inside the RMO. The easiest solution is to move the flat device tree once we're running in the kernel. 
Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/prom.c | 27 +++++++++++++++++++++++++++ arch/powerpc/kernel/setup_64.c | 3 +++ include/asm-powerpc/prom.h | 2 ++ 3 files changed, 32 insertions(+) Index: kdump/arch/powerpc/kernel/prom.c =================================================================== --- kdump.orig/arch/powerpc/kernel/prom.c +++ kdump/arch/powerpc/kernel/prom.c @@ -1913,3 +1913,30 @@ int prom_update_property(struct device_n return 0; } + +#ifdef CONFIG_KEXEC +/* We may have allocated the flat device tree inside the crash kernel region + * in prom_init. If so we need to move it out into regular memory. */ +void kdump_move_device_tree(void) +{ + unsigned long start, end; + struct boot_param_header *new; + + start = __pa((unsigned long)initial_boot_params); + end = start + initial_boot_params->totalsize; + + if (end < crashk_res.start || start > crashk_res.end) + return; + + new = (struct boot_param_header*) + __va(lmb_alloc(initial_boot_params->totalsize, PAGE_SIZE)); + + memcpy(new, initial_boot_params, initial_boot_params->totalsize); + + initial_boot_params = new; + + DBG("Flat device tree blob moved to %p\n", initial_boot_params); + + /* XXX should we unreserve the old DT? 
*/ +} +#endif /* CONFIG_KEXEC */ Index: kdump/arch/powerpc/kernel/setup_64.c =================================================================== --- kdump.orig/arch/powerpc/kernel/setup_64.c +++ kdump/arch/powerpc/kernel/setup_64.c @@ -398,6 +398,9 @@ void __init setup_system(void) { DBG(" -> setup_system()\n"); +#ifdef CONFIG_KEXEC + kdump_move_device_tree(); +#endif /* * Unflatten the device-tree passed by prom_init or kexec */ Index: kdump/include/asm-powerpc/prom.h =================================================================== --- kdump.orig/include/asm-powerpc/prom.h +++ kdump/include/asm-powerpc/prom.h @@ -222,5 +222,7 @@ extern int of_address_to_resource(struct extern int of_pci_address_to_resource(struct device_node *dev, int bar, struct resource *r); +extern void kdump_move_device_tree(void); + #endif /* __KERNEL__ */ #endif /* _POWERPC_PROM_H */ From benh at kernel.crashing.org Fri Feb 3 20:07:37 2006 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 03 Feb 2006 20:07:37 +1100 Subject: creating PCI-related sysfs entries In-Reply-To: <20060203020341.GR24916@austin.ibm.com> References: <20060131202214.GZ19465@austin.ibm.com> <20060131203456.GA23819@kroah.com> <20060131210805.GA19465@austin.ibm.com> <1138931103.4934.105.camel@localhost.localdomain> <20060203020341.GR24916@austin.ibm.com> Message-ID: <1138957657.4934.124.camel@localhost.localdomain> On Thu, 2006-02-02 at 20:03 -0600, Linas Vepstas wrote: > Yes, and EEH does do that (in mainline, 10K times in a row, > last I tried). This email was in reference to the > layout of /sys/bus/pci/slots which seems to have only hotplug > slots in there; I am not yet sure why. Its possible John Rose > can shed some rapid insight? Ok... also, about this "max number of resets" thing, it would be useful in fact to have a rate limit rather ... 
a network card that for some reason needs to be reset about once a day is still fairly usable and it would be nice if the system didn't consider it dead after 10 days ... Also, it might be useful to have an entry to force a retry on a card that has been considered dead... Ben. From galak at kernel.crashing.org Sat Feb 4 01:25:08 2006 From: galak at kernel.crashing.org (Kumar Gala) Date: Fri, 3 Feb 2006 08:25:08 -0600 Subject: [PATCH] powerpc: Don't overwrite flat device tree with kdump kernel In-Reply-To: <20060203080609.403CA68A1F@ozlabs.org> References: <20060203080609.403CA68A1F@ozlabs.org> Message-ID: <8FC7251A-6C37-4B4B-9120-0845616D0E60@kernel.crashing.org> On Feb 3, 2006, at 2:05 AM, Michael Ellerman wrote: > It's possible for prom_init to allocate the flat device tree inside > the > kdump crash kernel region. If this happens, when we load the kdump > kernel we > overwrite the flattened device tree, which is bad. > > We could make prom_init try and avoid allocating inside the crash > kernel > region, but then we run into issues if the crash kernel region uses > all the > space inside the RMO. The easiest solution is to move the flat > device tree > once we're running in the kernel. > > Signed-off-by: Michael Ellerman Doesn't setup_32.c need a similar change? - k > --- > > arch/powerpc/kernel/prom.c | 27 +++++++++++++++++++++++++++ > arch/powerpc/kernel/setup_64.c | 3 +++ > include/asm-powerpc/prom.h | 2 ++ > 3 files changed, 32 insertions(+) > > Index: kdump/arch/powerpc/kernel/prom.c > =================================================================== > --- kdump.orig/arch/powerpc/kernel/prom.c > +++ kdump/arch/powerpc/kernel/prom.c > @@ -1913,3 +1913,30 @@ int prom_update_property(struct device_n > > return 0; > } > + > +#ifdef CONFIG_KEXEC > +/* We may have allocated the flat device tree inside the crash > kernel region > + * in prom_init. If so we need to move it out into regular memory.
*/ > +void kdump_move_device_tree(void) > +{ > + unsigned long start, end; > + struct boot_param_header *new; > + > + start = __pa((unsigned long)initial_boot_params); > + end = start + initial_boot_params->totalsize; > + > + if (end < crashk_res.start || start > crashk_res.end) > + return; > + > + new = (struct boot_param_header*) > + __va(lmb_alloc(initial_boot_params->totalsize, PAGE_SIZE)); > + > + memcpy(new, initial_boot_params, initial_boot_params->totalsize); > + > + initial_boot_params = new; > + > + DBG("Flat device tree blob moved to %p\n", initial_boot_params); > + > + /* XXX should we unreserve the old DT? */ > +} > +#endif /* CONFIG_KEXEC */ > Index: kdump/arch/powerpc/kernel/setup_64.c > =================================================================== > --- kdump.orig/arch/powerpc/kernel/setup_64.c > +++ kdump/arch/powerpc/kernel/setup_64.c > @@ -398,6 +398,9 @@ void __init setup_system(void) > { > DBG(" -> setup_system()\n"); > > +#ifdef CONFIG_KEXEC > + kdump_move_device_tree(); > +#endif > /* > * Unflatten the device-tree passed by prom_init or kexec > */ > Index: kdump/include/asm-powerpc/prom.h > =================================================================== > --- kdump.orig/include/asm-powerpc/prom.h > +++ kdump/include/asm-powerpc/prom.h > @@ -222,5 +222,7 @@ extern int of_address_to_resource(struct > extern int of_pci_address_to_resource(struct device_node *dev, int > bar, > struct resource *r); > > +extern void kdump_move_device_tree(void); > + > #endif /* __KERNEL__ */ > #endif /* _POWERPC_PROM_H */ > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/mailman/listinfo/linuxppc64-dev From trini at kernel.crashing.org Sat Feb 4 01:47:13 2006 From: trini at kernel.crashing.org (Tom Rini) Date: Fri, 3 Feb 2006 07:47:13 -0700 Subject: LINUXPPC64 Maple fails to boot current git) In-Reply-To: <1138931761.4934.113.camel@localhost.localdomain> References: 
<20060130171759.GE22672@smtp.west.cox.net> <1138662630.3417.26.camel@brick.watson.ibm.com> <20060131151117.GP22672@smtp.west.cox.net> <1138931761.4934.113.camel@localhost.localdomain> Message-ID: <20060203144713.GE3800@smtp.west.cox.net> On Fri, Feb 03, 2006 at 12:56:01PM +1100, Benjamin Herrenschmidt wrote: > > > When looking for legacy serial ports, condition poking of "ISA" areas > > on CONFIG_GENERIC_ISA_DMA, rather than CONFIG_ISA as some boards (such > > as the Maple) have no ISA slots, but do have ISA serial ports. > > Hrm... not sure ISA_DMA has anything to do with that at all.. in fact > its more like "has legacy devices". I don't remember adding the ifdef > CONFIG_ISA in the first place, maybe I did... it's a bit dodgy I'd say. > Indeed, lots of machines have ISA devices (a superIO typically) without > having ISA slots... Olaf says that he sent a patch to Andrew, who should be passing it along if not already, to just remove the #ifdefs there. -- Tom Rini http://gate.crashing.org/~trini/ From ericvh at gmail.com Sat Feb 4 01:54:41 2006 From: ericvh at gmail.com (Eric Van Hensbergen) Date: Fri, 3 Feb 2006 08:54:41 -0600 (CST) Subject: [patch 0/3] systemsim patch cleanup Message-ID: <20060203145441.6EC0A5A8075@localhost.localdomain> These are a set of code cleanups based on Arnd's systemsim patch-set sent out on January 14th. This patch attempts to clean-up some of the issues with the bogus network and bogus disk facilities of systemsim -- but is largely cosmetic. We had looked at incorporating the bogus devices into the IBM-maintained virtualization drivers in the past, but at the time it didn't look like there was a good match in the veth or the vscsi code -- the call-thru's would not integrate as nicely as they did with the hvc console code. The bogus disk and bogus network drivers are largely a stop-gap measure for systems the simulator doesn't have complete device models for. 
More complete device models are already in the plans for systemsim-cell, which will likely eventually replace the need for the "bogus" drivers. As such, I'll maintain the existing bogus drivers out-of-tree in my git repository on kernel.org (/pub/scm/linux/kernel/git/ericvh/systemsim.git) Unless there are any objections, I'll continue cc:'ing the ppc64-dev list on modifications to the patches. -eric From ericvh at gmail.com Sat Feb 4 01:56:17 2006 From: ericvh at gmail.com (Eric Van Hensbergen) Date: Fri, 3 Feb 2006 08:56:17 -0600 (CST) Subject: [patch 3/3] systemsim: new systemsim default configuration Message-ID: <20060203145617.D6FCD5A809C@localhost.localdomain> Subject: [PATCH] systemsim: clean up default configuration Signed-off-by: Eric Van Hensbergen --- arch/powerpc/configs/systemsim_defconfig | 125 +++++++----------------------- 1 files changed, 28 insertions(+), 97 deletions(-) 72e13e73b5998b853a9bd20e8c425486818ed09a diff --git a/arch/powerpc/configs/systemsim_defconfig b/arch/powerpc/configs/systemsim_defconfig index 59f1d0f..f7daa08 100644 --- a/arch/powerpc/configs/systemsim_defconfig +++ b/arch/powerpc/configs/systemsim_defconfig @@ -1,7 +1,7 @@ # # Automatically generated make config: don't edit -# Linux kernel version: -# Fri Jan 13 09:33:18 2006 +# Linux kernel version: 2.6.16-rc1 +# Thu Feb 2 15:18:13 2006 # CONFIG_PPC64=y CONFIG_64BIT=y @@ -18,7 +18,6 @@ CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_PPC_OF=y CONFIG_PPC_UDBG_16550=y -# CONFIG_CRASH_DUMP is not set CONFIG_GENERIC_TBSYNC=y # @@ -57,7 +56,7 @@ CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_CPUSETS is not set CONFIG_INITRAMFS_SOURCE="" -CONFIG_CC_OPTIMIZE_FOR_SIZE=y +# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set @@ -100,11 +99,11 @@ CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y -# CONFIG_DEFAULT_AS is not set +CONFIG_DEFAULT_AS=y # 
CONFIG_DEFAULT_DEADLINE is not set # CONFIG_DEFAULT_CFQ is not set -CONFIG_DEFAULT_NOOP=y -CONFIG_DEFAULT_IOSCHED="noop" +# CONFIG_DEFAULT_NOOP is not set +CONFIG_DEFAULT_IOSCHED="anticipatory" # # Platform support @@ -116,7 +115,7 @@ CONFIG_PPC_MULTIPLATFORM=y CONFIG_PPC_PSERIES=y # CONFIG_PPC_PMAC is not set CONFIG_PPC_MAPLE=y -CONFIG_PPC_CELL=y +# CONFIG_PPC_CELL is not set CONFIG_PPC_SYSTEMSIM=y CONFIG_SYSTEMSIM_IDLE=y CONFIG_XICS=y @@ -126,9 +125,8 @@ CONFIG_PPC_RTAS=y CONFIG_RTAS_ERROR_LOGGING=y CONFIG_RTAS_PROC=y # CONFIG_RTAS_FLASH is not set -CONFIG_MMIO_NVRAM=y +# CONFIG_MMIO_NVRAM is not set CONFIG_MPIC_BROKEN_U3=y -CONFIG_CELL_IIC=y CONFIG_IBMVIO=y # CONFIG_IBMEBUS is not set # CONFIG_PPC_MPC106 is not set @@ -136,11 +134,6 @@ CONFIG_IBMVIO=y # CONFIG_WANT_EARLY_SERIAL is not set # -# Cell Broadband Engine options -# -CONFIG_SPU_FS=m - -# # Kernel options # # CONFIG_HZ_100 is not set @@ -157,6 +150,7 @@ CONFIG_FORCE_MAX_ZONEORDER=13 # CONFIG_IOMMU_VMERGE is not set # CONFIG_HOTPLUG_CPU is not set # CONFIG_KEXEC is not set +# CONFIG_CRASH_DUMP is not set # CONFIG_IRQ_ALL_CPUS is not set # CONFIG_PPC_SPLPAR is not set CONFIG_EEH=y @@ -299,6 +293,7 @@ CONFIG_BRIDGE_NETFILTER=y # Core Netfilter Configuration # # CONFIG_NETFILTER_NETLINK is not set +# CONFIG_NETFILTER_XTABLES is not set # # IP: Netfilter Configuration @@ -315,91 +310,11 @@ CONFIG_IP_NF_TFTP=m CONFIG_IP_NF_AMANDA=m # CONFIG_IP_NF_PPTP is not set CONFIG_IP_NF_QUEUE=m -CONFIG_IP_NF_IPTABLES=m -CONFIG_IP_NF_MATCH_LIMIT=m -# CONFIG_IP_NF_MATCH_IPRANGE is not set -CONFIG_IP_NF_MATCH_MAC=m -CONFIG_IP_NF_MATCH_PKTTYPE=m -CONFIG_IP_NF_MATCH_MARK=m -CONFIG_IP_NF_MATCH_MULTIPORT=m -CONFIG_IP_NF_MATCH_TOS=m -CONFIG_IP_NF_MATCH_RECENT=m -CONFIG_IP_NF_MATCH_ECN=m -CONFIG_IP_NF_MATCH_DSCP=m -CONFIG_IP_NF_MATCH_AH_ESP=m -CONFIG_IP_NF_MATCH_LENGTH=m -CONFIG_IP_NF_MATCH_TTL=m -CONFIG_IP_NF_MATCH_TCPMSS=m -CONFIG_IP_NF_MATCH_HELPER=m -CONFIG_IP_NF_MATCH_STATE=m -CONFIG_IP_NF_MATCH_CONNTRACK=m 
-CONFIG_IP_NF_MATCH_OWNER=m -# CONFIG_IP_NF_MATCH_PHYSDEV is not set -# CONFIG_IP_NF_MATCH_ADDRTYPE is not set -# CONFIG_IP_NF_MATCH_REALM is not set -# CONFIG_IP_NF_MATCH_SCTP is not set -# CONFIG_IP_NF_MATCH_DCCP is not set -# CONFIG_IP_NF_MATCH_COMMENT is not set -# CONFIG_IP_NF_MATCH_HASHLIMIT is not set -# CONFIG_IP_NF_MATCH_STRING is not set -# CONFIG_IP_NF_MATCH_POLICY is not set -CONFIG_IP_NF_FILTER=m -CONFIG_IP_NF_TARGET_REJECT=m -CONFIG_IP_NF_TARGET_LOG=m -CONFIG_IP_NF_TARGET_ULOG=m -CONFIG_IP_NF_TARGET_TCPMSS=m -# CONFIG_IP_NF_TARGET_NFQUEUE is not set -CONFIG_IP_NF_NAT=m -CONFIG_IP_NF_NAT_NEEDED=y -CONFIG_IP_NF_TARGET_MASQUERADE=m -CONFIG_IP_NF_TARGET_REDIRECT=m -# CONFIG_IP_NF_TARGET_NETMAP is not set -# CONFIG_IP_NF_TARGET_SAME is not set -CONFIG_IP_NF_NAT_SNMP_BASIC=m -CONFIG_IP_NF_NAT_IRC=m -CONFIG_IP_NF_NAT_FTP=m -CONFIG_IP_NF_NAT_TFTP=m -CONFIG_IP_NF_NAT_AMANDA=m -CONFIG_IP_NF_MANGLE=m -CONFIG_IP_NF_TARGET_TOS=m -CONFIG_IP_NF_TARGET_ECN=m -CONFIG_IP_NF_TARGET_DSCP=m -CONFIG_IP_NF_TARGET_MARK=m -# CONFIG_IP_NF_TARGET_CLASSIFY is not set -# CONFIG_IP_NF_TARGET_TTL is not set -# CONFIG_IP_NF_RAW is not set -CONFIG_IP_NF_ARPTABLES=m -CONFIG_IP_NF_ARPFILTER=m -CONFIG_IP_NF_ARP_MANGLE=m # # IPv6: Netfilter Configuration (EXPERIMENTAL) # # CONFIG_IP6_NF_QUEUE is not set -CONFIG_IP6_NF_IPTABLES=m -CONFIG_IP6_NF_MATCH_LIMIT=m -CONFIG_IP6_NF_MATCH_MAC=m -CONFIG_IP6_NF_MATCH_RT=m -CONFIG_IP6_NF_MATCH_OPTS=m -CONFIG_IP6_NF_MATCH_FRAG=m -CONFIG_IP6_NF_MATCH_HL=m -CONFIG_IP6_NF_MATCH_MULTIPORT=m -CONFIG_IP6_NF_MATCH_OWNER=m -CONFIG_IP6_NF_MATCH_MARK=m -CONFIG_IP6_NF_MATCH_IPV6HEADER=m -CONFIG_IP6_NF_MATCH_AHESP=m -CONFIG_IP6_NF_MATCH_LENGTH=m -CONFIG_IP6_NF_MATCH_EUI64=m -# CONFIG_IP6_NF_MATCH_PHYSDEV is not set -# CONFIG_IP6_NF_MATCH_POLICY is not set -CONFIG_IP6_NF_FILTER=m -CONFIG_IP6_NF_TARGET_LOG=m -# CONFIG_IP6_NF_TARGET_REJECT is not set -# CONFIG_IP6_NF_TARGET_NFQUEUE is not set -CONFIG_IP6_NF_MANGLE=m -CONFIG_IP6_NF_TARGET_MARK=m -# 
CONFIG_IP6_NF_TARGET_HL is not set -# CONFIG_IP6_NF_RAW is not set # # DECnet: Netfilter Configuration @@ -443,6 +358,11 @@ CONFIG_IPDDP_ENCAP=y CONFIG_IPDDP_DECAP=y # CONFIG_X25 is not set # CONFIG_LAPB is not set + +# +# TIPC Configuration (EXPERIMENTAL) +# +# CONFIG_TIPC is not set CONFIG_NET_DIVERT=y # CONFIG_ECONET is not set CONFIG_WAN_ROUTER=m @@ -555,6 +475,7 @@ CONFIG_MTD_CFI_I2=y # CONFIG_MTD_RAM is not set # CONFIG_MTD_ROM is not set # CONFIG_MTD_ABSENT is not set +# CONFIG_MTD_OBSOLETE_CHIPS is not set # # Mapping drivers for chip access @@ -707,7 +628,6 @@ CONFIG_SYSTEMSIM_NET=y # CONFIG_SK98LIN is not set # CONFIG_TIGON3 is not set # CONFIG_BNX2 is not set -# CONFIG_SPIDER_NET is not set # CONFIG_MV643XX_ETH is not set # @@ -815,7 +735,7 @@ CONFIG_HW_CONSOLE=y CONFIG_SERIAL_8250=y # CONFIG_SERIAL_8250_CONSOLE is not set CONFIG_SERIAL_8250_NR_UARTS=4 -CONFIG_SERIAL_8250_RUNTIME_UARTS=2 +CONFIG_SERIAL_8250_RUNTIME_UARTS=4 # CONFIG_SERIAL_8250_EXTENDED is not set # @@ -826,7 +746,10 @@ CONFIG_SERIAL_CORE=y CONFIG_UNIX98_PTYS=y CONFIG_LEGACY_PTYS=y CONFIG_LEGACY_PTY_COUNT=256 +CONFIG_HVC_DRIVER=y # CONFIG_HVC_CONSOLE is not set +CONFIG_HVC_FSS=y +CONFIG_HVC_RTAS=y # CONFIG_HVCS is not set # @@ -864,6 +787,12 @@ CONFIG_LEGACY_PTY_COUNT=256 # CONFIG_I2C is not set # +# SPI support +# +# CONFIG_SPI is not set +# CONFIG_SPI_MASTER is not set + +# # Dallas's 1-wire bus # # CONFIG_W1 is not set @@ -1057,6 +986,7 @@ CONFIG_UNIXWARE_DISKLABEL=y CONFIG_SGI_PARTITION=y # CONFIG_ULTRIX_PARTITION is not set CONFIG_SUN_PARTITION=y +# CONFIG_KARMA_PARTITION is not set # CONFIG_EFI_PARTITION is not set # @@ -1137,6 +1067,7 @@ CONFIG_DEBUG_SPINLOCK_SLEEP=y # CONFIG_DEBUG_INFO is not set # CONFIG_DEBUG_FS is not set # CONFIG_DEBUG_VM is not set +CONFIG_FORCED_INLINING=y # CONFIG_RCU_TORTURE_TEST is not set # CONFIG_DEBUG_STACKOVERFLOW is not set # CONFIG_DEBUG_STACK_USAGE is not set -- 1.0.GIT From ericvh at gmail.com Sat Feb 4 01:55:06 2006 From: ericvh at gmail.com 
(Eric Van Hensbergen) Date: Fri, 3 Feb 2006 08:55:06 -0600 (CST) Subject: [patch 1/3] systemsim: cleanup systemsim network patch Message-ID: <20060203145506.1E0405A807B@localhost.localdomain> Subject: [PATCH] systemsim: clean-up systemsim network patch Incorporate some of the LKML feedback, clean-up naming conventions and fix a bogus free in the close routine. Signed-off-by: Eric Van Hensbergen --- drivers/net/systemsim_net.c | 113 ++++++++++++++++++++++--------------------- 1 files changed, 57 insertions(+), 56 deletions(-) 79e30c5718a29c6de20e45f00bc1b458b359c29c diff --git a/drivers/net/systemsim_net.c b/drivers/net/systemsim_net.c index babc1fb..0a4cea9 100644 --- a/drivers/net/systemsim_net.c +++ b/drivers/net/systemsim_net.c @@ -60,32 +60,32 @@ #include #include -#define MAMBO_BOGUS_NET_PROBE 119 -#define MAMBO_BOGUS_NET_SEND 120 -#define MAMBO_BOGUS_NET_RECV 121 +#define SYSTEMSIM_NET_PROBE 119 +#define SYSTEMSIM_NET_SEND 120 +#define SYSTEMSIM_NET_RECV 121 -static inline int MamboBogusNetProbe(int devno, void *buf) +static inline int systemsim_bogusnet_probe(int devno, void *buf) { - return callthru2(MAMBO_BOGUS_NET_PROBE, + return callthru2(SYSTEMSIM_NET_PROBE, (unsigned long)devno, (unsigned long)buf); } -static inline int MamboBogusNetSend(int devno, void *buf, ulong size) +static inline int systemsim_bogusnet_send(int devno, void *buf, ulong size) { - return callthru3(MAMBO_BOGUS_NET_SEND, + return callthru3(SYSTEMSIM_NET_SEND, (unsigned long)devno, (unsigned long)buf, (unsigned long)size); } -static inline int MamboBogusNetRecv(int devno, void *buf, ulong size) +static inline int systemsim_bogusnet_recv(int devno, void *buf, ulong size) { - return callthru3(MAMBO_BOGUS_NET_RECV, + return callthru3(SYSTEMSIM_NET_RECV, (unsigned long)devno, (unsigned long)buf, (unsigned long)size); } static irqreturn_t -mambonet_interrupt(int irq, void *dev_instance, struct pt_regs *regs); +systemsim_net_intr(int irq, void *dev_instance, struct pt_regs *regs); #define 
INIT_BOTTOM_HALF(x,y,z) INIT_WORK(x, y, (void*)z) #define SCHEDULE_BOTTOM_HALF(x) schedule_delayed_work(x, 1) @@ -100,18 +100,18 @@ struct netdev_private { struct net_device_stats stats; }; -static int mambonet_probedev(int devno, void *buf) +static int systemsim_net_probedev(int devno, void *buf) { - struct device_node *mambo; + struct device_node *systemsim; struct device_node *net; unsigned int *reg; - mambo = find_path_device("/mambo"); + systemsim = find_path_device("/systemsim"); - if (mambo == NULL) { + if (systemsim == NULL) { return -1; } - net = find_path_device("/mambo/bogus-net at 0"); + net = find_path_device("/systemsim/bogus-net at 0"); if (net == NULL) { return -1; } @@ -121,20 +121,20 @@ static int mambonet_probedev(int devno, return -1; } - return MamboBogusNetProbe(devno, buf); + return systemsim_bogusnet_probe(devno, buf); } -static int mambonet_send(int devno, void *buf, ulong size) +static int systemsim_net_send(int devno, void *buf, ulong size) { - return MamboBogusNetSend(devno, buf, size); + return systemsim_bogusnet_send(devno, buf, size); } -static int mambonet_recv(int devno, void *buf, ulong size) +static int systemsim_net_recv(int devno, void *buf, ulong size) { - return MamboBogusNetRecv(devno, buf, size); + return systemsim_bogusnet_recv(devno, buf, size); } -static int mambonet_start_xmit(struct sk_buff *skb, struct net_device *dev) +static int systemsim_net_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct netdev_private *priv = (struct netdev_private *)dev->priv; int devno = priv->devno; @@ -142,7 +142,7 @@ static int mambonet_start_xmit(struct sk skb->dev = dev; /* we might need to checksum or something */ - mambonet_send(devno, skb->data, skb->len); + systemsim_net_send(devno, skb->data, skb->len); dev->last_rx = jiffies; priv->stats.rx_bytes += skb->len; @@ -155,7 +155,7 @@ static int mambonet_start_xmit(struct sk return (0); } -static int mambonet_poll(struct net_device *dev, int *budget) +static int 
systemsim_net_poll(struct net_device *dev, int *budget) { struct netdev_private *np = dev->priv; int devno = np->devno; @@ -166,7 +166,7 @@ static int mambonet_poll(struct net_devi int max_frames = min(*budget, dev->quota); int ret = 0; - while ((ns = mambonet_recv(devno, buffer, 1600)) > 0) { + while ((ns = systemsim_net_recv(devno, buffer, 1600)) > 0) { if ((skb = dev_alloc_skb(ns + 2)) != NULL) { skb->dev = dev; skb_reserve(skb, 2); /* 16 byte align the IP @@ -209,12 +209,12 @@ static int mambonet_poll(struct net_devi return ret; } -static void mambonet_timer(struct net_device *dev) +static void systemsim_net_timer(struct net_device *dev) { int budget = 16; struct netdev_private *priv = (struct netdev_private *)dev->priv; - mambonet_poll(dev, &budget); + systemsim_net_poll(dev, &budget); if (!priv->closing) { SCHEDULE_BOTTOM_HALF(&priv->poll_task); @@ -228,7 +228,7 @@ static struct net_device_stats *get_stat } static irqreturn_t -mambonet_interrupt(int irq, void *dev_instance, struct pt_regs *regs) +systemsim_net_intr(int irq, void *dev_instance, struct pt_regs *regs) { struct net_device *dev = dev_instance; if (netif_rx_schedule_prep(dev)) { @@ -237,7 +237,7 @@ mambonet_interrupt(int irq, void *dev_in return IRQ_HANDLED; } -static int mambonet_open(struct net_device *dev) +static int systemsim_net_open(struct net_device *dev) { struct netdev_private *priv; int ret = 0; @@ -245,29 +245,30 @@ static int mambonet_open(struct net_devi priv = dev->priv; /* - * we can't start polling in mambonet_init, because I don't think + * we can't start polling in systemsim_net_init, because I don't think * workqueues are usable that early. so start polling now. 
*/ if (dev->irq) { - ret = request_irq(dev->irq, &mambonet_interrupt, 0, + ret = request_irq(dev->irq, &systemsim_net_intr, 0, dev->name, dev); if (ret == 0) { netif_start_queue(dev); } else { - printk(KERN_ERR "mambonet: request irq failed\n"); + printk(KERN_ERR "systemsim net: request irq failed\n"); } - MamboBogusNetProbe(priv->devno, NULL); /* probe with NULL to activate interrupts */ + /* probe with NULL to activate interrupts */ + systemsim_bogusnet_probe(priv->devno, NULL); } else { - mambonet_timer(dev); + systemsim_net_timer(dev); } return ret; } -static int mambonet_close(struct net_device *dev) +static int systemsim_net_close(struct net_device *dev) { struct netdev_private *priv; @@ -282,30 +283,29 @@ static int mambonet_close(struct net_dev KILL_BOTTOM_HALF(&priv->poll_task); } - kfree(priv); - return 0; } -static struct net_device_stats mambonet_stats; +static struct net_device_stats systemsim_net_stats; -static struct net_device_stats *mambonet_get_stats(struct net_device *dev) +static struct net_device_stats *systemsim_net_get_stats(struct net_device *dev) { - return &mambonet_stats; + return &systemsim_net_stats; } -static int mambonet_set_mac_address(struct net_device *dev, void *p) +static int systemsim_net_set_mac_address(struct net_device *dev, void *p) { return -EOPNOTSUPP; } -static int mambonet_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) +static int systemsim_net_ioctl(struct net_device *dev, struct ifreq *ifr, + int cmd) { return -EOPNOTSUPP; } static int nextdevno = 0; /* running count of device numbers */ /* Initialize the rest of the device. 
*/ -int __init do_mambonet_probe(struct net_device *dev) +int __init do_systemsim_net_probe(struct net_device *dev) { struct netdev_private *priv; int devno = nextdevno++; @@ -313,7 +313,7 @@ int __init do_mambonet_probe(struct net_ printk("eth%d: bogus network driver initialization\n", devno); - irq = mambonet_probedev(devno, dev->dev_addr); + irq = systemsim_net_probedev(devno, dev->dev_addr); if (irq < 0) { printk("No IRQ retreived\n"); @@ -328,14 +328,14 @@ int __init do_mambonet_probe(struct net_ dev->irq = irq; dev->mtu = MAMBO_MTU; - dev->open = mambonet_open; - dev->poll = mambonet_poll; + dev->open = systemsim_net_open; + dev->poll = systemsim_net_poll; dev->weight = 16; - dev->stop = mambonet_close; - dev->hard_start_xmit = mambonet_start_xmit; - dev->get_stats = mambonet_get_stats; - dev->set_mac_address = mambonet_set_mac_address; - dev->do_ioctl = mambonet_ioctl; + dev->stop = systemsim_net_close; + dev->hard_start_xmit = systemsim_net_start_xmit; + dev->get_stats = systemsim_net_get_stats; + dev->set_mac_address = systemsim_net_set_mac_address; + dev->do_ioctl = systemsim_net_ioctl; dev->priv = kmalloc(sizeof(struct netdev_private), GFP_KERNEL); if (dev->priv == NULL) @@ -348,14 +348,14 @@ int __init do_mambonet_probe(struct net_ dev->get_stats = get_stats; if (dev->irq == 0) { - INIT_BOTTOM_HALF(&priv->poll_task, (void *)mambonet_timer, + INIT_BOTTOM_HALF(&priv->poll_task, (void *)systemsim_net_timer, (void *)dev); } return (0); }; -struct net_device *__init mambonet_probe(int unit) +struct net_device *__init systemsim_net_probe(int unit) { struct net_device *dev = alloc_etherdev(0); int err; @@ -366,7 +366,7 @@ struct net_device *__init mambonet_probe sprintf(dev->name, "eth%d", unit); netdev_boot_setup_check(dev); - err = do_mambonet_probe(dev); + err = do_systemsim_net_probe(dev); if (err) goto out; @@ -382,11 +382,12 @@ struct net_device *__init mambonet_probe return ERR_PTR(err); } -int __init init_mambonet(void) +int __init 
init_systemsim_net(void) { - mambonet_probe(0); + systemsim_net_probe(0); return 0; } -module_init(init_mambonet); +module_init(init_systemsim_net); +MODULE_DESCRIPTION("Systemsim Network Driver"); MODULE_LICENSE("GPL"); -- 1.0.GIT From ericvh at gmail.com Sat Feb 4 01:55:36 2006 From: ericvh at gmail.com (Eric Van Hensbergen) Date: Fri, 3 Feb 2006 08:55:36 -0600 (CST) Subject: [patch 2/3] systemsim: cleanup systemsim block driver patch Message-ID: <20060203145536.CB9C35A8098@localhost.localdomain> Subject: [PATCH] systemsim: clean up systemsim block driver Clean-up the systemsim block driver and integrate some of the suggestions from LKML. Signed-off-by: Eric Van Hensbergen --- drivers/block/systemsim_bd.c | 159 ++++++++++++++++++++++++------------------ 1 files changed, 91 insertions(+), 68 deletions(-) ea40711c3a573b917cade94c1bdca659e4f3f905 diff --git a/drivers/block/systemsim_bd.c b/drivers/block/systemsim_bd.c index deecfb8..bec453e 100644 --- a/drivers/block/systemsim_bd.c +++ b/drivers/block/systemsim_bd.c @@ -11,7 +11,7 @@ * written by Pavel Machek and Steven Whitehouse * * Some code is from the IBM Full System Simulator Group in ARL - * Author: PAtrick Bohrer + * Author: Patrick Bohrer * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -43,7 +43,7 @@ #include #include #include - +#include #include #include @@ -52,21 +52,21 @@ #include #define MAJOR_NR 42 -#define MAX_MBD 128 +#define MAX_SYSTEMSIM_BD 128 -#define MBD_SET_BLKSIZE _IO( 0xab, 1 ) -#define MBD_SET_SIZE _IO( 0xab, 2 ) -#define MBD_SET_SIZE_BLOCKS _IO( 0xab, 7 ) -#define MBD_DISCONNECT _IO( 0xab, 8 ) +#define SYSTEMSIM_BD_SET_BLKSIZE _IO( 0xab, 1 ) +#define SYSTEMSIM_BD_SET_SIZE _IO( 0xab, 2 ) +#define SYSTEMSIM_BD_SET_SIZE_BLOCKS _IO( 0xab, 7 ) +#define SYSTEMSIM_BD_DISCONNECT _IO( 0xab, 8 ) -struct mbd_device { +struct systemsim_bd_device { int initialized; int refcnt; int flags; struct gendisk 
*disk; }; -static struct mbd_device mbd_dev[MAX_MBD]; +static struct systemsim_bd_device systemsim_bd_dev[MAX_SYSTEMSIM_BD]; #define BD_INFO_SYNC 0 #define BD_INFO_STATUS 1 @@ -79,7 +79,7 @@ static struct mbd_device mbd_dev[MAX_MBD #define BOGUS_DISK_INFO 118 static inline int -MamboBogusDiskRead(int devno, void *buf, ulong sect, ulong nrsect) +systemsim_disk_read(int devno, void *buf, ulong sect, ulong nrsect) { return callthru3(BOGUS_DISK_READ, (unsigned long)buf, (unsigned long)sect, @@ -87,34 +87,34 @@ MamboBogusDiskRead(int devno, void *buf, } static inline int -MamboBogusDiskWrite(int devno, void *buf, ulong sect, ulong nrsect) +systemsim_disk_write(int devno, void *buf, ulong sect, ulong nrsect) { return callthru3(BOGUS_DISK_WRITE, (unsigned long)buf, (unsigned long)sect, (unsigned long)((nrsect << 16) | devno)); } -static inline int MamboBogusDiskInfo(int op, int devno) +static inline int systemsim_disk_info(int op, int devno) { return callthru2(BOGUS_DISK_INFO, (unsigned long)op, (unsigned long)devno); } -static int mbd_init_disk(int devno) +static int systemsim_bd_init_disk(int devno) { - struct gendisk *disk = mbd_dev[devno].disk; + struct gendisk *disk = systemsim_bd_dev[devno].disk; unsigned int sz; /* check disk configured */ - if (!MamboBogusDiskInfo(BD_INFO_STATUS, devno)) { + if (!systemsim_disk_info(BD_INFO_STATUS, devno)) { printk(KERN_ERR "Attempting to open bogus disk before initializaiton\n"); return 0; } - mbd_dev[devno].initialized++; + systemsim_bd_dev[devno].initialized++; - sz = MamboBogusDiskInfo(BD_INFO_DEVSZ, devno); + sz = systemsim_disk_info(BD_INFO_DEVSZ, devno); printk("Initializing disk %d with devsz %u\n", devno, sz); @@ -123,7 +123,7 @@ static int mbd_init_disk(int devno) return 1; } -static void do_mbd_request(request_queue_t * q) +static void do_systemsim_bd_request(request_queue_t * q) { int result = 0; struct request *req; @@ -133,14 +133,14 @@ static void do_mbd_request(request_queue switch (rq_data_dir(req)) { case READ: - 
result = MamboBogusDiskRead(minor, - req->buffer, req->sector, - req->current_nr_sectors); - break; - case WRITE: - result = MamboBogusDiskWrite(minor, + result = systemsim_disk_read(minor, req->buffer, req->sector, req->current_nr_sectors); + break; + case WRITE: + result = systemsim_disk_write(minor, + req->buffer, req->sector, + req->current_nr_sectors); }; if (result) @@ -150,108 +150,131 @@ static void do_mbd_request(request_queue } } -static int mbd_release(struct inode *inode, struct file *file) +static int systemsim_bd_release(struct inode *inode, struct file *file) { - struct mbd_device *lo; + struct systemsim_bd_device *lo; int dev; if (!inode) return -ENODEV; dev = inode->i_bdev->bd_disk->first_minor; - if (dev >= MAX_MBD) + if (dev >= MAX_SYSTEMSIM_BD) return -ENODEV; - if (MamboBogusDiskInfo(BD_INFO_SYNC, dev) < 0) { - printk(KERN_ALERT "mbd_release: unable to sync\n"); + if (systemsim_disk_info(BD_INFO_SYNC, dev) < 0) { + printk(KERN_ALERT "systemsim_bd_release: unable to sync\n"); } - lo = &mbd_dev[dev]; + lo = &systemsim_bd_dev[dev]; if (lo->refcnt <= 0) - printk(KERN_ALERT "mbd_release: refcount(%d) <= 0\n", + printk(KERN_ALERT "systemsim_bd_release: refcount(%d) <= 0\n", lo->refcnt); lo->refcnt--; return 0; } -static int mbd_revalidate(struct gendisk *disk) +static int systemsim_bd_revalidate(struct gendisk *disk) { int devno = disk->first_minor; - mbd_init_disk(devno); + systemsim_bd_init_disk(devno); return 0; } -static int mbd_open(struct inode *inode, struct file *file) +static int systemsim_bd_open(struct inode *inode, struct file *file) { int dev; if (!inode) return -EINVAL; dev = inode->i_bdev->bd_disk->first_minor; - if (dev >= MAX_MBD) + if (dev >= MAX_SYSTEMSIM_BD) return -ENODEV; check_disk_change(inode->i_bdev); - if (!mbd_dev[dev].initialized) - if (!mbd_init_disk(dev)) + if (!systemsim_bd_dev[dev].initialized) + if (!systemsim_bd_init_disk(dev)) return -ENODEV; - mbd_dev[dev].refcnt++; + systemsim_bd_dev[dev].refcnt++; return 0; } 
-static struct block_device_operations mbd_fops = { +static struct block_device_operations systemsim_bd_fops = { owner:THIS_MODULE, - open:mbd_open, - release:mbd_release, - /* media_changed: mbd_check_change, */ - revalidate_disk:mbd_revalidate, + open:systemsim_bd_open, + release:systemsim_bd_release, + /* media_changed: systemsim_bd_check_change, */ + revalidate_disk:systemsim_bd_revalidate, }; -static spinlock_t mbd_lock = SPIN_LOCK_UNLOCKED; +static spinlock_t systemsim_bd_lock = SPIN_LOCK_UNLOCKED; -static int __init mbd_init(void) +static int __init systemsim_bd_init(void) { + struct device_node *systemsim; int err = -ENOMEM; int i; - for (i = 0; i < MAX_MBD; i++) { + systemsim = find_path_device("/systemsim"); + + if (systemsim == NULL) { + printk("NO SYSTEMSIM BOGUS DISK DETECTED\n"); + return -1; + } + + /* + * We could detect which disks are configured in openfirmware + * but I think this unnecessarily limits us from being able to + * hot-plug bogus disks durning run-time. + * + */ + + for (i = 0; i < MAX_SYSTEMSIM_BD; i++) { struct gendisk *disk = alloc_disk(1); if (!disk) goto out; - mbd_dev[i].disk = disk; + systemsim_bd_dev[i].disk = disk; /* * The new linux 2.5 block layer implementation requires * every gendisk to have its very own request_queue struct. * These structs are big so we dynamically allocate them. 
*/ - disk->queue = blk_init_queue(do_mbd_request, &mbd_lock); + disk->queue = + blk_init_queue(do_systemsim_bd_request, &systemsim_bd_lock); if (!disk->queue) { put_disk(disk); goto out; } } - if (register_blkdev(MAJOR_NR, "mbd")) { + if (register_blkdev(MAJOR_NR, "systemsim_bd")) { err = -EIO; goto out; } #ifdef MODULE - printk("mambo bogus disk: registered device at major %d\n", MAJOR_NR); + printk("systemsim bogus disk: registered device at major %d\n", + MAJOR_NR); #else - printk("mambo bogus disk: compiled in with kernel\n"); + printk("systemsim bogus disk: compiled in with kernel\n"); #endif + /* + * left device name alone for now as too much depends on it + * external to the kernel + * + */ + devfs_mk_dir("mambobd"); - for (i = 0; i < MAX_MBD; i++) { /* load defaults */ - struct gendisk *disk = mbd_dev[i].disk; - mbd_dev[i].initialized = 0; - mbd_dev[i].refcnt = 0; - mbd_dev[i].flags = 0; + for (i = 0; i < MAX_SYSTEMSIM_BD; i++) { /* load defaults */ + struct gendisk *disk = systemsim_bd_dev[i].disk; + systemsim_bd_dev[i].initialized = 0; + systemsim_bd_dev[i].refcnt = 0; + systemsim_bd_dev[i].flags = 0; disk->major = MAJOR_NR; disk->first_minor = i; - disk->fops = &mbd_fops; - disk->private_data = &mbd_dev[i]; + disk->fops = &systemsim_bd_fops; + disk->private_data = &systemsim_bd_dev[i]; sprintf(disk->disk_name, "mambobd%d", i); sprintf(disk->devfs_name, "mambobd%d", i); set_capacity(disk, 0x7ffffc00ULL << 1); /* 2 TB */ @@ -261,25 +284,25 @@ static int __init mbd_init(void) return 0; out: while (i--) { - if (mbd_dev[i].disk->queue) - blk_cleanup_queue(mbd_dev[i].disk->queue); - put_disk(mbd_dev[i].disk); + if (systemsim_bd_dev[i].disk->queue) + blk_cleanup_queue(systemsim_bd_dev[i].disk->queue); + put_disk(systemsim_bd_dev[i].disk); } return -EIO; } -static void __exit mbd_cleanup(void) +static void __exit systemsim_bd_cleanup(void) { devfs_remove("mambobd"); - if (unregister_blkdev(MAJOR_NR, "mbd") != 0) - printk("mbd: cleanup_module failed\n"); + if 
(unregister_blkdev(MAJOR_NR, "systemsim_bd") != 0) + printk("systemsim_bd: cleanup_module failed\n"); else - printk("mbd: module cleaned up.\n"); + printk("systemsim_bd: module cleaned up.\n"); } -module_init(mbd_init); -module_exit(mbd_cleanup); +module_init(systemsim_bd_init); +module_exit(systemsim_bd_cleanup); -MODULE_DESCRIPTION("Mambo Block Device"); +MODULE_DESCRIPTION("Systemsim Block Device"); MODULE_LICENSE("GPL"); -- 1.0.GIT From jfaslist at yahoo.fr Sat Feb 4 02:58:36 2006 From: jfaslist at yahoo.fr (jfaslist) Date: Fri, 03 Feb 2006 16:58:36 +0100 Subject: Maple freezing on PCI Target-Abort In-Reply-To: <1138930958.4934.102.camel@localhost.localdomain> References: <43E23B4A.4020402@yahoo.fr> <1138930958.4934.102.camel@localhost.localdomain> Message-ID: <43E37DAC.4030606@yahoo.fr> Hi, Yes, we are going to dig into all this CPC925 and Processor Interface initialization. Note that I checked that both MSR_ME and MSR_RI were set prior to triggering the PCI Target-Abort. -MSR_ME: If not set, the CPU will "checkstop" on a machine check. -MSR_RI: So that the exception is recoverable. Regarding MSR_RI, this should always be set, I think? Thanks -jfs Benjamin Herrenschmidt wrote: >>-What exception vector is taking care of a DERR excp? From what I can >>see it seems to be the "machine check" vector. But that seems a bit >>drastic to me. After all this is just a PCI target abort. >> >> > >I would expect a machine check yes. > > > >>-I expect that the normal behavior would be for the kernel to send a >>signal termination to the user process which caused the PIO READ PCI >>cycle (from a previously mmap()'ed VMA address). Is it doable on this >>platform? Since a READ operation is coupled by nature, I think this is >>the only acceptable way. >> >> > >It should SIGBUS except if the problem occurred in the kernel. I don't >know why it's not doing so, maybe you are hitting an issue/errata or >misconfiguration of the 925 ?
> > > >>I have tried to set the MSR[RI] bit before doing the PCI cycle, but it >>didn't change anything. Also on our design we disconnect the >>CPC925 checkstop pin from the 970 machine check pin.(see page 39 of >>cpc925 user's manual). So a DERR shouldn't cause a machine check I would >>think. >> >>I realize that these questions are very H/W related but couldn't find >>the answer in IBM doc. >> >> > From ahuja at austin.ibm.com Wed Feb 1 06:11:54 2006 From: ahuja at austin.ibm.com (Manish Ahuja) Date: Tue, 31 Jan 2006 13:11:54 -0600 Subject: [PATCH] PPC64 collect and export low-level cpu usage statistics In-Reply-To: <20060126204432.GG19465@austin.ibm.com> References: <43CFC094.8000709@austin.ibm.com> <20060126204432.GG19465@austin.ibm.com> Message-ID: <43DFB67A.5080508@austin.ibm.com> Yes, it probably is a good idea to have a #define for it, but since the PURR is only available on the POWER5 architecture, none of the other architectures really needs this code, so maybe I should enclose this for POWER5 setups only. >>+static ssize_t show_dispatchedcycles(struct sys_device *, char *); >>+static ssize_t show_offline_cpu_cycles(struct sys_device *, char *); >>+ >>+static SYSDEV_ATTR(offline_cpu_cycles, 0444, show_offline_cpu_cycles, NULL); >>+static SYSDEV_ATTR(cpu_dispatched_cycles, 0444, show_dispatchedcycles, NULL); >> >> > >I think you need a #ifdef CONFIG_PPC64 around the above. > > >>- if (cpu_has_feature(CPU_FTR_SMT)) >>+ if (cpu_has_feature(CPU_FTR_SMT)) { >> sysdev_create_file(s, &attr_purr); >>+ sysdev_create_file(s, &attr_offline_cpu_cycles); >>+ sysdev_create_file(s, &attr_cpu_dispatched_cycles); >>+ } >> >> > >Shouldn't this be CPU_FTR_PURR not FTR_SMT ? (and also in the next >section too).
> > > Yes, the original was FTR_SMT. I overlooked it. Thanks for pointing it out. +/* Defined in setup.c */ >>+extern u64 offline_cpu_total_tb; >>+extern u64 offline_cpu_total_cpu_util; >>+extern u64 offline_cpu_total_krncycles; >>+extern u64 offline_cpu_total_idle; >> >> > >These should be in a header file, probably arch/powerpc/kernel/setup.h > > > >>+static ssize_t show_offline_cpu_cycles(struct sys_device *dev, char *buf) >> >> > >#ifdef CONFIG_PPC64 surrounding the above .... > >--linas > > Okay, I can move it around, if it's okay with everyone else. Thanks for the comments. From linas at austin.ibm.com Sat Feb 4 03:58:30 2006 From: linas at austin.ibm.com (Linas Vepstas) Date: Fri, 3 Feb 2006 10:58:30 -0600 Subject: creating PCI-related sysfs entries In-Reply-To: <1138957657.4934.124.camel@localhost.localdomain> References: <20060131202214.GZ19465@austin.ibm.com> <20060131203456.GA23819@kroah.com> <20060131210805.GA19465@austin.ibm.com> <1138931103.4934.105.camel@localhost.localdomain> <20060203020341.GR24916@austin.ibm.com> <1138957657.4934.124.camel@localhost.localdomain> Message-ID: <20060203165829.GS24916@austin.ibm.com> On Fri, Feb 03, 2006 at 08:07:37PM +1100, Benjamin Herrenschmidt was heard to remark: > On Thu, 2006-02-02 at 20:03 -0600, Linas Vepstas wrote: > > > Yes, and EEH does do that (in mainline, 10K times in a row, > > last I tried). This email was in reference to the > > layout of /sys/bus/pci/slots which seems to have only hotplug > > slots in there; I am not yet sure why. It's possible John Rose > > can shed some rapid insight? > > Ok... also, about this "max number of resets" thing, it would be useful > in fact to have a rate limit rather ... a network card that for some > reason need to be reset about once a day is still fairly useable and it > would be nice if the system didn't consider it dead after 10 days ... Yes, I've often thought about this.
Only two designs come to mind: 1) a timer pops every 8 hours, and decrements the failure count by 1. Thus, anything less than 3 resets a day would be acceptable. 2) Store the jiffies of the last reset. Increment the fail count only if previous jiffies is less than 8 hours ago. Set fail count to 1 if previous jiffies is more than 48 hours ago. Advantage over 1: no timers. Any preferences? > Also, it might be useful to have an entry to force a retry on a card > that has been considered dead... Actually, hotplug remove/add or dlpar remove/add can be used to clear the count. (and that's how I do my test cases) The problem is that the documentation for this is buried somewhere where it cannot be found. Actually, this is one of my bigger/biggest concerns: the info about any of this is unfindable. I'd like to hype it up a bit, but am not sure how. --linas From linas at austin.ibm.com Sat Feb 4 04:08:34 2006 From: linas at austin.ibm.com (Linas Vepstas) Date: Fri, 3 Feb 2006 11:08:34 -0600 Subject: creating PCI-related sysfs entries In-Reply-To: <1138931602.4934.110.camel@localhost.localdomain> References: <20060131202214.GZ19465@austin.ibm.com> <20060131203456.GA23819@kroah.com> <20060131210805.GA19465@austin.ibm.com> <20060131212624.GA10513@kroah.com> <1138931602.4934.110.camel@localhost.localdomain> Message-ID: <20060203170834.GT24916@austin.ibm.com> On Fri, Feb 03, 2006 at 12:53:21PM +1100, Benjamin Herrenschmidt was heard to remark: > > > People have suggested that they create such a driver for a long time. > > Why not just do that? > > if he also wants consolidated "global" stats, > then yes, a host controller driver might be the way to go). I've had trouble parsing these suggestions. I can certainly hack up some pci-host structure so I can publish a few stats (the goal is to eliminate /proc/ppc64/eeh). By "hack" I mean something that would live with either the rpaphp code or the powerpc code.
However, this is a different type of activity than the idea "define a generic architecture-neutral pci-host bridge structure". Maybe I should just do the first, and if it ignites anyone's imagination, we can talk about the second. --linas From info at schihei.de Sat Feb 4 05:04:42 2006 From: info at schihei.de (Heiko J Schick) Date: Fri, 3 Feb 2006 19:04:42 +0100 Subject: kernel debugging tool In-Reply-To: <0ITB0074BC0LA6@mmp2.samsung.com> References: <0ITB0074BC0LA6@mmp2.samsung.com> Message-ID: <07030771-257D-4204-A0C4-1833B9F9FBD3@schihei.de> Hello, you can also use XMON or KDB, which are kernel debuggers. XMON is normally included in PowerPC kernels. I think for KDB you have to patch your kernel, but that could be wrong. If you dump out the crash instruction and compare it with the assembler output of your GCC, you can quickly find the source code line which caused the kernel panic. Perhaps the following links help, too: http://urbanmyth.org/linux/oops/ http://www-128.ibm.com/developerworks/library/l-kprobes.html?ca=dgr-lnxw42Kprobe http://www-128.ibm.com/developerworks/linux/library/l-kdbug/ Sometimes also very useful, too. :) On Jan 19, 2006, at 1:00 AM, Hyo Jung Song wrote: > WE are interested in Cell BE (broadband engine) Linux patch. > (found in > http://www.bsc.es/projects/deepcomputing/linuxoncell/cbexdev.html) > We want to debug kernel sources sometimes. How can we do it? > I believe you guys debugged kernel source codes for CBE and you > used > some tools. > Could you please some tips for this? Thank you. > > > > Hyo Jung Song > Senior Engineer > Samsung Electronics > tel. 82-2-3416-0355 > > -----Original Message----- > From: Cell Support [mailto:cell_support at bsc.es] > Sent: Wednesday, January 18, 2006 11:27 PM > To: hjsong at samsung.com > Cc: cell_support at bsc.es > Subject: Re: Fwd: kernel debugging tool > > Dear Hyo, > > we don't develop linux patches for Cell BE.
We got them from public > kernel mailing lists and post them to help > people to build a kernel that works with Cell BE. This avoids > having to > go through kernel mailing lists to > find the correct patch files that fit a specific kernel release. Hope > this helps people. > > We think you should post your question to a linux kernel mailing list. > Regarding the ppc64 kernel development, > the linuxppc64-dev at ozlabs.org is the right place > (https://ozlabs.org/mailman/listinfo/linuxpp