From greg at kroah.com Thu Sep 1 07:32:56 2005 From: greg at kroah.com (Greg KH) Date: Wed, 31 Aug 2005 14:32:56 -0700 Subject: [RFC/PATCH] Set up PCI tree from OF on ppc64 In-Reply-To: <17165.800.617053.891578@cargo.ozlabs.ibm.com> References: <17165.800.617053.891578@cargo.ozlabs.ibm.com> Message-ID: <20050831213256.GA20443@kroah.com> On Thu, Aug 25, 2005 at 09:30:40AM +1000, Paul Mackerras wrote: > The patch does some minor rearrangement of the generic PCI probing > code so that we can reuse as much code from drivers/pci as makes > sense. > > This has been tested on a dual G5 powermac and a partition on a POWER5 > machine (running under the hypervisor). > > Any comments? The pci rearrangement is fine with me. thanks, greg k-h From arnd at arndb.de Thu Sep 1 10:47:06 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Thu, 1 Sep 2005 02:47:06 +0200 Subject: [PATCH,RFC] Move Cell platform code to arch/powerpc Message-ID: <200509010247.07399.arnd@arndb.de> Move all files from arch/ppc64/kernel/bpa_* to arch/powerpc/platforms/cell, I would like to see a patch like this go into 2.6.14, for multiple reasons: - The marketing folks have changed the names and we are no longer supposed to refer to Cell as 'BPA' or 'Broadband Processor Architecture'. The platform is officially known as 'Cell Broadband Engine Architecture', while the CPU is the 'Cell Broadband Engine'. - We are now moving all platforms into arch/powerpc/platforms and someone has to start so we get a template for the other architectures to follow. - It would be a big mess for me to maintain my own patches on top of file names that are different from mainline during the 2.6.14 freeze. My impression is that Cell is a good target for moving first, because I have to move it anyway and the number of users is extremely low, so it doesn't cause too much harm if we screw up. What thing that makes moving Cell relatively easy is that it only supports 64 bit and only a single hardware configuration so far. I have tested this a bit on Cell and also done compile-only test for the other platforms, but it doesn't really make any changes to the code itself. Please comment on wether this is what everybody like the merge process be like. Signed-off-by: Arnd Bergmann -- arch/powerpc/platforms/cell/Makefile | 1 arch/ppc64/kernel/Makefile | 5 arch/powerpc/platforms/cell/pic.c | 269 ++++++++++++++++++++++ arch/ppc64/kernel/bpa_iic.c | 270 ---------------------- include/asm-powerpc/cell-pic.h | 62 +++++ arch/ppc64/kernel/bpa_iic.h | 62 ----- arch/powerpc/platforms/cell/iommu.c | 377 +++++++++++++++++++++++++++++++ arch/ppc64/kernel/bpa_iommu.c | 377 ------------------------------- arch/powerpc/platforms/cell/iommu.h | 65 +++++ arch/ppc64/kernel/bpa_iommu.h | 65 ----- arch/powerpc/platforms/cell/nvram.c | 118 +++++++++ arch/ppc64/kernel/bpa_nvram.c | 118 --------- arch/powerpc/platforms/cell/setup.c | 138 +++++++++++ arch/ppc64/kernel/bpa_setup.c | 140 ----------- arch/powerpc/platforms/cell/spider-pic.c | 190 +++++++++++++++ arch/ppc64/kernel/spider-pic.c | 191 --------------- arch/ppc64/Kconfig | 10 arch/ppc64/kernel/cpu_setup_power4.S | 2 arch/ppc64/kernel/cputable.c | 6 arch/ppc64/kernel/irq.c | 2 arch/ppc64/kernel/pSeries_smp.c | 4 arch/ppc64/kernel/setup.c | 8 arch/ppc64/kernel/traps.c | 4 include/asm-ppc64/nvram.h | 2 include/asm-ppc64/processor.h | 7 25 files changed, 1248 insertions(+), 1245 deletions(-) --- linux-cg.orig/arch/powerpc/platforms/cell/Makefile 1969-12-31 19:00:00.000000000 -0500 +++ linux-cg/arch/powerpc/platforms/cell/Makefile 2005-09-01 02:37:46.074992344 -0400 @@ -0,0 +1 @@ +obj-y += iommu.o nvram.o setup.o pic.o spider-pic.o --- linux-cg.orig/arch/powerpc/platforms/cell/iommu.c 1969-12-31 19:00:00.000000000 -0500 +++ linux-cg/arch/powerpc/platforms/cell/iommu.c 2005-09-01 02:37:46.076992040 -0400 @@ -0,0 +1,377 @@ +/* + * IOMMU implementation for Cell Broadband Engine + * + * We just establish a linear mapping at boot by setting all the + * IOPT cache entries in the CPU. + * The mapping functions should be identical to pci_direct_iommu, + * except for the handling of the high order bit that is required + * by the Spider bridge. These should be split into a separate + * file at the point where we get a different bridge chip. + * + * Copyright (C) 2005 IBM Deutschland Entwicklung GmbH, + * Arnd Bergmann + * + * Based on linear mapping + * Copyright (C) 2003 Benjamin Herrenschmidt (benh at kernel.crashing.org) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#undef DEBUG + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iommu.h" + +static inline unsigned long +get_iopt_entry(unsigned long real_address, unsigned long ioid, + unsigned long prot) +{ + return (prot & IOPT_PROT_MASK) + | (IOPT_COHERENT) + | (IOPT_ORDER_VC) + | (real_address & IOPT_RPN_MASK) + | (ioid & IOPT_IOID_MASK); +} + +typedef struct { + unsigned long val; +} ioste; + +static inline ioste +mk_ioste(unsigned long val) +{ + ioste ioste = { .val = val, }; + return ioste; +} + +static inline ioste +get_iost_entry(unsigned long iopt_base, unsigned long io_address, unsigned page_size) +{ + unsigned long ps; + unsigned long iostep; + unsigned long nnpt; + unsigned long shift; + + switch (page_size) { + case 0x1000000: + ps = IOST_PS_16M; + nnpt = 0; /* one page per segment */ + shift = 5; /* segment has 16 iopt entries */ + break; + + case 0x100000: + ps = IOST_PS_1M; + nnpt = 0; /* one page per segment */ + shift = 1; /* segment has 256 iopt entries */ + break; + + case 0x10000: + ps = IOST_PS_64K; + nnpt = 0x07; /* 8 pages per io page table */ + shift = 0; /* all entries are used */ + break; + + case 0x1000: + ps = IOST_PS_4K; + nnpt = 0x7f; /* 128 pages per io page table */ + shift = 0; /* all entries are used */ + break; + + default: /* not a known compile time constant */ + BUILD_BUG_ON(1); + break; + } + + iostep = iopt_base + + /* need 8 bytes per iopte */ + (((io_address / page_size * 8) + /* align io page tables on 4k page boundaries */ + << shift) + /* nnpt+1 pages go into each iopt */ + & ~(nnpt << 12)); + + nnpt++; /* this seems to work, but the documentation is not clear + about wether we put nnpt or nnpt-1 into the ioste bits. + In theory, this can't work for 4k pages. */ + return mk_ioste(IOST_VALID_MASK + | (iostep & IOST_PT_BASE_MASK) + | ((nnpt << 5) & IOST_NNPT_MASK) + | (ps & IOST_PS_MASK)); +} + +/* compute the address of an io pte */ +static inline unsigned long +get_ioptep(ioste iost_entry, unsigned long io_address) +{ + unsigned long iopt_base; + unsigned long page_size; + unsigned long page_number; + unsigned long iopt_offset; + + iopt_base = iost_entry.val & IOST_PT_BASE_MASK; + page_size = iost_entry.val & IOST_PS_MASK; + + /* decode page size to compute page number */ + page_number = (io_address & 0x0fffffff) >> (10 + 2 * page_size); + /* page number is an offset into the io page table */ + iopt_offset = (page_number << 3) & 0x7fff8ul; + return iopt_base + iopt_offset; +} + +/* compute the tag field of the iopt cache entry */ +static inline unsigned long +get_ioc_tag(ioste iost_entry, unsigned long io_address) +{ + unsigned long iopte = get_ioptep(iost_entry, io_address); + + return IOPT_VALID_MASK + | ((iopte & 0x00000000000000ff8ul) >> 3) + | ((iopte & 0x0000003fffffc0000ul) >> 9); +} + +/* compute the hashed 6 bit index for the 4-way associative pte cache */ +static inline unsigned long +get_ioc_hash(ioste iost_entry, unsigned long io_address) +{ + unsigned long iopte = get_ioptep(iost_entry, io_address); + + return ((iopte & 0x000000000000001f8ul) >> 3) + ^ ((iopte & 0x00000000000020000ul) >> 17) + ^ ((iopte & 0x00000000000010000ul) >> 15) + ^ ((iopte & 0x00000000000008000ul) >> 13) + ^ ((iopte & 0x00000000000004000ul) >> 11) + ^ ((iopte & 0x00000000000002000ul) >> 9) + ^ ((iopte & 0x00000000000001000ul) >> 7); +} + +/* same as above, but pretend that we have a simpler 1-way associative + pte cache with an 8 bit index */ +static inline unsigned long +get_ioc_hash_1way(ioste iost_entry, unsigned long io_address) +{ + unsigned long iopte = get_ioptep(iost_entry, io_address); + + return ((iopte & 0x000000000000001f8ul) >> 3) + ^ ((iopte & 0x00000000000020000ul) >> 17) + ^ ((iopte & 0x00000000000010000ul) >> 15) + ^ ((iopte & 0x00000000000008000ul) >> 13) + ^ ((iopte & 0x00000000000004000ul) >> 11) + ^ ((iopte & 0x00000000000002000ul) >> 9) + ^ ((iopte & 0x00000000000001000ul) >> 7) + ^ ((iopte & 0x0000000000000c000ul) >> 8); +} + +static inline ioste +get_iost_cache(void __iomem *base, unsigned long index) +{ + unsigned long __iomem *p = (base + IOC_ST_CACHE_DIR); + return mk_ioste(in_be64(&p[index])); +} + +static inline void +set_iost_cache(void __iomem *base, unsigned long index, ioste ste) +{ + unsigned long __iomem *p = (base + IOC_ST_CACHE_DIR); + pr_debug("ioste %02lx was %016lx, store %016lx", index, + get_iost_cache(base, index).val, ste.val); + out_be64(&p[index], ste.val); + pr_debug(" now %016lx\n", get_iost_cache(base, index).val); +} + +static inline unsigned long +get_iopt_cache(void __iomem *base, unsigned long index, unsigned long *tag) +{ + unsigned long __iomem *tags = (void *)(base + IOC_PT_CACHE_DIR); + unsigned long __iomem *p = (void *)(base + IOC_PT_CACHE_REG); + + *tag = tags[index]; + rmb(); + return *p; +} + +static inline void +set_iopt_cache(void __iomem *base, unsigned long index, + unsigned long tag, unsigned long val) +{ + unsigned long __iomem *tags = base + IOC_PT_CACHE_DIR; + unsigned long __iomem *p = base + IOC_PT_CACHE_REG; + pr_debug("iopt %02lx was v%016lx/t%016lx, store v%016lx/t%016lx\n", + index, get_iopt_cache(base, index, &oldtag), oldtag, val, tag); + + out_be64(p, val); + out_be64(&tags[index], tag); +} + +static inline void +set_iost_origin(void __iomem *base) +{ + unsigned long __iomem *p = base + IOC_ST_ORIGIN; + unsigned long origin = IOSTO_ENABLE | IOSTO_SW; + + pr_debug("iost_origin %016lx, now %016lx\n", in_be64(p), origin); + out_be64(p, origin); +} + +static inline void +set_iocmd_config(void __iomem *base) +{ + unsigned long __iomem *p = base + 0xc00; + unsigned long conf; + + conf = in_be64(p); + pr_debug("iost_conf %016lx, now %016lx\n", conf, conf | IOCMD_CONF_TE); + out_be64(p, conf | IOCMD_CONF_TE); +} + +/* FIXME: get these from the device tree */ +#define ioc_base 0x20000511000ull +#define ioc_mmio_base 0x20000510000ull +#define ioid 0x48a +#define iopt_phys_offset (- 0x20000000) /* We have a 512MB offset from the SB */ +#define io_page_size 0x1000000 + +static unsigned long map_iopt_entry(unsigned long address) +{ + switch (address >> 20) { + case 0x600: + address = 0x24020000000ull; /* spider i/o */ + break; + default: + address += iopt_phys_offset; + break; + } + + return get_iopt_entry(address, ioid, IOPT_PROT_RW); +} + +static void iommu_bus_setup_null(struct pci_bus *b) { } +static void iommu_dev_setup_null(struct pci_dev *d) { } + +/* initialize the iommu to support a simple linear mapping + * for each DMA window used by any device. For now, we + * happen to know that there is only one DMA window in use, + * starting at iopt_phys_offset. */ +static void cell_map_iommu(void) +{ + unsigned long address; + void __iomem *base; + ioste ioste; + unsigned long index; + + base = __ioremap(ioc_base, 0x1000, _PAGE_NO_CACHE); + pr_debug("%lx mapped to %p\n", ioc_base, base); + set_iocmd_config(base); + iounmap(base); + + base = __ioremap(ioc_mmio_base, 0x1000, _PAGE_NO_CACHE); + pr_debug("%lx mapped to %p\n", ioc_mmio_base, base); + + set_iost_origin(base); + + for (address = 0; address < 0x100000000ul; address += io_page_size) { + ioste = get_iost_entry(0x10000000000ul, address, io_page_size); + if ((address & 0xfffffff) == 0) /* segment start */ + set_iost_cache(base, address >> 28, ioste); + index = get_ioc_hash_1way(ioste, address); + pr_debug("addr %08lx, index %02lx, ioste %016lx\n", + address, index, ioste.val); + set_iopt_cache(base, + get_ioc_hash_1way(ioste, address), + get_ioc_tag(ioste, address), + map_iopt_entry(address)); + } + iounmap(base); +} + + +static void *cell_alloc_coherent(struct device *hwdev, size_t size, + dma_addr_t *dma_handle, unsigned int __nocast flag) +{ + void *ret; + + ret = (void *)__get_free_pages(flag, get_order(size)); + if (ret != NULL) { + memset(ret, 0, size); + *dma_handle = virt_to_abs(ret) | CELL_DMA_VALID; + } + return ret; +} + +static void cell_free_coherent(struct device *hwdev, size_t size, + void *vaddr, dma_addr_t dma_handle) +{ + free_pages((unsigned long)vaddr, get_order(size)); +} + +static dma_addr_t cell_map_single(struct device *hwdev, void *ptr, + size_t size, enum dma_data_direction direction) +{ + return virt_to_abs(ptr) | CELL_DMA_VALID; +} + +static void cell_unmap_single(struct device *hwdev, dma_addr_t dma_addr, + size_t size, enum dma_data_direction direction) +{ +} + +static int cell_map_sg(struct device *hwdev, struct scatterlist *sg, + int nents, enum dma_data_direction direction) +{ + int i; + + for (i = 0; i < nents; i++, sg++) { + sg->dma_address = (page_to_phys(sg->page) + sg->offset) + | CELL_DMA_VALID; + sg->dma_length = sg->length; + } + + return nents; +} + +static void cell_unmap_sg(struct device *hwdev, struct scatterlist *sg, + int nents, enum dma_data_direction direction) +{ +} + +static int cell_dma_supported(struct device *dev, u64 mask) +{ + return mask < 0x100000000ull; +} + +void cell_init_iommu(void) +{ + cell_map_iommu(); + + /* Direct I/O, IOMMU off */ + ppc_md.iommu_dev_setup = iommu_dev_setup_null; + ppc_md.iommu_bus_setup = iommu_bus_setup_null; + + pci_dma_ops.alloc_coherent = cell_alloc_coherent; + pci_dma_ops.free_coherent = cell_free_coherent; + pci_dma_ops.map_single = cell_map_single; + pci_dma_ops.unmap_single = cell_unmap_single; + pci_dma_ops.map_sg = cell_map_sg; + pci_dma_ops.unmap_sg = cell_unmap_sg; + pci_dma_ops.dma_supported = cell_dma_supported; +} --- linux-cg.orig/arch/powerpc/platforms/cell/iommu.h 1969-12-31 19:00:00.000000000 -0500 +++ linux-cg/arch/powerpc/platforms/cell/iommu.h 2005-09-01 02:37:46.077991888 -0400 @@ -0,0 +1,65 @@ +#ifndef CELL_IOMMU_H +#define CELL_IOMMU_H + +/* some constants */ +enum { + /* segment table entries */ + IOST_VALID_MASK = 0x8000000000000000ul, + IOST_TAG_MASK = 0x3000000000000000ul, + IOST_PT_BASE_MASK = 0x000003fffffff000ul, + IOST_NNPT_MASK = 0x0000000000000fe0ul, + IOST_PS_MASK = 0x000000000000000ful, + + IOST_PS_4K = 0x1, + IOST_PS_64K = 0x3, + IOST_PS_1M = 0x5, + IOST_PS_16M = 0x7, + + /* iopt tag register */ + IOPT_VALID_MASK = 0x0000000200000000ul, + IOPT_TAG_MASK = 0x00000001fffffffful, + + /* iopt cache register */ + IOPT_PROT_MASK = 0xc000000000000000ul, + IOPT_PROT_NONE = 0x0000000000000000ul, + IOPT_PROT_READ = 0x4000000000000000ul, + IOPT_PROT_WRITE = 0x8000000000000000ul, + IOPT_PROT_RW = 0xc000000000000000ul, + IOPT_COHERENT = 0x2000000000000000ul, + + IOPT_ORDER_MASK = 0x1800000000000000ul, + /* order access to same IOID/VC on same address */ + IOPT_ORDER_ADDR = 0x0800000000000000ul, + /* similar, but only after a write access */ + IOPT_ORDER_WRITES = 0x1000000000000000ul, + /* Order all accesses to same IOID/VC */ + IOPT_ORDER_VC = 0x1800000000000000ul, + + IOPT_RPN_MASK = 0x000003fffffff000ul, + IOPT_HINT_MASK = 0x0000000000000800ul, + IOPT_IOID_MASK = 0x00000000000007fful, + + IOSTO_ENABLE = 0x8000000000000000ul, + IOSTO_ORIGIN = 0x000003fffffff000ul, + IOSTO_HW = 0x0000000000000800ul, + IOSTO_SW = 0x0000000000000400ul, + + IOCMD_CONF_TE = 0x0000800000000000ul, + + /* memory mapped registers */ + IOC_PT_CACHE_DIR = 0x000, + IOC_ST_CACHE_DIR = 0x800, + IOC_PT_CACHE_REG = 0x910, + IOC_ST_ORIGIN = 0x918, + IOC_CONF = 0x930, + + /* The high bit needs to be set on every DMA address, + only 2GB are addressable */ + CELL_DMA_VALID = 0x80000000, + CELL_DMA_MASK = 0x7fffffff, +}; + + +void cell_init_iommu(void); + +#endif --- linux-cg.orig/arch/powerpc/platforms/cell/nvram.c 1969-12-31 19:00:00.000000000 -0500 +++ linux-cg/arch/powerpc/platforms/cell/nvram.c 2005-09-01 02:37:46.077991888 -0400 @@ -0,0 +1,118 @@ +/* + * NVRAM for Cell Blade + * + * (C) Copyright IBM Corp. 2005 + * + * Authors : Utz Bacher + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include +#include +#include +#include + +#include +#include +#include + +static void __iomem *cell_nvram_start; +static long cell_nvram_len; +static spinlock_t cell_nvram_lock = SPIN_LOCK_UNLOCKED; + +static ssize_t cell_nvram_read(char *buf, size_t count, loff_t *index) +{ + unsigned long flags; + + if (*index >= cell_nvram_len) + return 0; + if (*index + count > cell_nvram_len) + count = cell_nvram_len - *index; + + spin_lock_irqsave(&cell_nvram_lock, flags); + + memcpy_fromio(buf, cell_nvram_start + *index, count); + + spin_unlock_irqrestore(&cell_nvram_lock, flags); + + *index += count; + return count; +} + +static ssize_t cell_nvram_write(char *buf, size_t count, loff_t *index) +{ + unsigned long flags; + + if (*index >= cell_nvram_len) + return 0; + if (*index + count > cell_nvram_len) + count = cell_nvram_len - *index; + + spin_lock_irqsave(&cell_nvram_lock, flags); + + memcpy_toio(cell_nvram_start + *index, buf, count); + + spin_unlock_irqrestore(&cell_nvram_lock, flags); + + *index += count; + return count; +} + +static ssize_t cell_nvram_get_size(void) +{ + return cell_nvram_len; +} + +int __init cell_nvram_init(void) +{ + struct device_node *nvram_node; + unsigned long *buffer; + int proplen; + unsigned long nvram_addr; + int ret; + + ret = -ENODEV; + nvram_node = of_find_node_by_type(NULL, "nvram"); + if (!nvram_node) + goto out; + + ret = -EIO; + buffer = (unsigned long *)get_property(nvram_node, "reg", &proplen); + if (proplen != 2*sizeof(unsigned long)) + goto out; + + ret = -ENODEV; + nvram_addr = buffer[0]; + cell_nvram_len = buffer[1]; + if ( (!cell_nvram_len) || (!nvram_addr) ) + goto out; + + cell_nvram_start = ioremap(nvram_addr, cell_nvram_len); + if (!cell_nvram_start) + goto out; + + printk(KERN_INFO "CBEA NVRAM, %luk mapped to %p\n", + cell_nvram_len >> 10, cell_nvram_start); + + ppc_md.nvram_read = cell_nvram_read; + ppc_md.nvram_write = cell_nvram_write; + ppc_md.nvram_size = cell_nvram_get_size; + +out: + of_node_put(nvram_node); + return ret; +} --- linux-cg.orig/arch/powerpc/platforms/cell/pic.c 1969-12-31 19:00:00.000000000 -0500 +++ linux-cg/arch/powerpc/platforms/cell/pic.c 2005-09-01 02:37:46.079991584 -0400 @@ -0,0 +1,269 @@ +/* + * Cell Internal Interrupt Controller + * + * (C) Copyright IBM Deutschland Entwicklung GmbH 2005 + * + * Author: Arnd Bergmann + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +struct iic_pending_bits { + u32 data; + u8 flags; + u8 class; + u8 source; + u8 prio; +}; + +enum iic_pending_flags { + IIC_VALID = 0x80, + IIC_IPI = 0x40, +}; + +struct iic_regs { + struct iic_pending_bits pending; + struct iic_pending_bits pending_destr; + u64 generate; + u64 prio; +}; + +struct iic { + struct iic_regs __iomem *regs; +}; + +static DEFINE_PER_CPU(struct iic, iic); + +void iic_local_enable(void) +{ + out_be64(&__get_cpu_var(iic).regs->prio, 0xff); +} + +void iic_local_disable(void) +{ + out_be64(&__get_cpu_var(iic).regs->prio, 0x0); +} + +static unsigned int iic_startup(unsigned int irq) +{ + return 0; +} + +static void iic_enable(unsigned int irq) +{ + iic_local_enable(); +} + +static void iic_disable(unsigned int irq) +{ +} + +static void iic_end(unsigned int irq) +{ + iic_local_enable(); +} + +static struct hw_interrupt_type iic_pic = { + .typename = " CELLPIC ", + .startup = iic_startup, + .enable = iic_enable, + .disable = iic_disable, + .end = iic_end, +}; + +static int iic_external_get_irq(struct iic_pending_bits pending) +{ + int irq; + unsigned char node, unit; + + node = pending.source >> 4; + unit = pending.source & 0xf; + irq = -1; + + /* + * This mapping is specific to the Broadband + * Engine. We might need to get the numbers + * from the device tree to support future CPUs. + */ + switch (unit) { + case 0x00: + case 0x0b: + /* + * One of these units can be connected + * to an external interrupt controller. + */ + if (pending.prio > 0x3f || + pending.class != 2) + break; + irq = IIC_EXT_OFFSET + + spider_get_irq(pending.prio + node * IIC_NODE_STRIDE) + + node * IIC_NODE_STRIDE; + break; + case 0x01 ... 0x04: + case 0x07 ... 0x0a: + /* + * These units are connected to the SPEs + */ + if (pending.class > 2) + break; + irq = IIC_SPE_OFFSET + + pending.class * IIC_CLASS_STRIDE + + node * IIC_NODE_STRIDE + + unit; + break; + } + if (irq == -1) + printk(KERN_WARNING "Unexpected interrupt class %02x, " + "source %02x, prio %02x, cpu %02x\n", pending.class, + pending.source, pending.prio, smp_processor_id()); + return irq; +} + +/* Get an IRQ number from the pending state register of the IIC */ +int iic_get_irq(struct pt_regs *regs) +{ + struct iic *iic; + int irq; + struct iic_pending_bits pending; + + iic = &__get_cpu_var(iic); + *(unsigned long *) &pending = + in_be64((unsigned long __iomem *) &iic->regs->pending_destr); + + irq = -1; + if (pending.flags & IIC_VALID) { + if (pending.flags & IIC_IPI) { + irq = IIC_IPI_OFFSET + (pending.prio >> 4); +/* + if (irq > 0x80) + printk(KERN_WARNING "Unexpected IPI prio %02x" + "on CPU %02x\n", pending.prio, + smp_processor_id()); +*/ + } else { + irq = iic_external_get_irq(pending); + } + } + return irq; +} + +static struct iic_regs __iomem *find_iic(int cpu) +{ + struct device_node *np; + int nodeid = cpu / 2; + unsigned long regs; + struct iic_regs __iomem *iic_regs; + + for (np = of_find_node_by_type(NULL, "cpu"); + np; + np = of_find_node_by_type(np, "cpu")) { + if (nodeid == *(int *)get_property(np, "node-id", NULL)) + break; + } + + if (!np) { + printk(KERN_WARNING "IIC: CPU %d not found\n", cpu); + iic_regs = NULL; + } else { + regs = *(long *)get_property(np, "iic", NULL); + + /* hack until we have decided on the devtree info */ + regs += 0x400; + if (cpu & 1) + regs += 0x20; + + printk(KERN_DEBUG "IIC for CPU %d at %lx\n", cpu, regs); + iic_regs = __ioremap(regs, sizeof(struct iic_regs), + _PAGE_NO_CACHE); + } + return iic_regs; +} + +#ifdef CONFIG_SMP +void iic_setup_cpu(void) +{ + out_be64(&__get_cpu_var(iic).regs->prio, 0xff); +} + +void iic_cause_IPI(int cpu, int mesg) +{ + out_be64(&per_cpu(iic, cpu).regs->generate, mesg); +} + +static irqreturn_t iic_ipi_action(int irq, void *dev_id, struct pt_regs *regs) +{ + + smp_message_recv(irq - IIC_IPI_OFFSET, regs); + return IRQ_HANDLED; +} + +static void iic_request_ipi(int irq, const char *name) +{ + /* IPIs are marked SA_INTERRUPT as they must run with irqs + * disabled */ + get_irq_desc(irq)->handler = &iic_pic; + get_irq_desc(irq)->status |= IRQ_PER_CPU; + request_irq(irq, iic_ipi_action, SA_INTERRUPT, name, NULL); +} + +void iic_request_IPIs(void) +{ + iic_request_ipi(IIC_IPI_OFFSET + PPC_MSG_CALL_FUNCTION, "IPI-call"); + iic_request_ipi(IIC_IPI_OFFSET + PPC_MSG_RESCHEDULE, "IPI-resched"); +#ifdef CONFIG_DEBUGGER + iic_request_ipi(IIC_IPI_OFFSET + PPC_MSG_DEBUGGER_BREAK, "IPI-debug"); +#endif /* CONFIG_DEBUGGER */ +} +#endif /* CONFIG_SMP */ + +static void iic_setup_spe_handlers(void) +{ + int be, isrc; + + /* Assume two threads per BE are present */ + for (be=0; be < num_present_cpus() / 2; be++) { + for (isrc = 0; isrc < IIC_CLASS_STRIDE * 3; isrc++) { + int irq = IIC_NODE_STRIDE * be + IIC_SPE_OFFSET + isrc; + get_irq_desc(irq)->handler = &iic_pic; + } + } +} + +void iic_init_IRQ(void) +{ + int cpu, irq_offset; + struct iic *iic; + + irq_offset = 0; + for_each_cpu(cpu) { + iic = &per_cpu(iic, cpu); + iic->regs = find_iic(cpu); + if (iic->regs) + out_be64(&iic->regs->prio, 0xff); + } + iic_setup_spe_handlers(); +} --- linux-cg.orig/arch/powerpc/platforms/cell/setup.c 1969-12-31 19:00:00.000000000 -0500 +++ linux-cg/arch/powerpc/platforms/cell/setup.c 2005-09-01 02:37:46.079991584 -0400 @@ -0,0 +1,138 @@ +/* + * Copyright (C) 1995 Linus Torvalds + * Adapted from 'alpha' version by Gary Thomas + * Modified by Cort Dougan (cort at cs.nmt.edu) + * Modified by PPC64 Team, IBM Corp + * Modified by Cell Team, IBM Deutschland Entwicklung GmbH + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#undef DEBUG + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../../../ppc64/kernel/pci.h" +#include "iommu.h" + +#ifdef DEBUG +#define DBG(fmt...) udbg_printf(fmt) +#else +#define DBG(fmt...) +#endif + +void cell_get_cpuinfo(struct seq_file *m) +{ + struct device_node *root; + const char *model = ""; + + root = of_find_node_by_path("/"); + if (root) + model = get_property(root, "model", NULL); + seq_printf(m, "machine\t\t: Cell %s\n", model); + of_node_put(root); +} + +static void cell_progress(char *s, unsigned short hex) +{ + printk("*** %04x : %s\n", hex, s ? s : ""); +} + +static void __init cell_setup_arch(void) +{ + ppc_md.init_IRQ = iic_init_IRQ; + ppc_md.get_irq = iic_get_irq; + +#ifdef CONFIG_SMP + smp_init_pSeries(); +#endif + + /* init to some ~sane value until calibrate_delay() runs */ + loops_per_jiffy = 50000000; + + if (ROOT_DEV == 0) { + printk("No ramdisk, default root is /dev/hda2\n"); + ROOT_DEV = Root_HDA2; + } + + /* Find and initialize PCI host bridges */ + init_pci_config_tokens(); + find_and_init_phbs(); + spider_init_IRQ(); +#ifdef CONFIG_DUMMY_CONSOLE + conswitchp = &dummy_con; +#endif + + cell_nvram_init(); +} + +/* + * Early initialization. Relocation is on but do not reference unbolted pages + */ +static void __init cell_init_early(void) +{ + DBG(" -> cell_init_early()\n"); + + hpte_init_native(); + + cell_init_iommu(); + + ppc64_interrupt_controller = IC_CELL_PIC; + + DBG(" <- cell_init_early()\n"); +} + + +static int __init cell_probe(int platform) +{ + if (platform != PLATFORM_CELL) + return 0; + + return 1; +} + +struct machdep_calls __initdata cell_md = { + .probe = cell_probe, + .setup_arch = cell_setup_arch, + .init_early = cell_init_early, + .get_cpuinfo = cell_get_cpuinfo, + .restart = rtas_restart, + .power_off = rtas_power_off, + .halt = rtas_halt, + .get_boot_time = rtas_get_boot_time, + .get_rtc_time = rtas_get_rtc_time, + .set_rtc_time = rtas_set_rtc_time, + .calibrate_decr = generic_calibrate_decr, + .progress = cell_progress, +}; --- linux-cg.orig/arch/powerpc/platforms/cell/spider-pic.c 1969-12-31 19:00:00.000000000 -0500 +++ linux-cg/arch/powerpc/platforms/cell/spider-pic.c 2005-09-01 02:37:46.081991280 -0400 @@ -0,0 +1,190 @@ +/* + * External Interrupt Controller on Spider South Bridge + * + * (C) Copyright IBM Deutschland Entwicklung GmbH 2005 + * + * Author: Arnd Bergmann + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include + +#include +#include +#include +#include + +/* register layout taken from Spider spec, table 7.4-4 */ +enum { + TIR_DEN = 0x004, /* Detection Enable Register */ + TIR_MSK = 0x084, /* Mask Level Register */ + TIR_EDC = 0x0c0, /* Edge Detection Clear Register */ + TIR_PNDA = 0x100, /* Pending Register A */ + TIR_PNDB = 0x104, /* Pending Register B */ + TIR_CS = 0x144, /* Current Status Register */ + TIR_LCSA = 0x150, /* Level Current Status Register A */ + TIR_LCSB = 0x154, /* Level Current Status Register B */ + TIR_LCSC = 0x158, /* Level Current Status Register C */ + TIR_LCSD = 0x15c, /* Level Current Status Register D */ + TIR_CFGA = 0x200, /* Setting Register A0 */ + TIR_CFGB = 0x204, /* Setting Register B0 */ + /* 0x208 ... 0x3ff Setting Register An/Bn */ + TIR_PPNDA = 0x400, /* Packet Pending Register A */ + TIR_PPNDB = 0x404, /* Packet Pending Register B */ + TIR_PIERA = 0x408, /* Packet Output Error Register A */ + TIR_PIERB = 0x40c, /* Packet Output Error Register B */ + TIR_PIEN = 0x444, /* Packet Output Enable Register */ + TIR_PIPND = 0x454, /* Packet Output Pending Register */ + TIRDID = 0x484, /* Spider Device ID Register */ + REISTIM = 0x500, /* Reissue Command Timeout Time Setting */ + REISTIMEN = 0x504, /* Reissue Command Timeout Setting */ + REISWAITEN = 0x508, /* Reissue Wait Control*/ +}; + +static void __iomem *spider_pics[4]; + +static void __iomem *spider_get_pic(int irq) +{ + int node = irq / IIC_NODE_STRIDE; + irq %= IIC_NODE_STRIDE; + + if (irq >= IIC_EXT_OFFSET && + irq < IIC_EXT_OFFSET + IIC_NUM_EXT && + spider_pics) + return spider_pics[node]; + return NULL; +} + +static int spider_get_nr(unsigned int irq) +{ + return (irq % IIC_NODE_STRIDE) - IIC_EXT_OFFSET; +} + +static void __iomem *spider_get_irq_config(int irq) +{ + void __iomem *pic; + pic = spider_get_pic(irq); + return pic + TIR_CFGA + 8 * spider_get_nr(irq); +} + +static void spider_enable_irq(unsigned int irq) +{ + void __iomem *cfg = spider_get_irq_config(irq); + irq = spider_get_nr(irq); + + out_be32(cfg, in_be32(cfg) | 0x3107000eu); + out_be32(cfg + 4, in_be32(cfg + 4) | 0x00020000u | irq); +} + +static void spider_disable_irq(unsigned int irq) +{ + void __iomem *cfg = spider_get_irq_config(irq); + irq = spider_get_nr(irq); + + out_be32(cfg, in_be32(cfg) & ~0x30000000u); +} + +static unsigned int spider_startup_irq(unsigned int irq) +{ + spider_enable_irq(irq); + return 0; +} + +static void spider_shutdown_irq(unsigned int irq) +{ + spider_disable_irq(irq); +} + +static void spider_end_irq(unsigned int irq) +{ + spider_enable_irq(irq); +} + +static void spider_ack_irq(unsigned int irq) +{ + spider_disable_irq(irq); + iic_local_enable(); +} + +static struct hw_interrupt_type spider_pic = { + .typename = " SPIDER ", + .startup = spider_startup_irq, + .shutdown = spider_shutdown_irq, + .enable = spider_enable_irq, + .disable = spider_disable_irq, + .ack = spider_ack_irq, + .end = spider_end_irq, +}; + + +int spider_get_irq(unsigned long int_pending) +{ + void __iomem *regs = spider_get_pic(int_pending); + unsigned long cs; + int irq; + + cs = in_be32(regs + TIR_CS); + + irq = cs >> 24; + if (irq != 63) + return irq; + + return -1; +} + +void spider_init_IRQ(void) +{ + int node; + struct device_node *dn; + unsigned int *property; + long spiderpic; + int n; + +/* FIXME: detect multiple PICs as soon as the device tree has them */ + for (node = 0; node < 1; node++) { + dn = of_find_node_by_path("/"); + n = prom_n_addr_cells(dn); + property = (unsigned int *) get_property(dn, + "platform-spider-pic", NULL); + + if (!property) + continue; + for (spiderpic = 0; n > 0; --n) + spiderpic = (spiderpic << 32) + *property++; + printk(KERN_DEBUG "SPIDER addr: %lx\n", spiderpic); + spider_pics[node] = __ioremap(spiderpic, 0x800, _PAGE_NO_CACHE); + for (n = 0; n < IIC_NUM_EXT; n++) { + int irq = n + IIC_EXT_OFFSET + node * IIC_NODE_STRIDE; + get_irq_desc(irq)->handler = &spider_pic; + + /* do not mask any interrupts because of level */ + out_be32(spider_pics[node] + TIR_MSK, 0x0); + + /* disable edge detection clear */ + /* out_be32(spider_pics[node] + TIR_EDC, 0x0); */ + + /* enable interrupt packets to be output */ + out_be32(spider_pics[node] + TIR_PIEN, + in_be32(spider_pics[node] + TIR_PIEN) | 0x1); + + /* Enable the interrupt detection enable bit. Do this last! */ + out_be32(spider_pics[node] + TIR_DEN, + in_be32(spider_pics[node] +TIR_DEN) | 0x1); + + } + } +} --- linux-cg.orig/arch/ppc64/Kconfig 2005-09-01 02:37:40.900980184 -0400 +++ linux-cg/arch/ppc64/Kconfig 2005-09-01 02:37:46.081991280 -0400 @@ -77,10 +77,16 @@ config PPC_PSERIES bool " IBM pSeries & new iSeries" default y -config PPC_BPA - bool " Broadband Processor Architecture" +config PPC_CELL + bool " Cell Broadband Engine Architecture" depends on PPC_MULTIPLATFORM +# This is being phased out in the move to arch/powerpc +config PPC_BPA + bool + default y + depends on PPC_CELL + config PPC_PMAC depends on PPC_MULTIPLATFORM bool " Apple G5 based machines" --- linux-cg.orig/arch/ppc64/kernel/Makefile 2005-09-01 02:37:40.903979728 -0400 +++ linux-cg/arch/ppc64/kernel/Makefile 2005-09-01 02:37:46.082991128 -0400 @@ -33,8 +33,7 @@ obj-$(CONFIG_PPC_PSERIES) += pSeries_pci pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \ pSeries_setup.o pSeries_iommu.o -obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_iommu.o bpa_nvram.o \ - bpa_iic.o spider-pic.o +obj-$(CONFIG_PPC_CELL) += ../../powerpc/platforms/cell/ obj-$(CONFIG_KEXEC) += machine_kexec.o obj-$(CONFIG_EEH) += eeh.o @@ -68,7 +67,7 @@ ifdef CONFIG_SMP obj-$(CONFIG_PPC_PMAC) += pmac_smp.o smp-tbsync.o obj-$(CONFIG_PPC_ISERIES) += iSeries_smp.o obj-$(CONFIG_PPC_PSERIES) += pSeries_smp.o -obj-$(CONFIG_PPC_BPA) += pSeries_smp.o +obj-$(CONFIG_PPC_CELL) += pSeries_smp.o obj-$(CONFIG_PPC_MAPLE) += smp-tbsync.o endif --- linux-cg.orig/arch/ppc64/kernel/bpa_iic.c 2005-09-01 02:37:40.905979424 -0400 +++ linux-cg/arch/ppc64/kernel/bpa_iic.c 1969-12-31 19:00:00.000000000 -0500 @@ -1,270 +0,0 @@ -/* - * BPA Internal Interrupt Controller - * - * (C) Copyright IBM Deutschland Entwicklung GmbH 2005 - * - * Author: Arnd Bergmann - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2, or (at your option) - * any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#include -#include -#include -#include -#include - -#include -#include -#include -#include - -#include "bpa_iic.h" - -struct iic_pending_bits { - u32 data; - u8 flags; - u8 class; - u8 source; - u8 prio; -}; - -enum iic_pending_flags { - IIC_VALID = 0x80, - IIC_IPI = 0x40, -}; - -struct iic_regs { - struct iic_pending_bits pending; - struct iic_pending_bits pending_destr; - u64 generate; - u64 prio; -}; - -struct iic { - struct iic_regs __iomem *regs; -}; - -static DEFINE_PER_CPU(struct iic, iic); - -void iic_local_enable(void) -{ - out_be64(&__get_cpu_var(iic).regs->prio, 0xff); -} - -void iic_local_disable(void) -{ - out_be64(&__get_cpu_var(iic).regs->prio, 0x0); -} - -static unsigned int iic_startup(unsigned int irq) -{ - return 0; -} - -static void iic_enable(unsigned int irq) -{ - iic_local_enable(); -} - -static void iic_disable(unsigned int irq) -{ -} - -static void iic_end(unsigned int irq) -{ - iic_local_enable(); -} - -static struct hw_interrupt_type iic_pic = { - .typename = " BPA-IIC ", - .startup = iic_startup, - .enable = iic_enable, - .disable = iic_disable, - .end = iic_end, -}; - -static int iic_external_get_irq(struct iic_pending_bits pending) -{ - int irq; - unsigned char node, unit; - - node = pending.source >> 4; - unit = pending.source & 0xf; - irq = -1; - - /* - * This mapping is specific to the Broadband - * Engine. We might need to get the numbers - * from the device tree to support future CPUs. - */ - switch (unit) { - case 0x00: - case 0x0b: - /* - * One of these units can be connected - * to an external interrupt controller. - */ - if (pending.prio > 0x3f || - pending.class != 2) - break; - irq = IIC_EXT_OFFSET - + spider_get_irq(pending.prio + node * IIC_NODE_STRIDE) - + node * IIC_NODE_STRIDE; - break; - case 0x01 ... 0x04: - case 0x07 ... 0x0a: - /* - * These units are connected to the SPEs - */ - if (pending.class > 2) - break; - irq = IIC_SPE_OFFSET - + pending.class * IIC_CLASS_STRIDE - + node * IIC_NODE_STRIDE - + unit; - break; - } - if (irq == -1) - printk(KERN_WARNING "Unexpected interrupt class %02x, " - "source %02x, prio %02x, cpu %02x\n", pending.class, - pending.source, pending.prio, smp_processor_id()); - return irq; -} - -/* Get an IRQ number from the pending state register of the IIC */ -int iic_get_irq(struct pt_regs *regs) -{ - struct iic *iic; - int irq; - struct iic_pending_bits pending; - - iic = &__get_cpu_var(iic); - *(unsigned long *) &pending = - in_be64((unsigned long __iomem *) &iic->regs->pending_destr); - - irq = -1; - if (pending.flags & IIC_VALID) { - if (pending.flags & IIC_IPI) { - irq = IIC_IPI_OFFSET + (pending.prio >> 4); -/* - if (irq > 0x80) - printk(KERN_WARNING "Unexpected IPI prio %02x" - "on CPU %02x\n", pending.prio, - smp_processor_id()); -*/ - } else { - irq = iic_external_get_irq(pending); - } - } - return irq; -} - -static struct iic_regs __iomem *find_iic(int cpu) -{ - struct device_node *np; - int nodeid = cpu / 2; - unsigned long regs; - struct iic_regs __iomem *iic_regs; - - for (np = of_find_node_by_type(NULL, "cpu"); - np; - np = of_find_node_by_type(np, "cpu")) { - if (nodeid == *(int *)get_property(np, "node-id", NULL)) - break; - } - - if (!np) { - printk(KERN_WARNING "IIC: CPU %d not found\n", cpu); - iic_regs = NULL; - } else { - regs = *(long *)get_property(np, "iic", NULL); - - /* hack until we have decided on the devtree info */ - regs += 0x400; - if (cpu & 1) - regs += 0x20; - - printk(KERN_DEBUG "IIC for CPU %d at %lx\n", cpu, regs); - iic_regs = __ioremap(regs, sizeof(struct iic_regs), - _PAGE_NO_CACHE); - } - return iic_regs; -} - -#ifdef CONFIG_SMP -void iic_setup_cpu(void) -{ - out_be64(&__get_cpu_var(iic).regs->prio, 0xff); -} - -void iic_cause_IPI(int cpu, int mesg) -{ - out_be64(&per_cpu(iic, cpu).regs->generate, mesg); -} - -static irqreturn_t iic_ipi_action(int irq, void *dev_id, struct pt_regs *regs) -{ - - smp_message_recv(irq - IIC_IPI_OFFSET, regs); - return IRQ_HANDLED; -} - -static void iic_request_ipi(int irq, const char *name) -{ - /* IPIs are marked SA_INTERRUPT as they must run with irqs - * disabled */ - get_irq_desc(irq)->handler = &iic_pic; - get_irq_desc(irq)->status |= IRQ_PER_CPU; - request_irq(irq, iic_ipi_action, SA_INTERRUPT, name, NULL); -} - -void iic_request_IPIs(void) -{ - iic_request_ipi(IIC_IPI_OFFSET + PPC_MSG_CALL_FUNCTION, "IPI-call"); - iic_request_ipi(IIC_IPI_OFFSET + PPC_MSG_RESCHEDULE, "IPI-resched"); -#ifdef CONFIG_DEBUGGER - iic_request_ipi(IIC_IPI_OFFSET + PPC_MSG_DEBUGGER_BREAK, "IPI-debug"); -#endif /* CONFIG_DEBUGGER */ -} -#endif /* CONFIG_SMP */ - -static void iic_setup_spe_handlers(void) -{ - int be, isrc; - - /* Assume two threads per BE are present */ - for (be=0; be < num_present_cpus() / 2; be++) { - for (isrc = 0; isrc < IIC_CLASS_STRIDE * 3; isrc++) { - int irq = IIC_NODE_STRIDE * be + IIC_SPE_OFFSET + isrc; - get_irq_desc(irq)->handler = &iic_pic; - } - } -} - -void iic_init_IRQ(void) -{ - int cpu, irq_offset; - struct iic *iic; - - irq_offset = 0; - for_each_cpu(cpu) { - iic = &per_cpu(iic, cpu); - iic->regs = find_iic(cpu); - if (iic->regs) - out_be64(&iic->regs->prio, 0xff); - } - iic_setup_spe_handlers(); -} --- linux-cg.orig/arch/ppc64/kernel/bpa_iic.h 2005-09-01 02:37:40.908978968 -0400 +++ linux-cg/arch/ppc64/kernel/bpa_iic.h 1969-12-31 19:00:00.000000000 -0500 @@ -1,62 +0,0 @@ -#ifndef ASM_BPA_IIC_H -#define ASM_BPA_IIC_H -#ifdef __KERNEL__ -/* - * Mapping of IIC pending bits into per-node - * interrupt numbers. - * - * IRQ FF CC SS PP FF CC SS PP Description - * - * 00-3f 80 02 +0 00 - 80 02 +0 3f South Bridge - * 00-3f 80 02 +b 00 - 80 02 +b 3f South Bridge - * 41-4a 80 00 +1 ** - 80 00 +a ** SPU Class 0 - * 51-5a 80 01 +1 ** - 80 01 +a ** SPU Class 1 - * 61-6a 80 02 +1 ** - 80 02 +a ** SPU Class 2 - * 70-7f C0 ** ** 00 - C0 ** ** 0f IPI - * - * F flags - * C class - * S source - * P Priority - * + node number - * * don't care - * - * A node consists of a Broadband Engine and an optional - * south bridge device providing a maximum of 64 IRQs. - * The south bridge may be connected to either IOIF0 - * or IOIF1. - * Each SPE is represented as three IRQ lines, one per - * interrupt class. - * 16 IRQ numbers are reserved for inter processor - * interruptions, although these are only used in the - * range of the first node. - * - * This scheme needs 128 IRQ numbers per BIF node ID, - * which means that with the total of 512 lines - * available, we can have a maximum of four nodes. - */ - -enum { - IIC_EXT_OFFSET = 0x00, /* Start of south bridge IRQs */ - IIC_NUM_EXT = 0x40, /* Number of south bridge IRQs */ - IIC_SPE_OFFSET = 0x40, /* Start of SPE interrupts */ - IIC_CLASS_STRIDE = 0x10, /* SPE IRQs per class */ - IIC_IPI_OFFSET = 0x70, /* Start of IPI IRQs */ - IIC_NUM_IPIS = 0x10, /* IRQs reserved for IPI */ - IIC_NODE_STRIDE = 0x80, /* Total IRQs per node */ -}; - -extern void iic_init_IRQ(void); -extern int iic_get_irq(struct pt_regs *regs); -extern void iic_cause_IPI(int cpu, int mesg); -extern void iic_request_IPIs(void); -extern void iic_setup_cpu(void); -extern void iic_local_enable(void); -extern void iic_local_disable(void); - - -extern void spider_init_IRQ(void); -extern int spider_get_irq(unsigned long int_pending); - -#endif -#endif /* ASM_BPA_IIC_H */ --- linux-cg.orig/arch/ppc64/kernel/bpa_iommu.c 2005-09-01 02:37:40.910978664 -0400 +++ linux-cg/arch/ppc64/kernel/bpa_iommu.c 1969-12-31 19:00:00.000000000 -0500 @@ -1,377 +0,0 @@ -/* - * IOMMU implementation for Broadband Processor Architecture - * We just establish a linear mapping at boot by setting all the - * IOPT cache entries in the CPU. - * The mapping functions should be identical to pci_direct_iommu, - * except for the handling of the high order bit that is required - * by the Spider bridge. These should be split into a separate - * file at the point where we get a different bridge chip. - * - * Copyright (C) 2005 IBM Deutschland Entwicklung GmbH, - * Arnd Bergmann - * - * Based on linear mapping - * Copyright (C) 2003 Benjamin Herrenschmidt (benh at kernel.crashing.org) - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#undef DEBUG - -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "pci.h" -#include "bpa_iommu.h" - -static inline unsigned long -get_iopt_entry(unsigned long real_address, unsigned long ioid, - unsigned long prot) -{ - return (prot & IOPT_PROT_MASK) - | (IOPT_COHERENT) - | (IOPT_ORDER_VC) - | (real_address & IOPT_RPN_MASK) - | (ioid & IOPT_IOID_MASK); -} - -typedef struct { - unsigned long val; -} ioste; - -static inline ioste -mk_ioste(unsigned long val) -{ - ioste ioste = { .val = val, }; - return ioste; -} - -static inline ioste -get_iost_entry(unsigned long iopt_base, unsigned long io_address, unsigned page_size) -{ - unsigned long ps; - unsigned long iostep; - unsigned long nnpt; - unsigned long shift; - - switch (page_size) { - case 0x1000000: - ps = IOST_PS_16M; - nnpt = 0; /* one page per segment */ - shift = 5; /* segment has 16 iopt entries */ - break; - - case 0x100000: - ps = IOST_PS_1M; - nnpt = 0; /* one page per segment */ - shift = 1; /* segment has 256 iopt entries */ - break; - - case 0x10000: - ps = IOST_PS_64K; - nnpt = 0x07; /* 8 pages per io page table */ - shift = 0; /* all entries are used */ - break; - - case 0x1000: - ps = IOST_PS_4K; - nnpt = 0x7f; /* 128 pages per io page table */ - shift = 0; /* all entries are used */ - break; - - default: /* not a known compile time constant */ - BUILD_BUG_ON(1); - break; - } - - iostep = iopt_base + - /* need 8 bytes per iopte */ - (((io_address / page_size * 8) - /* align io page tables on 4k page boundaries */ - << shift) - /* nnpt+1 pages go into each iopt */ - & ~(nnpt << 12)); - - nnpt++; /* this seems to work, but the documentation is not clear - about wether we put nnpt or nnpt-1 into the ioste bits. - In theory, this can't work for 4k pages. */ - return mk_ioste(IOST_VALID_MASK - | (iostep & IOST_PT_BASE_MASK) - | ((nnpt << 5) & IOST_NNPT_MASK) - | (ps & IOST_PS_MASK)); -} - -/* compute the address of an io pte */ -static inline unsigned long -get_ioptep(ioste iost_entry, unsigned long io_address) -{ - unsigned long iopt_base; - unsigned long page_size; - unsigned long page_number; - unsigned long iopt_offset; - - iopt_base = iost_entry.val & IOST_PT_BASE_MASK; - page_size = iost_entry.val & IOST_PS_MASK; - - /* decode page size to compute page number */ - page_number = (io_address & 0x0fffffff) >> (10 + 2 * page_size); - /* page number is an offset into the io page table */ - iopt_offset = (page_number << 3) & 0x7fff8ul; - return iopt_base + iopt_offset; -} - -/* compute the tag field of the iopt cache entry */ -static inline unsigned long -get_ioc_tag(ioste iost_entry, unsigned long io_address) -{ - unsigned long iopte = get_ioptep(iost_entry, io_address); - - return IOPT_VALID_MASK - | ((iopte & 0x00000000000000ff8ul) >> 3) - | ((iopte & 0x0000003fffffc0000ul) >> 9); -} - -/* compute the hashed 6 bit index for the 4-way associative pte cache */ -static inline unsigned long -get_ioc_hash(ioste iost_entry, unsigned long io_address) -{ - unsigned long iopte = get_ioptep(iost_entry, io_address); - - return ((iopte & 0x000000000000001f8ul) >> 3) - ^ ((iopte & 0x00000000000020000ul) >> 17) - ^ ((iopte & 0x00000000000010000ul) >> 15) - ^ ((iopte & 0x00000000000008000ul) >> 13) - ^ ((iopte & 0x00000000000004000ul) >> 11) - ^ ((iopte & 0x00000000000002000ul) >> 9) - ^ ((iopte & 0x00000000000001000ul) >> 7); -} - -/* same as above, but pretend that we have a simpler 1-way associative - pte cache with an 8 bit index */ -static inline unsigned long -get_ioc_hash_1way(ioste iost_entry, unsigned long io_address) -{ - unsigned long iopte = get_ioptep(iost_entry, io_address); - - return ((iopte & 0x000000000000001f8ul) >> 3) - ^ ((iopte & 0x00000000000020000ul) >> 17) - ^ ((iopte & 0x00000000000010000ul) >> 15) - ^ ((iopte & 0x00000000000008000ul) >> 13) - ^ ((iopte & 0x00000000000004000ul) >> 11) - ^ ((iopte & 0x00000000000002000ul) >> 9) - ^ ((iopte & 0x00000000000001000ul) >> 7) - ^ ((iopte & 0x0000000000000c000ul) >> 8); -} - -static inline ioste -get_iost_cache(void __iomem *base, unsigned long index) -{ - unsigned long __iomem *p = (base + IOC_ST_CACHE_DIR); - return mk_ioste(in_be64(&p[index])); -} - -static inline void -set_iost_cache(void __iomem *base, unsigned long index, ioste ste) -{ - unsigned long __iomem *p = (base + IOC_ST_CACHE_DIR); - pr_debug("ioste %02lx was %016lx, store %016lx", index, - get_iost_cache(base, index).val, ste.val); - out_be64(&p[index], ste.val); - pr_debug(" now %016lx\n", get_iost_cache(base, index).val); -} - -static inline unsigned long -get_iopt_cache(void __iomem *base, unsigned long index, unsigned long *tag) -{ - unsigned long __iomem *tags = (void *)(base + IOC_PT_CACHE_DIR); - unsigned long __iomem *p = (void *)(base + IOC_PT_CACHE_REG); - - *tag = tags[index]; - rmb(); - return *p; -} - -static inline void -set_iopt_cache(void __iomem *base, unsigned long index, - unsigned long tag, unsigned long val) -{ - unsigned long __iomem *tags = base + IOC_PT_CACHE_DIR; - unsigned long __iomem *p = base + IOC_PT_CACHE_REG; - pr_debug("iopt %02lx was v%016lx/t%016lx, store v%016lx/t%016lx\n", - index, get_iopt_cache(base, index, &oldtag), oldtag, val, tag); - - out_be64(p, val); - out_be64(&tags[index], tag); -} - -static inline void -set_iost_origin(void __iomem *base) -{ - unsigned long __iomem *p = base + IOC_ST_ORIGIN; - unsigned long origin = IOSTO_ENABLE | IOSTO_SW; - - pr_debug("iost_origin %016lx, now %016lx\n", in_be64(p), origin); - out_be64(p, origin); -} - -static inline void -set_iocmd_config(void __iomem *base) -{ - unsigned long __iomem *p = base + 0xc00; - unsigned long conf; - - conf = in_be64(p); - pr_debug("iost_conf %016lx, now %016lx\n", conf, conf | IOCMD_CONF_TE); - out_be64(p, conf | IOCMD_CONF_TE); -} - -/* FIXME: get these from the device tree */ -#define ioc_base 0x20000511000ull -#define ioc_mmio_base 0x20000510000ull -#define ioid 0x48a -#define iopt_phys_offset (- 0x20000000) /* We have a 512MB offset from the SB */ -#define io_page_size 0x1000000 - -static unsigned long map_iopt_entry(unsigned long address) -{ - switch (address >> 20) { - case 0x600: - address = 0x24020000000ull; /* spider i/o */ - break; - default: - address += iopt_phys_offset; - break; - } - - return get_iopt_entry(address, ioid, IOPT_PROT_RW); -} - -static void iommu_bus_setup_null(struct pci_bus *b) { } -static void iommu_dev_setup_null(struct pci_dev *d) { } - -/* initialize the iommu to support a simple linear mapping - * for each DMA window used by any device. For now, we - * happen to know that there is only one DMA window in use, - * starting at iopt_phys_offset. */ -static void bpa_map_iommu(void) -{ - unsigned long address; - void __iomem *base; - ioste ioste; - unsigned long index; - - base = __ioremap(ioc_base, 0x1000, _PAGE_NO_CACHE); - pr_debug("%lx mapped to %p\n", ioc_base, base); - set_iocmd_config(base); - iounmap(base); - - base = __ioremap(ioc_mmio_base, 0x1000, _PAGE_NO_CACHE); - pr_debug("%lx mapped to %p\n", ioc_mmio_base, base); - - set_iost_origin(base); - - for (address = 0; address < 0x100000000ul; address += io_page_size) { - ioste = get_iost_entry(0x10000000000ul, address, io_page_size); - if ((address & 0xfffffff) == 0) /* segment start */ - set_iost_cache(base, address >> 28, ioste); - index = get_ioc_hash_1way(ioste, address); - pr_debug("addr %08lx, index %02lx, ioste %016lx\n", - address, index, ioste.val); - set_iopt_cache(base, - get_ioc_hash_1way(ioste, address), - get_ioc_tag(ioste, address), - map_iopt_entry(address)); - } - iounmap(base); -} - - -static void *bpa_alloc_coherent(struct device *hwdev, size_t size, - dma_addr_t *dma_handle, unsigned int __nocast flag) -{ - void *ret; - - ret = (void *)__get_free_pages(flag, get_order(size)); - if (ret != NULL) { - memset(ret, 0, size); - *dma_handle = virt_to_abs(ret) | BPA_DMA_VALID; - } - return ret; -} - -static void bpa_free_coherent(struct device *hwdev, size_t size, - void *vaddr, dma_addr_t dma_handle) -{ - free_pages((unsigned long)vaddr, get_order(size)); -} - -static dma_addr_t bpa_map_single(struct device *hwdev, void *ptr, - size_t size, enum dma_data_direction direction) -{ - return virt_to_abs(ptr) | BPA_DMA_VALID; -} - -static void bpa_unmap_single(struct device *hwdev, dma_addr_t dma_addr, - size_t size, enum dma_data_direction direction) -{ -} - -static int bpa_map_sg(struct device *hwdev, struct scatterlist *sg, - int nents, enum dma_data_direction direction) -{ - int i; - - for (i = 0; i < nents; i++, sg++) { - sg->dma_address = (page_to_phys(sg->page) + sg->offset) - | BPA_DMA_VALID; - sg->dma_length = sg->length; - } - - return nents; -} - -static void bpa_unmap_sg(struct device *hwdev, struct scatterlist *sg, - int nents, enum dma_data_direction direction) -{ -} - -static int bpa_dma_supported(struct device *dev, u64 mask) -{ - return mask < 0x100000000ull; -} - -void bpa_init_iommu(void) -{ - bpa_map_iommu(); - - /* Direct I/O, IOMMU off */ - ppc_md.iommu_dev_setup = iommu_dev_setup_null; - ppc_md.iommu_bus_setup = iommu_bus_setup_null; - - pci_dma_ops.alloc_coherent = bpa_alloc_coherent; - pci_dma_ops.free_coherent = bpa_free_coherent; - pci_dma_ops.map_single = bpa_map_single; - pci_dma_ops.unmap_single = bpa_unmap_single; - pci_dma_ops.map_sg = bpa_map_sg; - pci_dma_ops.unmap_sg = bpa_unmap_sg; - pci_dma_ops.dma_supported = bpa_dma_supported; -} --- linux-cg.orig/arch/ppc64/kernel/bpa_iommu.h 2005-09-01 02:37:40.912978360 -0400 +++ linux-cg/arch/ppc64/kernel/bpa_iommu.h 1969-12-31 19:00:00.000000000 -0500 @@ -1,65 +0,0 @@ -#ifndef BPA_IOMMU_H -#define BPA_IOMMU_H - -/* some constants */ -enum { - /* segment table entries */ - IOST_VALID_MASK = 0x8000000000000000ul, - IOST_TAG_MASK = 0x3000000000000000ul, - IOST_PT_BASE_MASK = 0x000003fffffff000ul, - IOST_NNPT_MASK = 0x0000000000000fe0ul, - IOST_PS_MASK = 0x000000000000000ful, - - IOST_PS_4K = 0x1, - IOST_PS_64K = 0x3, - IOST_PS_1M = 0x5, - IOST_PS_16M = 0x7, - - /* iopt tag register */ - IOPT_VALID_MASK = 0x0000000200000000ul, - IOPT_TAG_MASK = 0x00000001fffffffful, - - /* iopt cache register */ - IOPT_PROT_MASK = 0xc000000000000000ul, - IOPT_PROT_NONE = 0x0000000000000000ul, - IOPT_PROT_READ = 0x4000000000000000ul, - IOPT_PROT_WRITE = 0x8000000000000000ul, - IOPT_PROT_RW = 0xc000000000000000ul, - IOPT_COHERENT = 0x2000000000000000ul, - - IOPT_ORDER_MASK = 0x1800000000000000ul, - /* order access to same IOID/VC on same address */ - IOPT_ORDER_ADDR = 0x0800000000000000ul, - /* similar, but only after a write access */ - IOPT_ORDER_WRITES = 0x1000000000000000ul, - /* Order all accesses to same IOID/VC */ - IOPT_ORDER_VC = 0x1800000000000000ul, - - IOPT_RPN_MASK = 0x000003fffffff000ul, - IOPT_HINT_MASK = 0x0000000000000800ul, - IOPT_IOID_MASK = 0x00000000000007fful, - - IOSTO_ENABLE = 0x8000000000000000ul, - IOSTO_ORIGIN = 0x000003fffffff000ul, - IOSTO_HW = 0x0000000000000800ul, - IOSTO_SW = 0x0000000000000400ul, - - IOCMD_CONF_TE = 0x0000800000000000ul, - - /* memory mapped registers */ - IOC_PT_CACHE_DIR = 0x000, - IOC_ST_CACHE_DIR = 0x800, - IOC_PT_CACHE_REG = 0x910, - IOC_ST_ORIGIN = 0x918, - IOC_CONF = 0x930, - - /* The high bit needs to be set on every DMA address, - only 2GB are addressable */ - BPA_DMA_VALID = 0x80000000, - BPA_DMA_MASK = 0x7fffffff, -}; - - -void bpa_init_iommu(void); - -#endif --- linux-cg.orig/arch/ppc64/kernel/bpa_nvram.c 2005-09-01 02:37:40.914978056 -0400 +++ linux-cg/arch/ppc64/kernel/bpa_nvram.c 1969-12-31 19:00:00.000000000 -0500 @@ -1,118 +0,0 @@ -/* - * NVRAM for CPBW - * - * (C) Copyright IBM Corp. 2005 - * - * Authors : Utz Bacher - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2, or (at your option) - * any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#include -#include -#include -#include -#include - -#include -#include -#include - -static void __iomem *bpa_nvram_start; -static long bpa_nvram_len; -static spinlock_t bpa_nvram_lock = SPIN_LOCK_UNLOCKED; - -static ssize_t bpa_nvram_read(char *buf, size_t count, loff_t *index) -{ - unsigned long flags; - - if (*index >= bpa_nvram_len) - return 0; - if (*index + count > bpa_nvram_len) - count = bpa_nvram_len - *index; - - spin_lock_irqsave(&bpa_nvram_lock, flags); - - memcpy_fromio(buf, bpa_nvram_start + *index, count); - - spin_unlock_irqrestore(&bpa_nvram_lock, flags); - - *index += count; - return count; -} - -static ssize_t bpa_nvram_write(char *buf, size_t count, loff_t *index) -{ - unsigned long flags; - - if (*index >= bpa_nvram_len) - return 0; - if (*index + count > bpa_nvram_len) - count = bpa_nvram_len - *index; - - spin_lock_irqsave(&bpa_nvram_lock, flags); - - memcpy_toio(bpa_nvram_start + *index, buf, count); - - spin_unlock_irqrestore(&bpa_nvram_lock, flags); - - *index += count; - return count; -} - -static ssize_t bpa_nvram_get_size(void) -{ - return bpa_nvram_len; -} - -int __init bpa_nvram_init(void) -{ - struct device_node *nvram_node; - unsigned long *buffer; - int proplen; - unsigned long nvram_addr; - int ret; - - ret = -ENODEV; - nvram_node = of_find_node_by_type(NULL, "nvram"); - if (!nvram_node) - goto out; - - ret = -EIO; - buffer = (unsigned long *)get_property(nvram_node, "reg", &proplen); - if (proplen != 2*sizeof(unsigned long)) - goto out; - - ret = -ENODEV; - nvram_addr = buffer[0]; - bpa_nvram_len = buffer[1]; - if ( (!bpa_nvram_len) || (!nvram_addr) ) - goto out; - - bpa_nvram_start = ioremap(nvram_addr, bpa_nvram_len); - if (!bpa_nvram_start) - goto out; - - printk(KERN_INFO "BPA NVRAM, %luk mapped to %p\n", - bpa_nvram_len >> 10, bpa_nvram_start); - - ppc_md.nvram_read = bpa_nvram_read; - ppc_md.nvram_write = bpa_nvram_write; - ppc_md.nvram_size = bpa_nvram_get_size; - -out: - of_node_put(nvram_node); - return ret; -} --- linux-cg.orig/arch/ppc64/kernel/bpa_setup.c 2005-09-01 02:37:40.917977600 -0400 +++ linux-cg/arch/ppc64/kernel/bpa_setup.c 1969-12-31 19:00:00.000000000 -0500 @@ -1,140 +0,0 @@ -/* - * linux/arch/ppc/kernel/bpa_setup.c - * - * Copyright (C) 1995 Linus Torvalds - * Adapted from 'alpha' version by Gary Thomas - * Modified by Cort Dougan (cort at cs.nmt.edu) - * Modified by PPC64 Team, IBM Corp - * Modified by BPA Team, IBM Deutschland Entwicklung GmbH - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ -#undef DEBUG - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "pci.h" -#include "bpa_iic.h" -#include "bpa_iommu.h" - -#ifdef DEBUG -#define DBG(fmt...) udbg_printf(fmt) -#else -#define DBG(fmt...) -#endif - -void bpa_get_cpuinfo(struct seq_file *m) -{ - struct device_node *root; - const char *model = ""; - - root = of_find_node_by_path("/"); - if (root) - model = get_property(root, "model", NULL); - seq_printf(m, "machine\t\t: BPA %s\n", model); - of_node_put(root); -} - -static void bpa_progress(char *s, unsigned short hex) -{ - printk("*** %04x : %s\n", hex, s ? s : ""); -} - -static void __init bpa_setup_arch(void) -{ - ppc_md.init_IRQ = iic_init_IRQ; - ppc_md.get_irq = iic_get_irq; - -#ifdef CONFIG_SMP - smp_init_pSeries(); -#endif - - /* init to some ~sane value until calibrate_delay() runs */ - loops_per_jiffy = 50000000; - - if (ROOT_DEV == 0) { - printk("No ramdisk, default root is /dev/hda2\n"); - ROOT_DEV = Root_HDA2; - } - - /* Find and initialize PCI host bridges */ - init_pci_config_tokens(); - find_and_init_phbs(); - spider_init_IRQ(); -#ifdef CONFIG_DUMMY_CONSOLE - conswitchp = &dummy_con; -#endif - - bpa_nvram_init(); -} - -/* - * Early initialization. Relocation is on but do not reference unbolted pages - */ -static void __init bpa_init_early(void) -{ - DBG(" -> bpa_init_early()\n"); - - hpte_init_native(); - - bpa_init_iommu(); - - ppc64_interrupt_controller = IC_BPA_IIC; - - DBG(" <- bpa_init_early()\n"); -} - - -static int __init bpa_probe(int platform) -{ - if (platform != PLATFORM_BPA) - return 0; - - return 1; -} - -struct machdep_calls __initdata bpa_md = { - .probe = bpa_probe, - .setup_arch = bpa_setup_arch, - .init_early = bpa_init_early, - .get_cpuinfo = bpa_get_cpuinfo, - .restart = rtas_restart, - .power_off = rtas_power_off, - .halt = rtas_halt, - .get_boot_time = rtas_get_boot_time, - .get_rtc_time = rtas_get_rtc_time, - .set_rtc_time = rtas_set_rtc_time, - .calibrate_decr = generic_calibrate_decr, - .progress = bpa_progress, -}; --- linux-cg.orig/arch/ppc64/kernel/cpu_setup_power4.S 2005-09-01 02:37:44.101892936 -0400 +++ linux-cg/arch/ppc64/kernel/cpu_setup_power4.S 2005-09-01 02:37:49.128927440 -0400 @@ -77,7 +77,7 @@ _GLOBAL(__970_cpu_preinit) _GLOBAL(__setup_cpu_power4) blr -_GLOBAL(__setup_cpu_be) +_GLOBAL(__setup_cpu_cbe) /* Set large page sizes LP=0: 16MB, LP=1: 64KB */ addi r3, 0, 0 ori r3, r3, HID6_LB --- linux-cg.orig/arch/ppc64/kernel/cputable.c 2005-09-01 02:37:40.919977296 -0400 +++ linux-cg/arch/ppc64/kernel/cputable.c 2005-09-01 02:37:46.088990216 -0400 @@ -34,7 +34,7 @@ EXPORT_SYMBOL(cur_cpu_spec); extern void __setup_cpu_power3(unsigned long offset, struct cpu_spec* spec); extern void __setup_cpu_power4(unsigned long offset, struct cpu_spec* spec); extern void __setup_cpu_ppc970(unsigned long offset, struct cpu_spec* spec); -extern void __setup_cpu_be(unsigned long offset, struct cpu_spec* spec); +extern void __setup_cpu_cbe(unsigned long offset, struct cpu_spec* spec); /* We only set the altivec features if the kernel was compiled with altivec @@ -218,7 +218,7 @@ struct cpu_spec cpu_specs[] = { { /* BE DD1.x */ .pvr_mask = 0xffff0000, .pvr_value = 0x00700000, - .cpu_name = "Broadband Engine", + .cpu_name = "Cell Broadband Engine", .cpu_features = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_ALTIVEC_COMP | @@ -227,7 +227,7 @@ struct cpu_spec cpu_specs[] = { PPC_FEATURE_HAS_ALTIVEC_COMP, .icache_bsize = 128, .dcache_bsize = 128, - .cpu_setup = __setup_cpu_be, + .cpu_setup = __setup_cpu_cbe, }, { /* default match */ .pvr_mask = 0x00000000, --- linux-cg.orig/arch/ppc64/kernel/irq.c 2005-09-01 02:37:40.921976992 -0400 +++ linux-cg/arch/ppc64/kernel/irq.c 2005-09-01 02:37:46.089990064 -0400 @@ -392,7 +392,7 @@ int virt_irq_create_mapping(unsigned int if (ppc64_interrupt_controller == IC_OPEN_PIC) return real_irq; /* no mapping for openpic (for now) */ - if (ppc64_interrupt_controller == IC_BPA_IIC) + if (ppc64_interrupt_controller == IC_CELL_PIC) return real_irq; /* no mapping for iic either */ /* don't map interrupts < MIN_VIRT_IRQ */ --- linux-cg.orig/arch/ppc64/kernel/pSeries_smp.c 2005-09-01 02:37:40.924976536 -0400 +++ linux-cg/arch/ppc64/kernel/pSeries_smp.c 2005-09-01 02:37:46.089990064 -0400 @@ -40,6 +40,7 @@ #include #include #include +#include #include #include #include @@ -48,7 +49,6 @@ #include #include "mpic.h" -#include "bpa_iic.h" #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -464,7 +464,7 @@ void __init smp_init_pSeries(void) break; #endif #ifdef CONFIG_BPA_IIC - case IC_BPA_IIC: + case IC_CELL_PIC: smp_ops = &bpa_iic_smp_ops; break; #endif --- linux-cg.orig/arch/ppc64/kernel/setup.c 2005-09-01 02:37:40.926976232 -0400 +++ linux-cg/arch/ppc64/kernel/setup.c 2005-09-01 02:37:46.091989760 -0400 @@ -343,7 +343,7 @@ static void __init setup_cpu_maps(void) extern struct machdep_calls pSeries_md; extern struct machdep_calls pmac_md; extern struct machdep_calls maple_md; -extern struct machdep_calls bpa_md; +extern struct machdep_calls cell_md; /* Ultimately, stuff them in an elf section like initcalls... */ static struct machdep_calls __initdata *machines[] = { @@ -356,9 +356,9 @@ static struct machdep_calls __initdata * #ifdef CONFIG_PPC_MAPLE &maple_md, #endif /* CONFIG_PPC_MAPLE */ -#ifdef CONFIG_PPC_BPA - &bpa_md, -#endif +#ifdef CONFIG_PPC_CELL + &cell_md, +#endif /* CONFIG_PPC_CELL */ NULL }; --- linux-cg.orig/arch/ppc64/kernel/spider-pic.c 2005-09-01 02:37:40.928975928 -0400 +++ linux-cg/arch/ppc64/kernel/spider-pic.c 1969-12-31 19:00:00.000000000 -0500 @@ -1,191 +0,0 @@ -/* - * External Interrupt Controller on Spider South Bridge - * - * (C) Copyright IBM Deutschland Entwicklung GmbH 2005 - * - * Author: Arnd Bergmann - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2, or (at your option) - * any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#include -#include - -#include -#include -#include - -#include "bpa_iic.h" - -/* register layout taken from Spider spec, table 7.4-4 */ -enum { - TIR_DEN = 0x004, /* Detection Enable Register */ - TIR_MSK = 0x084, /* Mask Level Register */ - TIR_EDC = 0x0c0, /* Edge Detection Clear Register */ - TIR_PNDA = 0x100, /* Pending Register A */ - TIR_PNDB = 0x104, /* Pending Register B */ - TIR_CS = 0x144, /* Current Status Register */ - TIR_LCSA = 0x150, /* Level Current Status Register A */ - TIR_LCSB = 0x154, /* Level Current Status Register B */ - TIR_LCSC = 0x158, /* Level Current Status Register C */ - TIR_LCSD = 0x15c, /* Level Current Status Register D */ - TIR_CFGA = 0x200, /* Setting Register A0 */ - TIR_CFGB = 0x204, /* Setting Register B0 */ - /* 0x208 ... 0x3ff Setting Register An/Bn */ - TIR_PPNDA = 0x400, /* Packet Pending Register A */ - TIR_PPNDB = 0x404, /* Packet Pending Register B */ - TIR_PIERA = 0x408, /* Packet Output Error Register A */ - TIR_PIERB = 0x40c, /* Packet Output Error Register B */ - TIR_PIEN = 0x444, /* Packet Output Enable Register */ - TIR_PIPND = 0x454, /* Packet Output Pending Register */ - TIRDID = 0x484, /* Spider Device ID Register */ - REISTIM = 0x500, /* Reissue Command Timeout Time Setting */ - REISTIMEN = 0x504, /* Reissue Command Timeout Setting */ - REISWAITEN = 0x508, /* Reissue Wait Control*/ -}; - -static void __iomem *spider_pics[4]; - -static void __iomem *spider_get_pic(int irq) -{ - int node = irq / IIC_NODE_STRIDE; - irq %= IIC_NODE_STRIDE; - - if (irq >= IIC_EXT_OFFSET && - irq < IIC_EXT_OFFSET + IIC_NUM_EXT && - spider_pics) - return spider_pics[node]; - return NULL; -} - -static int spider_get_nr(unsigned int irq) -{ - return (irq % IIC_NODE_STRIDE) - IIC_EXT_OFFSET; -} - -static void __iomem *spider_get_irq_config(int irq) -{ - void __iomem *pic; - pic = spider_get_pic(irq); - return pic + TIR_CFGA + 8 * spider_get_nr(irq); -} - -static void spider_enable_irq(unsigned int irq) -{ - void __iomem *cfg = spider_get_irq_config(irq); - irq = spider_get_nr(irq); - - out_be32(cfg, in_be32(cfg) | 0x3107000eu); - out_be32(cfg + 4, in_be32(cfg + 4) | 0x00020000u | irq); -} - -static void spider_disable_irq(unsigned int irq) -{ - void __iomem *cfg = spider_get_irq_config(irq); - irq = spider_get_nr(irq); - - out_be32(cfg, in_be32(cfg) & ~0x30000000u); -} - -static unsigned int spider_startup_irq(unsigned int irq) -{ - spider_enable_irq(irq); - return 0; -} - -static void spider_shutdown_irq(unsigned int irq) -{ - spider_disable_irq(irq); -} - -static void spider_end_irq(unsigned int irq) -{ - spider_enable_irq(irq); -} - -static void spider_ack_irq(unsigned int irq) -{ - spider_disable_irq(irq); - iic_local_enable(); -} - -static struct hw_interrupt_type spider_pic = { - .typename = " SPIDER ", - .startup = spider_startup_irq, - .shutdown = spider_shutdown_irq, - .enable = spider_enable_irq, - .disable = spider_disable_irq, - .ack = spider_ack_irq, - .end = spider_end_irq, -}; - - -int spider_get_irq(unsigned long int_pending) -{ - void __iomem *regs = spider_get_pic(int_pending); - unsigned long cs; - int irq; - - cs = in_be32(regs + TIR_CS); - - irq = cs >> 24; - if (irq != 63) - return irq; - - return -1; -} - -void spider_init_IRQ(void) -{ - int node; - struct device_node *dn; - unsigned int *property; - long spiderpic; - int n; - -/* FIXME: detect multiple PICs as soon as the device tree has them */ - for (node = 0; node < 1; node++) { - dn = of_find_node_by_path("/"); - n = prom_n_addr_cells(dn); - property = (unsigned int *) get_property(dn, - "platform-spider-pic", NULL); - - if (!property) - continue; - for (spiderpic = 0; n > 0; --n) - spiderpic = (spiderpic << 32) + *property++; - printk(KERN_DEBUG "SPIDER addr: %lx\n", spiderpic); - spider_pics[node] = __ioremap(spiderpic, 0x800, _PAGE_NO_CACHE); - for (n = 0; n < IIC_NUM_EXT; n++) { - int irq = n + IIC_EXT_OFFSET + node * IIC_NODE_STRIDE; - get_irq_desc(irq)->handler = &spider_pic; - - /* do not mask any interrupts because of level */ - out_be32(spider_pics[node] + TIR_MSK, 0x0); - - /* disable edge detection clear */ - /* out_be32(spider_pics[node] + TIR_EDC, 0x0); */ - - /* enable interrupt packets to be output */ - out_be32(spider_pics[node] + TIR_PIEN, - in_be32(spider_pics[node] + TIR_PIEN) | 0x1); - - /* Enable the interrupt detection enable bit. Do this last! */ - out_be32(spider_pics[node] + TIR_DEN, - in_be32(spider_pics[node] +TIR_DEN) | 0x1); - - } - } -} --- linux-cg.orig/arch/ppc64/kernel/traps.c 2005-09-01 02:37:40.931975472 -0400 +++ linux-cg/arch/ppc64/kernel/traps.c 2005-09-01 02:37:46.093989456 -0400 @@ -126,8 +126,8 @@ int die(const char *str, struct pt_regs printk("POWERMAC "); nl = 1; break; - case PLATFORM_BPA: - printk("BPA "); + case PLATFORM_CELL: + printk("CBEA "); nl = 1; break; } --- linux-cg.orig/include/asm-powerpc/cell-pic.h 1969-12-31 19:00:00.000000000 -0500 +++ linux-cg/include/asm-powerpc/cell-pic.h 2005-09-01 02:37:46.093989456 -0400 @@ -0,0 +1,62 @@ +#ifndef __ASM_CELL_PIC_H +#define __ASM_CELL_PIC_H +#ifdef __KERNEL__ +/* + * Mapping of IIC pending bits into per-node + * interrupt numbers. + * + * IRQ FF CC SS PP FF CC SS PP Description + * + * 00-3f 80 02 +0 00 - 80 02 +0 3f South Bridge + * 00-3f 80 02 +b 00 - 80 02 +b 3f South Bridge + * 41-4a 80 00 +1 ** - 80 00 +a ** SPU Class 0 + * 51-5a 80 01 +1 ** - 80 01 +a ** SPU Class 1 + * 61-6a 80 02 +1 ** - 80 02 +a ** SPU Class 2 + * 70-7f C0 ** ** 00 - C0 ** ** 0f IPI + * + * F flags + * C class + * S source + * P Priority + * + node number + * * don't care + * + * A node consists of a Cell Processor and an optional + * south bridge device providing a maximum of 64 IRQs. + * The south bridge may be connected to either IOIF0 + * or IOIF1. + * Each SPE is represented as three IRQ lines, one per + * interrupt class. + * 16 IRQ numbers are reserved for inter processor + * interruptions, although these are only used in the + * range of the first node. + * + * This scheme needs 128 IRQ numbers per BIF node ID, + * which means that with the total of 512 lines + * available, we can have a maximum of four nodes. + */ + +enum { + IIC_EXT_OFFSET = 0x00, /* Start of south bridge IRQs */ + IIC_NUM_EXT = 0x40, /* Number of south bridge IRQs */ + IIC_SPE_OFFSET = 0x40, /* Start of SPE interrupts */ + IIC_CLASS_STRIDE = 0x10, /* SPE IRQs per class */ + IIC_IPI_OFFSET = 0x70, /* Start of IPI IRQs */ + IIC_NUM_IPIS = 0x10, /* IRQs reserved for IPI */ + IIC_NODE_STRIDE = 0x80, /* Total IRQs per node */ +}; + +extern void iic_init_IRQ(void); +extern int iic_get_irq(struct pt_regs *regs); +extern void iic_cause_IPI(int cpu, int mesg); +extern void iic_request_IPIs(void); +extern void iic_setup_cpu(void); +extern void iic_local_enable(void); +extern void iic_local_disable(void); + + +extern void spider_init_IRQ(void); +extern int spider_get_irq(unsigned long int_pending); + +#endif /* __KERNEL__ */ +#endif /* __ASM_CELL_PIC_H */ --- linux-cg.orig/include/asm-ppc64/nvram.h 2005-09-01 02:37:40.935974864 -0400 +++ linux-cg/include/asm-ppc64/nvram.h 2005-09-01 02:37:46.094989304 -0400 @@ -70,7 +70,7 @@ extern struct nvram_partition *nvram_fin extern int pSeries_nvram_init(void); extern int pmac_nvram_init(void); -extern int bpa_nvram_init(void); +extern int cell_nvram_init(void); /* PowerMac specific nvram stuffs */ --- linux-cg.orig/include/asm-ppc64/processor.h 2005-09-01 02:37:40.938974408 -0400 +++ linux-cg/include/asm-ppc64/processor.h 2005-09-01 02:37:46.095989152 -0400 @@ -269,7 +269,7 @@ #define PV_630 0x0040 #define PV_630p 0x0041 #define PV_970MP 0x0044 -#define PV_BE 0x0070 +#define PV_CBE 0x0070 /* Platforms supported by PPC64 */ #define PLATFORM_PSERIES 0x0100 @@ -278,7 +278,8 @@ #define PLATFORM_LPAR 0x0001 #define PLATFORM_POWERMAC 0x0400 #define PLATFORM_MAPLE 0x0500 -#define PLATFORM_BPA 0x1000 +#define PLATFORM_CELL 0x1000 +#define PLATFORM_BPA PLATFORM_CELL /* Compatibility with drivers coming from PPC32 world */ #define _machine (systemcfg->platform) @@ -290,7 +291,7 @@ #define IC_INVALID 0 #define IC_OPEN_PIC 1 #define IC_PPC_XIC 2 -#define IC_BPA_IIC 3 +#define IC_CELL_PIC 3 #define XGLUE(a,b) a##b #define GLUE(a,b) XGLUE(a,b) From michael at ellerman.id.au Thu Sep 1 11:28:53 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:28:53 +1000 (EST) Subject: [PATCH 0/18] Updates & bug fixes for iseries_veth network driver Message-ID: <1125538127.859382.875909607846.qpush@concordia> Hi, This is a series of patches for the iseries_veth driver. Most of these are pretty much unchanged since I posted them earlier: http://www.uwsg.iu.edu/hypermail/linux/kernel/0506.3/1837.html I've added patches to add sysfs support, and do some further code cleanups. Please merge if they look ok. cheers From michael at ellerman.id.au Thu Sep 1 11:28:57 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:28:57 +1000 (EST) Subject: [PATCH 1/18] iseries_veth: Cleanup error and debug messages In-Reply-To: <1125538127.859382.875909607846.qpush@concordia> Message-ID: <20050901012857.38EFD681EA@ozlabs.org> Currently the iseries_veth driver prints the file name and line number in its error messages. This isn't very useful for most users, so just print "iseries_veth: message" instead. - convert uses of veth_printk() to veth_debug()/veth_error()/veth_info() - make terminology consistent, ie. always refer to LPAR not lpar - be consistent about printing return codes as %d not %x - make format strings fit in 80 columns Signed-off-by: Michael Ellerman --- drivers/net/iseries_veth.c | 98 +++++++++++++++++++++++---------------------- 1 files changed, 51 insertions(+), 47 deletions(-) Index: veth-dev2/drivers/net/iseries_veth.c =================================================================== --- veth-dev2.orig/drivers/net/iseries_veth.c +++ veth-dev2/drivers/net/iseries_veth.c @@ -79,6 +79,8 @@ #include #include +#undef DEBUG + #include "iseries_veth.h" MODULE_AUTHOR("Kyle Lucke "); @@ -176,11 +178,18 @@ static void veth_timed_ack(unsigned long * Utility functions */ -#define veth_printk(prio, fmt, args...) \ - printk(prio "%s: " fmt, __FILE__, ## args) +#define veth_info(fmt, args...) \ + printk(KERN_INFO "iseries_veth: " fmt, ## args) #define veth_error(fmt, args...) \ - printk(KERN_ERR "(%s:%3.3d) ERROR: " fmt, __FILE__, __LINE__ , ## args) + printk(KERN_ERR "iseries_veth: Error: " fmt, ## args) + +#ifdef DEBUG +#define veth_debug(fmt, args...) \ + printk(KERN_DEBUG "iseries_veth: " fmt, ## args) +#else +#define veth_debug(fmt, args...) do {} while (0) +#endif static inline void veth_stack_push(struct veth_lpar_connection *cnx, struct veth_msg *msg) @@ -278,7 +287,7 @@ static void veth_take_cap(struct veth_lp HvLpEvent_Type_VirtualLan); if (cnx->state & VETH_STATE_GOTCAPS) { - veth_error("Received a second capabilities from lpar %d\n", + veth_error("Received a second capabilities from LPAR %d.\n", cnx->remote_lp); event->base_event.xRc = HvLpEvent_Rc_BufferNotAvailable; HvCallEvent_ackLpEvent((struct HvLpEvent *) event); @@ -297,7 +306,7 @@ static void veth_take_cap_ack(struct vet spin_lock_irqsave(&cnx->lock, flags); if (cnx->state & VETH_STATE_GOTCAPACK) { - veth_error("Received a second capabilities ack from lpar %d\n", + veth_error("Received a second capabilities ack from LPAR %d.\n", cnx->remote_lp); } else { memcpy(&cnx->cap_ack_event, event, @@ -314,8 +323,7 @@ static void veth_take_monitor_ack(struct unsigned long flags; spin_lock_irqsave(&cnx->lock, flags); - veth_printk(KERN_DEBUG, "Monitor ack returned for lpar %d\n", - cnx->remote_lp); + veth_debug("cnx %d: lost connection.\n", cnx->remote_lp); cnx->state |= VETH_STATE_RESET; veth_kick_statemachine(cnx); spin_unlock_irqrestore(&cnx->lock, flags); @@ -336,8 +344,8 @@ static void veth_handle_ack(struct VethL veth_take_monitor_ack(cnx, event); break; default: - veth_error("Unknown ack type %d from lpar %d\n", - event->base_event.xSubtype, rlp); + veth_error("Unknown ack type %d from LPAR %d.\n", + event->base_event.xSubtype, rlp); }; } @@ -373,8 +381,8 @@ static void veth_handle_int(struct VethL veth_receive(cnx, event); break; default: - veth_error("Unknown interrupt type %d from lpar %d\n", - event->base_event.xSubtype, rlp); + veth_error("Unknown interrupt type %d from LPAR %d.\n", + event->base_event.xSubtype, rlp); }; } @@ -400,8 +408,8 @@ static int veth_process_caps(struct veth || (remote_caps->ack_threshold > VETH_MAX_ACKS_PER_MSG) || (remote_caps->ack_threshold == 0) || (cnx->ack_timeout == 0) ) { - veth_error("Received incompatible capabilities from lpar %d\n", - cnx->remote_lp); + veth_error("Received incompatible capabilities from LPAR %d.\n", + cnx->remote_lp); return HvLpEvent_Rc_InvalidSubtypeData; } @@ -418,8 +426,8 @@ static int veth_process_caps(struct veth cnx->num_ack_events += num; if (cnx->num_ack_events < num_acks_needed) { - veth_error("Couldn't allocate enough ack events for lpar %d\n", - cnx->remote_lp); + veth_error("Couldn't allocate enough ack events " + "for LPAR %d.\n", cnx->remote_lp); return HvLpEvent_Rc_BufferNotAvailable; } @@ -498,9 +506,8 @@ static void veth_statemachine(void *p) } else { if ( (rc != HvLpEvent_Rc_PartitionDead) && (rc != HvLpEvent_Rc_PathClosed) ) - veth_error("Error sending monitor to " - "lpar %d, rc=%x\n", - rlp, (int) rc); + veth_error("Error sending monitor to LPAR %d, " + "rc = %d\n", rlp, rc); /* Oh well, hope we get a cap from the other * end and do better when that kicks us */ @@ -523,9 +530,9 @@ static void veth_statemachine(void *p) } else { if ( (rc != HvLpEvent_Rc_PartitionDead) && (rc != HvLpEvent_Rc_PathClosed) ) - veth_error("Error sending caps to " - "lpar %d, rc=%x\n", - rlp, (int) rc); + veth_error("Error sending caps to LPAR %d, " + "rc = %d\n", rlp, rc); + /* Oh well, hope we get a cap from the other * end and do better when that kicks us */ goto out; @@ -565,10 +572,8 @@ static void veth_statemachine(void *p) add_timer(&cnx->ack_timer); cnx->state |= VETH_STATE_READY; } else { - veth_printk(KERN_ERR, "Caps rejected (rc=%d) by " - "lpar %d\n", - cnx->cap_ack_event.base_event.xRc, - rlp); + veth_error("Caps rejected by LPAR %d, rc = %d\n", + rlp, cnx->cap_ack_event.base_event.xRc); goto cant_cope; } } @@ -581,8 +586,8 @@ static void veth_statemachine(void *p) /* FIXME: we get here if something happens we really can't * cope with. The link will never work once we get here, and * all we can do is not lock the rest of the system up */ - veth_error("Badness on connection to lpar %d (state=%04lx) " - " - shutting down\n", rlp, cnx->state); + veth_error("Unrecoverable error on connection to LPAR %d, shutting down" + " (state = 0x%04lx)\n", rlp, cnx->state); cnx->state |= VETH_STATE_SHUTDOWN; spin_unlock_irq(&cnx->lock); } @@ -614,7 +619,7 @@ static int veth_init_connection(u8 rlp) msgs = kmalloc(VETH_NUMBUFFERS * sizeof(struct veth_msg), GFP_KERNEL); if (! msgs) { - veth_error("Can't allocate buffers for lpar %d\n", rlp); + veth_error("Can't allocate buffers for LPAR %d.\n", rlp); return -ENOMEM; } @@ -630,8 +635,7 @@ static int veth_init_connection(u8 rlp) cnx->num_events = veth_allocate_events(rlp, 2 + VETH_NUMBUFFERS); if (cnx->num_events < (2 + VETH_NUMBUFFERS)) { - veth_error("Can't allocate events for lpar %d, only got %d\n", - rlp, cnx->num_events); + veth_error("Can't allocate enough events for LPAR %d.\n", rlp); return -ENOMEM; } @@ -889,15 +893,13 @@ static struct net_device * __init veth_p rc = register_netdev(dev); if (rc != 0) { - veth_printk(KERN_ERR, - "Failed to register ethernet device for vlan %d\n", - vlan); + veth_error("Failed registering net device for vlan%d.\n", vlan); free_netdev(dev); return NULL; } - veth_printk(KERN_DEBUG, "%s attached to iSeries vlan %d (lpar_map=0x%04x)\n", - dev->name, vlan, port->lpar_map); + veth_info("%s attached to iSeries vlan %d (LPAR map = 0x%.4X)\n", + dev->name, vlan, port->lpar_map); return dev; } @@ -1030,7 +1032,7 @@ static int veth_start_xmit(struct sk_buf dev_kfree_skb(skb); } else { if (port->pending_skb) { - veth_error("%s: Tx while skb was pending!\n", + veth_error("%s: TX while skb was pending!\n", dev->name); dev_kfree_skb(skb); spin_unlock_irqrestore(&port->pending_gate, flags); @@ -1066,10 +1068,10 @@ static void veth_recycle_msg(struct veth memset(&msg->data, 0, sizeof(msg->data)); veth_stack_push(cnx, msg); - } else - if (cnx->state & VETH_STATE_OPEN) - veth_error("Bogus frames ack from lpar %d (#%d)\n", - cnx->remote_lp, msg->token); + } else if (cnx->state & VETH_STATE_OPEN) { + veth_error("Non-pending frame (# %d) acked by LPAR %d.\n", + cnx->remote_lp, msg->token); + } } static void veth_flush_pending(struct veth_lpar_connection *cnx) @@ -1179,8 +1181,8 @@ static void veth_flush_acks(struct veth_ 0, &cnx->pending_acks); if (rc != HvLpEvent_Rc_Good) - veth_error("Error 0x%x acking frames from lpar %d!\n", - (unsigned)rc, cnx->remote_lp); + veth_error("Failed acking frames from LPAR %d, rc = %d\n", + cnx->remote_lp, (int)rc); cnx->num_pending_acks = 0; memset(&cnx->pending_acks, 0xff, sizeof(cnx->pending_acks)); @@ -1216,9 +1218,10 @@ static void veth_receive(struct veth_lpa /* make sure that we have at least 1 EOF entry in the * remaining entries */ if (! (senddata->eofmask >> (startchunk + VETH_EOF_SHIFT))) { - veth_error("missing EOF frag in event " - "eofmask=0x%x startchunk=%d\n", - (unsigned) senddata->eofmask, startchunk); + veth_error("Missing EOF fragment in event " + "eofmask = 0x%x startchunk = %d\n", + (unsigned)senddata->eofmask, + startchunk); break; } @@ -1237,8 +1240,9 @@ static void veth_receive(struct veth_lpa /* nchunks == # of chunks in this frame */ if ((length - ETH_HLEN) > VETH_MAX_MTU) { - veth_error("Received oversize frame from lpar %d " - "(length=%d)\n", cnx->remote_lp, length); + veth_error("Received oversize frame from LPAR %d " + "(length = %d)\n", + cnx->remote_lp, length); continue; } From michael at ellerman.id.au Thu Sep 1 11:28:59 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:28:59 +1000 (EST) Subject: [PATCH 2/18] iseries_veth: Remove a FIXME WRT deletion of the ack_timer In-Reply-To: <1125538127.859382.875909607846.qpush@concordia> Message-ID: <20050901012859.BDDC3681F5@ozlabs.org> The iseries_veth driver has a timer which we use to send acks. When the connection is reset or stopped we need to delete the timer. Currently we only call del_timer() when resetting a connection, which means the timer might run again while the connection is being re-setup. As it turns out that's ok, because the flags the timer consults have been reset. It's cleaner though to call del_timer_sync() once we've dropped the lock, although the timer may still run between us dropping the lock and calling del_timer_sync(), but as above that's ok. Signed-off-by: Michael Ellerman --- drivers/net/iseries_veth.c | 21 +++++++++++++-------- 1 files changed, 13 insertions(+), 8 deletions(-) Index: veth-dev2/drivers/net/iseries_veth.c =================================================================== --- veth-dev2.orig/drivers/net/iseries_veth.c +++ veth-dev2/drivers/net/iseries_veth.c @@ -450,13 +450,15 @@ static void veth_statemachine(void *p) if (cnx->state & VETH_STATE_RESET) { int i; - del_timer(&cnx->ack_timer); - if (cnx->state & VETH_STATE_OPEN) HvCallEvent_closeLpEventPath(cnx->remote_lp, HvLpEvent_Type_VirtualLan); - /* reset ack data */ + /* + * Reset ack data. This prevents the ack_timer actually + * doing anything, even if it runs one more time when + * we drop the lock below. + */ memset(&cnx->pending_acks, 0xff, sizeof (cnx->pending_acks)); cnx->num_pending_acks = 0; @@ -469,9 +471,16 @@ static void veth_statemachine(void *p) if (cnx->msgs) for (i = 0; i < VETH_NUMBUFFERS; ++i) veth_recycle_msg(cnx, cnx->msgs + i); + + /* Drop the lock so we can do stuff that might sleep or + * take other locks. */ spin_unlock_irq(&cnx->lock); + + del_timer_sync(&cnx->ack_timer); veth_flush_pending(cnx); + spin_lock_irq(&cnx->lock); + if (cnx->state & VETH_STATE_RESET) goto restart; } @@ -658,13 +667,9 @@ static void veth_stop_connection(u8 rlp) veth_kick_statemachine(cnx); spin_unlock_irq(&cnx->lock); + /* Wait for the state machine to run. */ flush_scheduled_work(); - /* FIXME: not sure if this is necessary - will already have - * been deleted by the state machine, just want to make sure - * its not running any more */ - del_timer_sync(&cnx->ack_timer); - if (cnx->num_events > 0) mf_deallocate_lp_events(cnx->remote_lp, HvLpEvent_Type_VirtualLan, From michael at ellerman.id.au Thu Sep 1 11:29:00 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:29:00 +1000 (EST) Subject: [PATCH 3/18] iseries_veth: Try to avoid pathological reset behaviour In-Reply-To: <1125538127.859382.875909607846.qpush@concordia> Message-ID: <20050901012900.142D5681F7@ozlabs.org> The iseries_veth driver contains a state machine which is used to manage how connections are setup and neogotiated between LPARs. If one side of a connection resets for some reason, the two LPARs can get stuck in a race to re-setup the connection. This can lead to the connection being declared dead by one or both ends. In practice the connection is declared dead by one or both ends approximately 8/10 times a connection is reset, although it is rare for connections to be reset. (an example here: http://michael.ellerman.id.au/files/misc/veth-trace.html) The core of the problem is that the end that resets the connection doesn't wait for the other end to become aware of the reset. So the resetting end starts setting the connection back up, and then receives a reset from the other end (which is the response to the initial reset). And so on. We're severely limited in what we can do to fix this. The protocol between LPARs is essentially fixed, as we have to interoperate with both OS/400 and old Linux drivers. Which also means we need a fix that only changes the code on one end. The only fix I've found given that, is to just blindly sleep for a bit when resetting the connection, in the hope that the other end will get itself sorted. Needless to say I'd love it if someone has a better idea. This does work, I've so far been unable to get it to break, whereas without the fix a reset of one end will lead to a dead connection ~8/10 times. Signed-off-by: Michael Ellerman --- drivers/net/iseries_veth.c | 25 +++++++++++++++++++++++-- 1 files changed, 23 insertions(+), 2 deletions(-) Index: veth-dev2/drivers/net/iseries_veth.c =================================================================== --- veth-dev2.orig/drivers/net/iseries_veth.c +++ veth-dev2/drivers/net/iseries_veth.c @@ -324,8 +324,14 @@ static void veth_take_monitor_ack(struct spin_lock_irqsave(&cnx->lock, flags); veth_debug("cnx %d: lost connection.\n", cnx->remote_lp); - cnx->state |= VETH_STATE_RESET; - veth_kick_statemachine(cnx); + + /* Avoid kicking the statemachine once we're shutdown. + * It's unnecessary and it could break veth_stop_connection(). */ + + if (! (cnx->state & VETH_STATE_SHUTDOWN)) { + cnx->state |= VETH_STATE_RESET; + veth_kick_statemachine(cnx); + } spin_unlock_irqrestore(&cnx->lock, flags); } @@ -483,6 +489,12 @@ static void veth_statemachine(void *p) if (cnx->state & VETH_STATE_RESET) goto restart; + + /* Hack, wait for the other end to reset itself. */ + if (! (cnx->state & VETH_STATE_SHUTDOWN)) { + schedule_delayed_work(&cnx->statemachine_wq, 5 * HZ); + goto out; + } } if (cnx->state & VETH_STATE_SHUTDOWN) @@ -667,6 +679,15 @@ static void veth_stop_connection(u8 rlp) veth_kick_statemachine(cnx); spin_unlock_irq(&cnx->lock); + /* There's a slim chance the reset code has just queued the + * statemachine to run in five seconds. If so we need to cancel + * that and requeue the work to run now. */ + if (cancel_delayed_work(&cnx->statemachine_wq)) { + spin_lock_irq(&cnx->lock); + veth_kick_statemachine(cnx); + spin_unlock_irq(&cnx->lock); + } + /* Wait for the state machine to run. */ flush_scheduled_work(); From michael at ellerman.id.au Thu Sep 1 11:29:02 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:29:02 +1000 (EST) Subject: [PATCH 4/18] iseries_veth: Fix broken promiscuous handling In-Reply-To: <1125538127.859382.875909607846.qpush@concordia> Message-ID: <20050901012902.89A72681F5@ozlabs.org> Due to a logic bug, once promiscuous mode is enabled in the iseries_veth driver it is never disabled. The driver keeps two flags, promiscuous and all_mcast which have exactly the same effect. This is because we only ever receive packets destined for us, or multicast packets. So consolidate them into one promiscuous flag for simplicity. Signed-off-by: Michael Ellerman --- drivers/net/iseries_veth.c | 16 +++++----------- 1 files changed, 5 insertions(+), 11 deletions(-) Index: veth-dev2/drivers/net/iseries_veth.c =================================================================== --- veth-dev2.orig/drivers/net/iseries_veth.c +++ veth-dev2/drivers/net/iseries_veth.c @@ -159,7 +159,6 @@ struct veth_port { rwlock_t mcast_gate; int promiscuous; - int all_mcast; int num_mcast; u64 mcast_addr[VETH_MAX_MCAST]; }; @@ -756,17 +755,15 @@ static void veth_set_multicast_list(stru write_lock_irqsave(&port->mcast_gate, flags); - if (dev->flags & IFF_PROMISC) { /* set promiscuous mode */ - printk(KERN_INFO "%s: Promiscuous mode enabled.\n", - dev->name); + if ((dev->flags & IFF_PROMISC) || (dev->flags & IFF_ALLMULTI) || + (dev->mc_count > VETH_MAX_MCAST)) { port->promiscuous = 1; - } else if ( (dev->flags & IFF_ALLMULTI) - || (dev->mc_count > VETH_MAX_MCAST) ) { - port->all_mcast = 1; } else { struct dev_mc_list *dmi = dev->mc_list; int i; + port->promiscuous = 0; + /* Update table */ port->num_mcast = 0; @@ -1145,12 +1142,9 @@ static inline int veth_frame_wanted(stru if ( (mac_addr == port->mac_addr) || (mac_addr == 0xffffffffffff0000) ) return 1; - if (! (((char *) &mac_addr)[0] & 0x01)) - return 0; - read_lock_irqsave(&port->mcast_gate, flags); - if (port->promiscuous || port->all_mcast) { + if (port->promiscuous) { wanted = 1; goto out; } From michael at ellerman.id.au Thu Sep 1 11:29:05 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:29:05 +1000 (EST) Subject: [PATCH 5/18] iseries_veth: Remove redundant message stack lock In-Reply-To: <1125538127.859382.875909607846.qpush@concordia> Message-ID: <20050901012905.C9D3668205@ozlabs.org> The iseries_veth driver keeps a stack of messages for each connection and a lock to protect the stack. However there is also a per-connection lock which makes the message stack lock redundant. Remove the message stack lock and document the fact that callers of the stack-manipulation functions must hold the connection's lock. Signed-off-by: Michael Ellerman --- drivers/net/iseries_veth.c | 12 +++--------- 1 files changed, 3 insertions(+), 9 deletions(-) Index: veth-dev2/drivers/net/iseries_veth.c =================================================================== --- veth-dev2.orig/drivers/net/iseries_veth.c +++ veth-dev2/drivers/net/iseries_veth.c @@ -143,7 +143,6 @@ struct veth_lpar_connection { struct VethCapData remote_caps; u32 ack_timeout; - spinlock_t msg_stack_lock; struct veth_msg *msg_stack_head; }; @@ -190,27 +189,23 @@ static void veth_timed_ack(unsigned long #define veth_debug(fmt, args...) do {} while (0) #endif +/* You must hold the connection's lock when you call this function. */ static inline void veth_stack_push(struct veth_lpar_connection *cnx, struct veth_msg *msg) { - unsigned long flags; - - spin_lock_irqsave(&cnx->msg_stack_lock, flags); msg->next = cnx->msg_stack_head; cnx->msg_stack_head = msg; - spin_unlock_irqrestore(&cnx->msg_stack_lock, flags); } +/* You must hold the connection's lock when you call this function. */ static inline struct veth_msg *veth_stack_pop(struct veth_lpar_connection *cnx) { - unsigned long flags; struct veth_msg *msg; - spin_lock_irqsave(&cnx->msg_stack_lock, flags); msg = cnx->msg_stack_head; if (msg) cnx->msg_stack_head = cnx->msg_stack_head->next; - spin_unlock_irqrestore(&cnx->msg_stack_lock, flags); + return msg; } @@ -645,7 +640,6 @@ static int veth_init_connection(u8 rlp) cnx->msgs = msgs; memset(msgs, 0, VETH_NUMBUFFERS * sizeof(struct veth_msg)); - spin_lock_init(&cnx->msg_stack_lock); for (i = 0; i < VETH_NUMBUFFERS; i++) { msgs[i].token = i; From michael at ellerman.id.au Thu Sep 1 11:29:06 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:29:06 +1000 (EST) Subject: [PATCH 6/18] iseries_veth: Replace lock-protected atomic with an ordinary variable In-Reply-To: <1125538127.859382.875909607846.qpush@concordia> Message-ID: <20050901012906.9EF9568206@ozlabs.org> The iseries_veth driver uses atomic ops to manipulate the in_use field of one of its per-connection structures. However all references to the flag occur while the connection's lock is held, so the atomic ops aren't necessary. Signed-off-by: Michael Ellerman --- drivers/net/iseries_veth.c | 13 +++++++------ 1 files changed, 7 insertions(+), 6 deletions(-) Index: veth-dev2/drivers/net/iseries_veth.c =================================================================== --- veth-dev2.orig/drivers/net/iseries_veth.c +++ veth-dev2/drivers/net/iseries_veth.c @@ -117,7 +117,7 @@ struct veth_msg { struct veth_msg *next; struct VethFramesData data; int token; - unsigned long in_use; + int in_use; struct sk_buff *skb; struct device *dev; }; @@ -957,6 +957,8 @@ static int veth_transmit_to_one(struct s goto drop; } + msg->in_use = 1; + dma_length = skb->len; dma_address = dma_map_single(port->dev, skb->data, dma_length, DMA_TO_DEVICE); @@ -971,7 +973,6 @@ static int veth_transmit_to_one(struct s msg->data.addr[0] = dma_address; msg->data.len[0] = dma_length; msg->data.eofmask = 1 << VETH_EOF_SHIFT; - set_bit(0, &(msg->in_use)); rc = veth_signaldata(cnx, VethEventTypeFrames, msg->token, &msg->data); if (rc != HvLpEvent_Rc_Good) @@ -981,10 +982,8 @@ static int veth_transmit_to_one(struct s return 0; recycle_and_drop: + /* we free the skb below, so tell veth_recycle_msg() not to. */ msg->skb = NULL; - /* need to set in use to make veth_recycle_msg in case this - * was a mapping failure */ - set_bit(0, &msg->in_use); veth_recycle_msg(cnx, msg); drop: port->stats.tx_errors++; @@ -1066,12 +1065,14 @@ static int veth_start_xmit(struct sk_buf return 0; } +/* You must hold the connection's lock when you call this function. */ static void veth_recycle_msg(struct veth_lpar_connection *cnx, struct veth_msg *msg) { u32 dma_address, dma_length; - if (test_and_clear_bit(0, &msg->in_use)) { + if (msg->in_use) { + msg->in_use = 0; dma_address = msg->data.addr[0]; dma_length = msg->data.len[0]; From michael at ellerman.id.au Thu Sep 1 11:29:07 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:29:07 +1000 (EST) Subject: [PATCH 7/18] iseries_veth: Only call dma_unmap_single() if dma_map_single() succeeded In-Reply-To: <1125538127.859382.875909607846.qpush@concordia> Message-ID: <20050901012907.A531B681F5@ozlabs.org> The iseries_veth driver unconditionally calls dma_unmap_single() even when the corresponding dma_map_single() may have failed. Rework the code a bit to keep the return value from dma_unmap_single() around, and then check if it's a dma_mapping_error() before we do the dma_unmap_single(). Signed-off-by: Michael Ellerman --- drivers/net/iseries_veth.c | 17 ++++++++--------- 1 files changed, 8 insertions(+), 9 deletions(-) Index: veth-dev2/drivers/net/iseries_veth.c =================================================================== --- veth-dev2.orig/drivers/net/iseries_veth.c +++ veth-dev2/drivers/net/iseries_veth.c @@ -931,7 +931,6 @@ static int veth_transmit_to_one(struct s struct veth_lpar_connection *cnx = veth_cnx[rlp]; struct veth_port *port = (struct veth_port *) dev->priv; HvLpEvent_Rc rc; - u32 dma_address, dma_length; struct veth_msg *msg = NULL; int err = 0; unsigned long flags; @@ -959,20 +958,19 @@ static int veth_transmit_to_one(struct s msg->in_use = 1; - dma_length = skb->len; - dma_address = dma_map_single(port->dev, skb->data, - dma_length, DMA_TO_DEVICE); + msg->data.addr[0] = dma_map_single(port->dev, skb->data, + skb->len, DMA_TO_DEVICE); - if (dma_mapping_error(dma_address)) + if (dma_mapping_error(msg->data.addr[0])) goto recycle_and_drop; /* Is it really necessary to check the length and address * fields of the first entry here? */ msg->skb = skb; msg->dev = port->dev; - msg->data.addr[0] = dma_address; - msg->data.len[0] = dma_length; + msg->data.len[0] = skb->len; msg->data.eofmask = 1 << VETH_EOF_SHIFT; + rc = veth_signaldata(cnx, VethEventTypeFrames, msg->token, &msg->data); if (rc != HvLpEvent_Rc_Good) @@ -1076,8 +1074,9 @@ static void veth_recycle_msg(struct veth dma_address = msg->data.addr[0]; dma_length = msg->data.len[0]; - dma_unmap_single(msg->dev, dma_address, dma_length, - DMA_TO_DEVICE); + if (!dma_mapping_error(dma_address)) + dma_unmap_single(msg->dev, dma_address, dma_length, + DMA_TO_DEVICE); if (msg->skb) { dev_kfree_skb_any(msg->skb); From michael at ellerman.id.au Thu Sep 1 11:29:08 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:29:08 +1000 (EST) Subject: [PATCH 8/18] iseries_veth: Make init_connection() & destroy_connection() symmetrical In-Reply-To: <1125538127.859382.875909607846.qpush@concordia> Message-ID: <20050901012908.E24DF68212@ozlabs.org> This patch makes veth_init_connection() and veth_destroy_connection() symmetrical in that they allocate/deallocate the same data. Currently if there's an error while initialising connections (ie. ENOMEM) we call veth_module_cleanup(), however this will oops because we call driver_unregister() before we've called driver_register(). I've never seen this actually happen though. So instead we explicitly call veth_destroy_connection() for each connection, any that have been set up will be deallocated. We also fix a potential leak if vio_register_driver() fails. Signed-off-by: Michael Ellerman --- drivers/net/iseries_veth.c | 35 ++++++++++++++++++++++------------- 1 files changed, 22 insertions(+), 13 deletions(-) Index: veth-dev2/drivers/net/iseries_veth.c =================================================================== --- veth-dev2.orig/drivers/net/iseries_veth.c +++ veth-dev2/drivers/net/iseries_veth.c @@ -683,6 +683,14 @@ static void veth_stop_connection(u8 rlp) /* Wait for the state machine to run. */ flush_scheduled_work(); +} + +static void veth_destroy_connection(u8 rlp) +{ + struct veth_lpar_connection *cnx = veth_cnx[rlp]; + + if (! cnx) + return; if (cnx->num_events > 0) mf_deallocate_lp_events(cnx->remote_lp, @@ -694,14 +702,6 @@ static void veth_stop_connection(u8 rlp) HvLpEvent_Type_VirtualLan, cnx->num_ack_events, NULL, NULL); -} - -static void veth_destroy_connection(u8 rlp) -{ - struct veth_lpar_connection *cnx = veth_cnx[rlp]; - - if (! cnx) - return; kfree(cnx->msgs); kfree(cnx); @@ -1441,15 +1441,24 @@ int __init veth_module_init(void) for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) { rc = veth_init_connection(i); - if (rc != 0) { - veth_module_cleanup(); - return rc; - } + if (rc != 0) + goto error; } HvLpEvent_registerHandler(HvLpEvent_Type_VirtualLan, &veth_handle_event); - return vio_register_driver(&veth_driver); + rc = vio_register_driver(&veth_driver); + if (rc != 0) + goto error; + + return 0; + +error: + for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) { + veth_destroy_connection(i); + } + + return rc; } module_init(veth_module_init); From michael at ellerman.id.au Thu Sep 1 11:29:09 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:29:09 +1000 (EST) Subject: [PATCH 9/18] iseries_veth: Use kobjects to track lifecycle of connection structs In-Reply-To: <1125538127.859382.875909607846.qpush@concordia> Message-ID: <20050901012909.C98036820F@ozlabs.org> The iseries_veth driver can attach to multiple vlans, which correspond to multiple net devices. However there is only 1 connection between each LPAR, so the connection structure may be shared by multiple net devices. This makes module removal messy, because we can't deallocate the connections until we know there are no net devices still using them. The solution is to use ref counts on the connections, so we can delete them (actually stop) as soon as the ref count hits zero. This patch fixes (part of) a bug we were seeing with IPv6 sending probes to a dead LPAR, which would then hang us forever due to leftover skbs. Signed-off-by: Michael Ellerman --- drivers/net/iseries_veth.c | 121 ++++++++++++++++++++++++++++++--------------- 1 files changed, 83 insertions(+), 38 deletions(-) Index: veth-dev2/drivers/net/iseries_veth.c =================================================================== --- veth-dev2.orig/drivers/net/iseries_veth.c +++ veth-dev2/drivers/net/iseries_veth.c @@ -129,6 +129,7 @@ struct veth_lpar_connection { int num_events; struct VethCapData local_caps; + struct kobject kobject; struct timer_list ack_timer; spinlock_t lock; @@ -171,6 +172,11 @@ static void veth_recycle_msg(struct veth static void veth_flush_pending(struct veth_lpar_connection *cnx); static void veth_receive(struct veth_lpar_connection *, struct VethLpEvent *); static void veth_timed_ack(unsigned long connectionPtr); +static void veth_release_connection(struct kobject *kobject); + +static struct kobj_type veth_lpar_connection_ktype = { + .release = veth_release_connection +}; /* * Utility functions @@ -611,7 +617,7 @@ static int veth_init_connection(u8 rlp) { struct veth_lpar_connection *cnx; struct veth_msg *msgs; - int i; + int i, rc; if ( (rlp == this_lp) || ! HvLpConfig_doLpsCommunicateOnVirtualLan(this_lp, rlp) ) @@ -632,6 +638,14 @@ static int veth_init_connection(u8 rlp) veth_cnx[rlp] = cnx; + /* This gets us 1 reference, which is held on behalf of the driver + * infrastructure. It's released at module unload. */ + kobject_init(&cnx->kobject); + cnx->kobject.ktype = &veth_lpar_connection_ktype; + rc = kobject_set_name(&cnx->kobject, "cnx%.2d", rlp); + if (rc != 0) + return rc; + msgs = kmalloc(VETH_NUMBUFFERS * sizeof(struct veth_msg), GFP_KERNEL); if (! msgs) { veth_error("Can't allocate buffers for LPAR %d.\n", rlp); @@ -660,11 +674,9 @@ static int veth_init_connection(u8 rlp) return 0; } -static void veth_stop_connection(u8 rlp) +static void veth_stop_connection(struct veth_lpar_connection *cnx) { - struct veth_lpar_connection *cnx = veth_cnx[rlp]; - - if (! cnx) + if (!cnx) return; spin_lock_irq(&cnx->lock); @@ -685,11 +697,9 @@ static void veth_stop_connection(u8 rlp) flush_scheduled_work(); } -static void veth_destroy_connection(u8 rlp) +static void veth_destroy_connection(struct veth_lpar_connection *cnx) { - struct veth_lpar_connection *cnx = veth_cnx[rlp]; - - if (! cnx) + if (!cnx) return; if (cnx->num_events > 0) @@ -704,8 +714,16 @@ static void veth_destroy_connection(u8 r NULL, NULL); kfree(cnx->msgs); + veth_cnx[cnx->remote_lp] = NULL; kfree(cnx); - veth_cnx[rlp] = NULL; +} + +static void veth_release_connection(struct kobject *kobj) +{ + struct veth_lpar_connection *cnx; + cnx = container_of(kobj, struct veth_lpar_connection, kobject); + veth_stop_connection(cnx); + veth_destroy_connection(cnx); } /* @@ -1349,15 +1367,31 @@ static void veth_timed_ack(unsigned long static int veth_remove(struct vio_dev *vdev) { - int i = vdev->unit_address; + struct veth_lpar_connection *cnx; struct net_device *dev; + struct veth_port *port; + int i; - dev = veth_dev[i]; - if (dev != NULL) { - veth_dev[i] = NULL; - unregister_netdev(dev); - free_netdev(dev); + dev = veth_dev[vdev->unit_address]; + + if (! dev) + return 0; + + port = netdev_priv(dev); + + for (i = 0; i < HVMAXARCHITECTEDLPS; i++) { + cnx = veth_cnx[i]; + + if (cnx && (port->lpar_map & (1 << i))) { + /* Drop our reference to connections on our VLAN */ + kobject_put(&cnx->kobject); + } } + + veth_dev[vdev->unit_address] = NULL; + unregister_netdev(dev); + free_netdev(dev); + return 0; } @@ -1365,6 +1399,7 @@ static int veth_probe(struct vio_dev *vd { int i = vdev->unit_address; struct net_device *dev; + struct veth_port *port; dev = veth_probe_one(i, &vdev->dev); if (dev == NULL) { @@ -1373,11 +1408,23 @@ static int veth_probe(struct vio_dev *vd } veth_dev[i] = dev; - /* Start the state machine on each connection, to commence - * link negotiation */ - for (i = 0; i < HVMAXARCHITECTEDLPS; i++) - if (veth_cnx[i]) - veth_kick_statemachine(veth_cnx[i]); + port = (struct veth_port*)netdev_priv(dev); + + /* Start the state machine on each connection on this vlan. If we're + * the first dev to do so this will commence link negotiation */ + for (i = 0; i < HVMAXARCHITECTEDLPS; i++) { + struct veth_lpar_connection *cnx; + + if (! (port->lpar_map & (1 << i))) + continue; + + cnx = veth_cnx[i]; + if (!cnx) + continue; + + kobject_get(&cnx->kobject); + veth_kick_statemachine(cnx); + } return 0; } @@ -1406,29 +1453,27 @@ static struct vio_driver veth_driver = { void __exit veth_module_cleanup(void) { int i; + struct veth_lpar_connection *cnx; - /* Stop the queues first to stop any new packets being sent. */ - for (i = 0; i < HVMAXARCHITECTEDVIRTUALLANS; i++) - if (veth_dev[i]) - netif_stop_queue(veth_dev[i]); - - /* Stop the connections before we unregister the driver. This - * ensures there's no skbs lying around holding the device open. */ - for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) - veth_stop_connection(i); - + /* Disconnect our "irq" to stop events coming from the Hypervisor. */ HvLpEvent_unregisterHandler(HvLpEvent_Type_VirtualLan); - /* Hypervisor callbacks may have scheduled more work while we - * were stoping connections. Now that we've disconnected from - * the hypervisor make sure everything's finished. */ + /* Make sure any work queued from Hypervisor callbacks is finished. */ flush_scheduled_work(); - vio_unregister_driver(&veth_driver); + for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) { + cnx = veth_cnx[i]; + + if (!cnx) + continue; - for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) - veth_destroy_connection(i); + /* Drop the driver's reference to the connection */ + kobject_put(&cnx->kobject); + } + /* Unregister the driver, which will close all the netdevs and stop + * the connections when they're no longer referenced. */ + vio_unregister_driver(&veth_driver); } module_exit(veth_module_cleanup); @@ -1456,7 +1501,7 @@ int __init veth_module_init(void) error: for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) { - veth_destroy_connection(i); + veth_destroy_connection(veth_cnx[i]); } return rc; From michael at ellerman.id.au Thu Sep 1 11:29:12 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:29:12 +1000 (EST) Subject: [PATCH 10/18] iseries_veth: Remove TX timeout code In-Reply-To: <1125538127.859382.875909607846.qpush@concordia> Message-ID: <20050901012912.2A0B768212@ozlabs.org> The iseries_veth driver uses the generic TX timeout watchdog, however a better solution is in the works, so remove this code. Signed-off-by: Michael Ellerman --- drivers/net/iseries_veth.c | 48 --------------------------------------------- 1 files changed, 48 deletions(-) Index: veth-dev2/drivers/net/iseries_veth.c =================================================================== --- veth-dev2.orig/drivers/net/iseries_veth.c +++ veth-dev2/drivers/net/iseries_veth.c @@ -830,49 +830,6 @@ static struct ethtool_ops ops = { .get_link = veth_get_link, }; -static void veth_tx_timeout(struct net_device *dev) -{ - struct veth_port *port = (struct veth_port *)dev->priv; - struct net_device_stats *stats = &port->stats; - unsigned long flags; - int i; - - stats->tx_errors++; - - spin_lock_irqsave(&port->pending_gate, flags); - - if (!port->pending_lpmask) { - spin_unlock_irqrestore(&port->pending_gate, flags); - return; - } - - printk(KERN_WARNING "%s: Tx timeout! Resetting lp connections: %08x\n", - dev->name, port->pending_lpmask); - - for (i = 0; i < HVMAXARCHITECTEDLPS; i++) { - struct veth_lpar_connection *cnx = veth_cnx[i]; - - if (! (port->pending_lpmask & (1<lock); - cnx->state |= VETH_STATE_RESET; - veth_kick_statemachine(cnx); - spin_unlock(&cnx->lock); - } - - spin_unlock_irqrestore(&port->pending_gate, flags); -} - static struct net_device * __init veth_probe_one(int vlan, struct device *vdev) { struct net_device *dev; @@ -921,9 +878,6 @@ static struct net_device * __init veth_p dev->set_multicast_list = veth_set_multicast_list; SET_ETHTOOL_OPS(dev, &ops); - dev->watchdog_timeo = 2 * (VETH_ACKTIMEOUT * HZ / 1000000); - dev->tx_timeout = veth_tx_timeout; - SET_NETDEV_DEV(dev, vdev); rc = register_netdev(dev); @@ -1058,8 +1012,6 @@ static int veth_start_xmit(struct sk_buf lpmask = veth_transmit_to_many(skb, lpmask, dev); - dev->trans_start = jiffies; - if (! lpmask) { dev_kfree_skb(skb); } else { From michael at ellerman.id.au Thu Sep 1 11:29:17 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:29:17 +1000 (EST) Subject: [PATCH 11/18] iseries_veth: Add a per-connection ack timer In-Reply-To: <1125538127.859382.875909607846.qpush@concordia> Message-ID: <20050901012917.18BA6681F5@ozlabs.org> Currently the iseries_veth driver contravenes the specification in Documentation/networking/driver.txt, in that if packets are not acked by the other LPAR they will sit around forever. This patch adds a per-connection timer which fires if we've had no acks for five seconds. This is superior to the generic TX timer because it catches the case of a small number of packets being sent and never acked. This fixes a bug we were seeing on real systems, where some IPv6 neighbour discovery packets would not be acked and then prevent the module from being removed, due to skbs lying around. Signed-off-by: Michael Ellerman --- drivers/net/iseries_veth.c | 75 +++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 69 insertions(+), 6 deletions(-) Index: veth-dev2/drivers/net/iseries_veth.c =================================================================== --- veth-dev2.orig/drivers/net/iseries_veth.c +++ veth-dev2/drivers/net/iseries_veth.c @@ -132,6 +132,11 @@ struct veth_lpar_connection { struct kobject kobject; struct timer_list ack_timer; + struct timer_list reset_timer; + unsigned int reset_timeout; + unsigned long last_contact; + int outstanding_tx; + spinlock_t lock; unsigned long state; HvLpInstanceId src_inst; @@ -171,8 +176,9 @@ static int veth_start_xmit(struct sk_buf static void veth_recycle_msg(struct veth_lpar_connection *, struct veth_msg *); static void veth_flush_pending(struct veth_lpar_connection *cnx); static void veth_receive(struct veth_lpar_connection *, struct VethLpEvent *); -static void veth_timed_ack(unsigned long connectionPtr); static void veth_release_connection(struct kobject *kobject); +static void veth_timed_ack(unsigned long ptr); +static void veth_timed_reset(unsigned long ptr); static struct kobj_type veth_lpar_connection_ktype = { .release = veth_release_connection @@ -360,7 +366,7 @@ static void veth_handle_int(struct VethL HvLpIndex rlp = event->base_event.xSourceLp; struct veth_lpar_connection *cnx = veth_cnx[rlp]; unsigned long flags; - int i; + int i, acked = 0; BUG_ON(! cnx); @@ -374,13 +380,22 @@ static void veth_handle_int(struct VethL break; case VethEventTypeFramesAck: spin_lock_irqsave(&cnx->lock, flags); + for (i = 0; i < VETH_MAX_ACKS_PER_MSG; ++i) { u16 msgnum = event->u.frames_ack_data.token[i]; - if (msgnum < VETH_NUMBUFFERS) + if (msgnum < VETH_NUMBUFFERS) { veth_recycle_msg(cnx, cnx->msgs + msgnum); + cnx->outstanding_tx--; + acked++; + } } + + if (acked > 0) + cnx->last_contact = jiffies; + spin_unlock_irqrestore(&cnx->lock, flags); + veth_flush_pending(cnx); break; case VethEventTypeFrames: @@ -454,8 +469,6 @@ static void veth_statemachine(void *p) restart: if (cnx->state & VETH_STATE_RESET) { - int i; - if (cnx->state & VETH_STATE_OPEN) HvCallEvent_closeLpEventPath(cnx->remote_lp, HvLpEvent_Type_VirtualLan); @@ -474,15 +487,20 @@ static void veth_statemachine(void *p) | VETH_STATE_SENTCAPACK | VETH_STATE_READY); /* Clean up any leftover messages */ - if (cnx->msgs) + if (cnx->msgs) { + int i; for (i = 0; i < VETH_NUMBUFFERS; ++i) veth_recycle_msg(cnx, cnx->msgs + i); + } + cnx->outstanding_tx = 0; /* Drop the lock so we can do stuff that might sleep or * take other locks. */ spin_unlock_irq(&cnx->lock); del_timer_sync(&cnx->ack_timer); + del_timer_sync(&cnx->reset_timer); + veth_flush_pending(cnx); spin_lock_irq(&cnx->lock); @@ -631,9 +649,16 @@ static int veth_init_connection(u8 rlp) cnx->remote_lp = rlp; spin_lock_init(&cnx->lock); INIT_WORK(&cnx->statemachine_wq, veth_statemachine, cnx); + init_timer(&cnx->ack_timer); cnx->ack_timer.function = veth_timed_ack; cnx->ack_timer.data = (unsigned long) cnx; + + init_timer(&cnx->reset_timer); + cnx->reset_timer.function = veth_timed_reset; + cnx->reset_timer.data = (unsigned long) cnx; + cnx->reset_timeout = 5 * HZ * (VETH_ACKTIMEOUT / 1000000); + memset(&cnx->pending_acks, 0xff, sizeof (cnx->pending_acks)); veth_cnx[rlp] = cnx; @@ -948,6 +973,13 @@ static int veth_transmit_to_one(struct s if (rc != HvLpEvent_Rc_Good) goto recycle_and_drop; + /* If the timer's not already running, start it now. */ + if (0 == cnx->outstanding_tx) + mod_timer(&cnx->reset_timer, jiffies + cnx->reset_timeout); + + cnx->last_contact = jiffies; + cnx->outstanding_tx++; + spin_unlock_irqrestore(&cnx->lock, flags); return 0; @@ -1093,6 +1125,37 @@ static void veth_flush_pending(struct ve } } +static void veth_timed_reset(unsigned long ptr) +{ + struct veth_lpar_connection *cnx = (struct veth_lpar_connection *)ptr; + unsigned long trigger_time, flags; + + /* FIXME is it possible this fires after veth_stop_connection()? + * That would reschedule the statemachine for 5 seconds and probably + * execute it after the module's been unloaded. Hmm. */ + + spin_lock_irqsave(&cnx->lock, flags); + + if (cnx->outstanding_tx > 0) { + trigger_time = cnx->last_contact + cnx->reset_timeout; + + if (trigger_time < jiffies) { + cnx->state |= VETH_STATE_RESET; + veth_kick_statemachine(cnx); + veth_error("%d packets not acked by LPAR %d within %d " + "seconds, resetting.\n", + cnx->outstanding_tx, cnx->remote_lp, + cnx->reset_timeout / HZ); + } else { + /* Reschedule the timer */ + trigger_time = jiffies + cnx->reset_timeout; + mod_timer(&cnx->reset_timer, trigger_time); + } + } + + spin_unlock_irqrestore(&cnx->lock, flags); +} + /* * Rx path */ From michael at ellerman.id.au Thu Sep 1 11:29:18 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 1 Sep 2005 11:29:18 +1000 (EST) Subject: [PATCH 12/18] iseries_veth: Simplify full-queue handling In-Reply-To: <1125538127.859382.875909607846.qpush@concordia> Message-ID: <20050901012918.2C8EA68224@ozlabs.org> The iseries_veth driver often has multiple netdevices sending packets over a single connection to another LPAR. If the bandwidth to the other LPAR is exceeded, all the netdevices must have their queues stopped. The current code achieves this by queueing one incoming skb on the per-netdevice port structure. When the connection is able to send more packets we iterate through the port structs and flush any packet that is queued, as well as restarting the associated netdevice's queue. This arrangement makes less sense now that we have per-connection TX timers, rather than the per-netdevice generic TX timer. The new code simply detects when one of the connections is full, and stops the queue of all associated netdevices. Then when a packet is acked on that connection (ie. there is space again) all the queues are woken up. Signed-off-by: Michael Ellerman --- drivers/net/iseries_veth.c | 108 ++++++++++++++++++++++++++------------------- 1 files changed, 64 insertions(+), 44 deletions(-) Index: veth-dev2/drivers/net/iseries_veth.c =================================================================== --- veth-dev2.orig/drivers/net/iseries_veth.c +++ veth-dev2/drivers/net/iseries_veth.c @@ -158,10 +158,11 @@ struct veth_port { u64 mac_addr; HvLpIndexMap lpar_map; - spinlock_t pending_gate; - struct sk_buff *pending_skb; - HvLpIndexMap pending_lpmask; + /* queue_lock protects the stopped_map and dev's queue. */ + spinlock_t queue_lock; + HvLpIndexMap stopped_map; + /* mcast_gate protects promiscuous, num_mcast & mcast_addr. */ rwlock_t mcast_gate; int promiscuous; int num_mcast; @@ -174,7 +175,8 @@ static struct net_device *veth_dev[HVMAX static int veth_start_xmit(struct sk_buff *skb, struct net_device *dev); static void veth_recycle_msg(struct veth_lpar_connection *, struct veth_msg *); -static void veth_flush_pending(struct veth_lpar_connection *cnx); +static void veth_wake_queues(struct veth_lpar_connection *cnx); +static void veth_stop_queues(struct veth_lpar_connection *cnx); static void veth_receive(struct veth_lpar_connection *, struct VethLpEvent *); static void veth_release_connection(struct kobject *kobject); static void veth_timed_ack(unsigned long ptr); @@ -221,6 +223,12 @@ static inline struct veth_msg *veth_stac return msg; } +/* You must hold the connection's lock when you call this function. */ +static inline int veth_stack_is_empty(struct veth_lpar_connection *cnx) +{ + return cnx->msg_stack_head == NULL; +} + static inline HvLpEvent_Rc veth_signalevent(struct veth_lpar_connection *cnx, u16 subtype, HvLpEvent_AckInd ackind, HvLpEvent_AckType acktype, @@ -391,12 +399,12 @@ static void veth_handle_int(struct VethL } } - if (acked > 0) + if (acked > 0) { cnx->last_contact = jiffies; + veth_wake_queues(cnx); + } spin_unlock_irqrestore(&cnx->lock, flags); - - veth_flush_pending(cnx); break; case VethEventTypeFrames: veth_receive(cnx, event); @@ -492,7 +500,9 @@ static void veth_statemachine(void *p) for (i = 0; i < VETH_NUMBUFFERS; ++i) veth_recycle_msg(cnx, cnx->msgs + i); } + cnx->outstanding_tx = 0; + veth_wake_queues(cnx); /* Drop the lock so we can do stuff that might sleep or * take other locks. */ @@ -501,8 +511,6 @@ static void veth_statemachine(void *p) del_timer_sync(&cnx->ack_timer); del_timer_sync(&cnx->reset_timer); - veth_flush_pending(cnx); - spin_lock_irq(&cnx->lock); if (cnx->state & VETH_STATE_RESET) @@ -869,8 +877,9 @@ static struct net_device * __init veth_p port = (struct veth_port *) dev->priv; - spin_lock_init(&port->pending_gate); + spin_lock_init(&port->queue_lock); rwlock_init(&port->mcast_gate); + port->stopped_map = 0; for (i = 0; i < HVMAXARCHITECTEDLPS; i++) { HvLpVirtualLanIndexMap map; @@ -980,6 +989,9 @@ static int veth_transmit_to_one(struct s cnx->last_contact = jiffies; cnx->outstanding_tx++; + if (veth_stack_is_empty(cnx)) + veth_stop_queues(cnx); + spin_unlock_irqrestore(&cnx->lock, flags); return 0; @@ -1023,7 +1035,6 @@ static int veth_start_xmit(struct sk_buf { unsigned char *frame = skb->data; struct veth_port *port = (struct veth_port *) dev->priv; - unsigned long flags; HvLpIndexMap lpmask; if (! (frame[0] & 0x01)) { @@ -1040,27 +1051,9 @@ static int veth_start_xmit(struct sk_buf lpmask = port->lpar_map; } - spin_lock_irqsave(&port->pending_gate, flags); - - lpmask = veth_transmit_to_many(skb, lpmask, dev); + veth_transmit_to_many(skb, lpmask, dev); - if (! lpmask) { - dev_kfree_skb(skb); - } else { - if (port->pending_skb) { - veth_error("%s: TX while skb was pending!\n", - dev->name); - dev_kfree_skb(skb); - spin_unlock_irqrestore(&port->pending_gate, flags); - return 1; - } - - port->pending_skb = skb; - port->pending_lpmask = lpmask; - netif_stop_queue(dev); - } - - spin_unlock_irqrestore(&port->pending_gate, flags); + dev_kfree_skb(skb); return 0; } @@ -1093,9 +1086,10 @@ static void veth_recycle_msg(struct veth } } -static void veth_flush_pending(struct veth_lpar_connection *cnx) +static void veth_wake_queues(struct veth_lpar_connection *cnx) { int i; + for (i = 0; i < HVMAXARCHITECTEDVIRTUALLANS; i++) { struct net_device *dev = veth_dev[i]; struct veth_port *port; @@ -1109,19 +1103,45 @@ static void veth_flush_pending(struct ve if (! (port->lpar_map & (1<remote_lp))) continue; - spin_lock_irqsave(&port->pending_gate, flags); - if (port->pending_skb) { - port->pending_lpmask = - veth_transmit_to_many(port->pending_skb, - port->pending_lpmask, - dev); - if (! port->pending_lpmask) { - dev_kfree_skb_any(port->pending_skb); - port->pending_skb = NULL; - netif_wake_queue(dev); - } + spin_lock_irqsave(&port->queue_lock, flags); + + port->stopped_map &= ~(1 << cnx->remote_lp); + + if (0 == port->stopped_map && netif_queue_stopped(dev)) { + veth_debug("cnx %d: woke queue for %s.\n", + cnx->remote_lp, dev->name); + netif_wake_queue(dev); } - spin_unlock_irqrestore(&port->pending_gate, flags); + spin_unlock_irqrestore(&port->queue_lock, flags); + } +} + +static void veth_stop_queues(struct veth_lpar_connection *cnx) +{ + int i; + + for (i = 0; i < HVMAXARCHITECTEDVIRTUALLANS; i++) { + struct net_device *dev = veth_dev[i]; + struct veth_port *port; + + if (! dev) + continue; + + port = (struct veth_port *)dev->priv; + + /* If this cnx is not on the vlan for this port, continue */ + if (! (port->lpar_map & (1 << cnx->remote_lp))) + continue; + + spin_lock(&port->queue_lock); + + netif_stop_queue(dev);