diary
2.6.26 on a Lenovo x61 thinkpad
It looks like the iwl driver is slightly broken in the 2.6.26 release: connections drop out after 10 seconds or so.
The workaround for this is to enable the config option
CONFIG_IWL4965_HT.
asynchronous spu contexts, initial designs
I've recently been working on some changes to the spufs code, and thought I'd write up some of the details.
At present, the spu_run syscall (used to run a SPU context)
blocks until the SPU program has exited (or some other event has happened,
such as a non-serviceable fault). This means that to take advantage of the
SPUs, you really need to start a new thread for each SPU context that you
create, otherwise your application will be sitting around waiting for each SPU
context to complete.
In fact, we have an invariant in the spufs code at the moment that only
contexts that are currently being spu_run will ever be runnable
(and, at the moment, schedulable).
Ben H and I have been discussing some ideas for asynchronous spu
contexts. With these, a userspace app can start the context, then later
retrieve the status of the SPU context (to see if it has stopped, faulted, or
whatever). We can then use standard POSIX semantics like poll() to
see if a context is still running or has generated any "events", then handle
these events when they become available.
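As a rough sketch of the poll()-based usage described above: this assumes the proposed "event" file behaves like an ordinary readable descriptor (reporting POLLIN when an event is pending) and returns one struct spu_event (the layout proposed later in this post) per read(). The helper name poll_spu_event is hypothetical, and since the interface is still experimental, any readable descriptor such as a pipe can stand in for the event file when testing.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/* Event record layout, as proposed in this post */
struct spu_event {
	uint32_t event;
	uint32_t status;
	uint32_t npc;
};

/*
 * Hypothetical helper: wait up to timeout_ms for an event on the
 * (proposed) spufs "event" file, then read one record.
 * Returns 1 if *ev was filled in, 0 on timeout, -1 on error.
 */
int poll_spu_event(int event_fd, int timeout_ms, struct spu_event *ev)
{
	struct pollfd pfd = { .fd = event_fd, .events = POLLIN };
	ssize_t len;
	int rc;

	/* restart the wait if a signal interrupts poll() */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc <= 0)
		return rc;	/* 0: timed out, -1: poll() failed */
	if (!(pfd.revents & POLLIN))
		return -1;	/* e.g. POLLERR or POLLHUP with no data */

	len = read(event_fd, ev, sizeof(*ev));
	return len == (ssize_t)sizeof(*ev) ? 1 : -1;
}
```

Because the call takes a timeout, a single thread could service several SPU contexts by polling each event descriptor in turn (or by passing an array of pollfds to one poll() call).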
In effect, this is similar to spu_run: currently, the
spu_run syscall runs the SPU, then blocks until an event happens,
which is then returned to userspace as the return value of
spu_run. The main difference is that we don't block in the kernel
while the SPU is running.
So, I've been coding up an experimental change to spufs. Firstly, we have
to explicitly tell the kernel that we want a context to operate in asynchronous
mode, so I've added a new flag to the spu_create syscall:
SPU_CREATE_ASYNC.
I've opted for a file-based interface to these asynchronous contexts -
SPU events are retrieved by reading from a file. Contexts that are created with
the SPU_CREATE_ASYNC flag have an extra file present (called
something like "event") in their context directory in the
spufs mount. Reading from this file allows applications to retrieve events
that the SPU program has raised.
We need to define a format for the data read from this events file, so here's something to get started with:
struct spu_event {
	uint32_t event;
	uint32_t status;
	uint32_t npc;
};
- where the event member specifies which event happened - a
stop-and-signal for example.
The status and npc members return the status of the SPU and the next program counter register, respectively. While not strictly necessary (this information is available from other files in spufs), it's very likely that the application will need these values in order to handle the event.
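If the kernel can return more than one record from a single read() (an assumption; the post doesn't say either way), the application needs to split the buffer into whole records. A minimal sketch, with a hypothetical helper named unpack_spu_events:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct spu_event {
	uint32_t event;
	uint32_t status;
	uint32_t npc;
};

/*
 * Hypothetical helper: split a buffer returned by read() on the event
 * file into whole struct spu_event records. Returns the number of
 * records copied to out (at most max); a trailing partial record is
 * ignored.
 */
size_t unpack_spu_events(const void *buf, size_t len,
			 struct spu_event *out, size_t max)
{
	const unsigned char *p = buf;
	size_t n = 0;

	while (n < max && len >= sizeof(*out)) {
		/* memcpy avoids relying on buf being suitably aligned */
		memcpy(&out[n], p, sizeof(*out));
		p += sizeof(*out);
		len -= sizeof(*out);
		n++;
	}
	return n;
}
```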
So, users of this interface may look something like this:
uint32_t npc = 0;
struct context {
	int fd;
	int event_fd;
} context;

/* create the context: spu_create() takes (pathname, flags, mode) */
context.fd = spu_create("/spu/ctx", SPU_CREATE_ASYNC, 0755);

/* open the events file */
context.event_fd = openat(context.fd, "event", O_RDWR);

/* start the context running. unlike the spu_run syscall,
 * this function does not block for the duration of the
 * spu program */
run_context(&context, npc);

for (;;) {
	struct spu_event event;

	/* get the next event caused by the SPU */
	if (read(context.event_fd, &event, sizeof(event)) != (ssize_t)sizeof(event))
		break;

	if (event.event == SPU_EVENT_STOP)
		break;

	/* handle other event ... */
}
Note that the userspace examples here are not what we'd present to Cell application developers. They're more low-level examples of how the new asynchronous kernel interface works. In fact, the changes could be completely transparent to applications which use the libSPE interface.
This isn't far from the API provided by the current spu_run
syscall, except that we're not waiting in the kernel while the SPU is
running.
Also, we're going to need to control the SPU somehow - for example, we need
to implement the run_context function in the pseudocode above.
Rather than overloading the spu_run syscall, I've opted to use the
same event file - writes to this file will allow userspace to control the SPU.
I'm still working out the exact format of these writes, but the way I've
implemented it at the moment is that the application can write structures of
this layout to the file:
struct spu_control {
	uint32_t op;
	char data[];
};
The contents of the data member depend on the operation
requested (specified by the op member). For example, a 'start spu'
operation would have four extra bytes - a uint32_t containing the
NPC to start the SPU execution from. A 'stop spu' operation doesn't require any
extra parameters, so the data member would be 0 bytes long.
This would allow us to implement the run_context function
as follows:
void run_context(struct context *context, uint32_t npc)
{
	uint32_t buf[2];

	buf[0] = SPU_CONTROL_START_SPU;
	buf[1] = npc;

	write(context->event_fd, buf, sizeof(buf));
}
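The run_context function above builds a fixed two-word message, but the same layout extends to operations with any payload size. The following sketch builds an arbitrary control message (op word plus op-specific data) into a buffer ready for write(). The numeric values of SPU_CONTROL_START_SPU and SPU_CONTROL_STOP_SPU are placeholders, since the post doesn't define them, and spu_control_encode is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder op codes: the real values aren't defined in this post */
enum {
	SPU_CONTROL_START_SPU = 1,
	SPU_CONTROL_STOP_SPU  = 2,
};

/*
 * Hypothetical helper: lay out a control message (a uint32_t op,
 * followed by datalen bytes of op-specific data) in buf. Returns the
 * number of bytes to write() to the event file, or 0 if buf is too
 * small.
 */
size_t spu_control_encode(void *buf, size_t buflen, uint32_t op,
			  const void *data, size_t datalen)
{
	unsigned char *p = buf;

	if (buflen < sizeof(op) + datalen)
		return 0;

	memcpy(p, &op, sizeof(op));
	if (datalen)
		memcpy(p + sizeof(op), data, datalen);

	return sizeof(op) + datalen;
}
```

A 'stop spu' request would then be spu_control_encode(buf, sizeof(buf), SPU_CONTROL_STOP_SPU, NULL, 0), which gives a 4-byte message with an empty data member.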
There are plenty of other issues to deal with (like signals, and debugging), but I have a basic prototype working at the moment. More to come!
debian on a qs22 cell blade
Seeing as the QS22 blades are out, here's a short guide to getting debian installed.
[jk@qs22 ~]$ grep -m1 ^cpu /proc/cpuinfo
cpu : Cell Broadband Engine, altivec supported
[jk@qs22 ~]$ lsb_release -d
Description: Debian GNU/Linux unstable (sid)
kernel
You'll need a kernel that has support for the IBM Cell blades. If you configure your kernel with the 'cell_defconfig' target, you should have all the necessary options:
[jk@pingu linux-2.6.25]$ make cell_defconfig
Specifically, you need:
- CONFIG_PPC_IBM_CELL_BLADE;
- CONFIG_SERIAL_OF_PLATFORM;
- CONFIG_FUSION_SAS;
- CONFIG_ROOT_NFS;
- CONFIG_IP_PNP_DHCP; and
- CONFIG_SPU_FS.
root filesystem
The QS22s have no internal disk (they're compute nodes, right?), so you'll have to either:
- use a remote root filesystem, like NFS; or
- add an LSI SAS adaptor to the blade, and use an external SAS disk for the root filesystem.
The installation process will be different depending on which you choose, so just skip to the appropriate section here.
NFS root
For the first option, there are a number of NFS-root howtos
around. First up, we need to build the actual debian filesystem, using
debootstrap. For example:
[jk@pingu ~]$ sudo debootstrap --arch=powerpc --foreign sid /srv/nfs/qs22/
This will create an entire debian filesystem in /srv/nfs/qs22.
We need to make a few modifications though:
- add the following line to /srv/nfs/qs22/etc/inittab:
T0:23:respawn:/sbin/getty -L ttyS0 19200 vt100
- and a couple of extra device nodes:
[jk@pingu ~]$ cd /srv/nfs/qs22/dev
[jk@pingu dev]$ sudo mknod console c 5 1
[jk@pingu dev]$ sudo mknod ttyS0 c 4 64
Once this is done, we need to complete the bootstrap on the QS22. Set up
your NFS server, and export the appropriate directory. Boot the QS22 with the
nfs root kernel options, plus "rw init=/bin/sh" (eg
root=/dev/nfs nfsroot=server_ip:/srv/nfs/qs22 ip=dhcp rw
init=/bin/sh). Then, once the machine has booted:
sh-3.2# PATH=/:/bin:/usr/bin:/sbin:/usr/sbin /debootstrap/debootstrap --second-stage
This should finish the bootstrap. After it has completed (it should finish
with "I: Base system installed successfully"), reboot the
machine with the same kernel command line, minus the rw
init=/bin/sh arguments. Once it boots, you should have the debian login
prompt. Login as root (there will be no password, but don't forget to set one)
and away you go.
SAS disk
If you're using SAS, the install is much more straightforward, as you can just use the standard debian installer. However, you may need to use a custom kernel which supports the QS22s. This is a matter of building your own kernel, using the powerpc64 debian installer image as the initramfs:
[jk@pingu linux-2.6.25]$ wget http://ftp.us.debian.org/debian/dists/testing/main/installer-powerpc/current/images/powerpc64/netboot/initrd.gz
[jk@pingu linux-2.6.25]$ gunzip -c < initrd.gz > initrd
[jk@pingu linux-2.6.25]$ make cell_defconfig
[jk@pingu linux-2.6.25]$ sed -i -e 's,^CONFIG_INITRAMFS_SOURCE=".*",CONFIG_INITRAMFS_SOURCE="'$PWD'/initrd",' .config
[jk@pingu linux-2.6.25]$ make
Then, just boot the kernel in arch/powerpc/boot/zImage.pseries. The debian installer should start, and guide you through the rest of the
installation. Since you're netbooting, you can ignore any messages about not
having a bootstrap partition, or not being able to install a kernel or
yaboot.
software
Entirely optional, but you'll probably get the most out of your QS22 with a few extra packages:
[jk@qs22 ~]$ sudo apt-get install openssh-server libspe2-dev spu-gcc build-essential
linux.conf.au hackfest: the solution, part three
In part two of this series, we had just ported a fractal renderer to the SPEs on a Cell Broadband Engine machine. Now we're going to start the optimisation...
Our baseline performance is 40.7 seconds to generate the sample fractal (using the sample fractal parameters).
The initial SPE-based fractal renderer used only one SPE. However, we have more available:
| Machine | SPEs available |
|---|---|
| Sony Playstation 3 | 6 |
| IBM QS21 / QS22 blades | 16 (8 per CPU) |
So, we should be able to get better performance by distributing the render work between the SPEs. We can do this by dividing the fractal into a set of n strips, where n is the number of SPEs available. Then, each SPE renders its own strip of the fractal.
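One detail to watch when splitting into strips: computing rows_per_spe as rows / n_threads (as the SPE code in this post does) silently drops the last rows % n_threads rows when the fractal height isn't a multiple of the thread count. A sketch of one way to spread the remainder, using a hypothetical strip_bounds helper (the SPE's DMA loop would also need to clamp its final chunk to the strip size):

```c
/*
 * Hypothetical helper: compute the contiguous strip of rows that SPE
 * thread idx (0 .. n - 1) should render. Unlike plain rows / n, the
 * first (rows % n) threads each take one extra row, so every row is
 * covered exactly once.
 */
void strip_bounds(int rows, int n, int idx, int *start, int *count)
{
	int base = rows / n;
	int extra = rows % n;

	*count = base + (idx < extra ? 1 : 0);
	*start = idx * base + (idx < extra ? idx : extra);
}
```

For example, 10 rows over 3 threads gives strips of 4, 3 and 3 rows, starting at rows 0, 4 and 7.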
The following image shows the standard fractal, as would be rendered by 8 SPEs, with shading to show the division of the work amongst the SPEs.
In order to split up the work, we first need to tell each SPE which part of
the fractal it is to render. We'll add two variables to the
spe_args structure (which is passed to the SPE during program
setup), to provide the total number of threads (n_threads) and
the index of this SPE thread (thread_idx).
struct spe_args {
	struct fractal_params fractal;
	int n_threads;
	int thread_idx;
} __attribute__((aligned(SPE_ALIGN)));
spe changes
On the SPE side, we use these parameters to alter the invocations of
render_fractal. We set up another couple of convenience variables:
rows_per_spe = args.fractal.rows / args.n_threads;
start_row = rows_per_spe * args.thread_idx;
And just alter our for-loop to only render
rows_per_spe rows, rather than the entire fractal:
for (row = 0; row < rows_per_spe; row += rows_per_dma) {
render_fractal(&args.fractal, start_row + row,
rows_per_dma);
mfc_put(buf, ppe_buf + (start_row + row) * bytes_per_row,
bytes_per_row * rows_per_dma,
0, 0, 0);
/* Wait for the DMA to complete */
mfc_write_tag_mask(1 << 0);
mfc_read_tag_status_all();
}
ppe changes
The changes to the PPE code are fairly simple - instead of just creating one thread, create n threads.
First though, let's add a '-n' argument to the program to
specify the number of threads to start:
while ((opt = getopt(argc, argv, "p:o:n:")) != -1) {
switch (opt) {
/* other options omitted */
case 'n':
n_threads = atoi(optarg);
break;
}
}
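atoi() returns 0 for non-numeric input and does no range checking, so a typo in the -n argument could quietly start zero threads. A stricter sketch using strtol(): the upper bound is passed in by the caller (for example 6 on a PS3 or 16 on a QS21/QS22, per the table above), and parse_thread_count is a hypothetical name.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the -n argument, rejecting empty strings,
 * trailing junk and values outside 1 .. max_spes.
 * Returns the thread count, or -1 on error.
 */
int parse_thread_count(const char *arg, int max_spes)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(arg, &end, 10);

	/* overflow, no digits at all, or trailing characters */
	if (errno != 0 || end == arg || *end != '\0')
		return -1;
	if (val < 1 || val > max_spes)
		return -1;

	return (int)val;
}
```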
Rather than just creating one SPE thread, we create n_threads.
Also, we have to set the thread-specific arguments for each thread:
for (i = 0; i < n_threads; i++) {
/* copy the fractal data into this thread's args */
memcpy(&threads[i].args.fractal, fractal, sizeof(*fractal));
/* set thread-specific arguments */
threads[i].args.n_threads = n_threads;
threads[i].args.thread_idx = i;
threads[i].ctx = spe_context_create(0, NULL);
spe_program_load(threads[i].ctx, &spe_fractal);
pthread_create(&threads[i].pthread, NULL,
spethread_fn, &threads[i]);
}
Now, the SPEs should be running, and generating their own slice of the fractal. We just have to wait for them all to finish:
/* wait for the threads to finish */
for (i = 0; i < n_threads; i++)
pthread_join(threads[i].pthread, NULL);
If you're after the source code for the multi-threaded SPE fractal renderer, it's available in fractal.3.tar.gz.
That's it! Now we have a multi-threaded SPE application to do the fractal rendering. In theory, an n-threaded program will take 1/n of the time of a single-threaded implementation; let's see how that fares with reality...
performance
Let's compare invocations of our multi-threaded fractal renderer, with
different values for the n_threads parameter.
| SPE threads | Running time (sec) |
|---|---|
| 1 | 40.72 |
| 2 | 30.14 |
| 4 | 18.84 |
| 6 | 12.72 |
| 8 | 10.89 |
Not too bad, but we're definitely not seeing linear scalability here; we could expect the 8 SPE case to take around 5 seconds, rather than 11.
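To put a number on "not linear": speedup is t1 / tn, and parallel efficiency is speedup / n (1.0 would be perfect scaling). A trivial sketch:

```c
/* Speedup of an n-thread run relative to the single-thread baseline */
double speedup(double t1, double tn)
{
	return t1 / tn;
}

/* Parallel efficiency: 1.0 means perfectly linear scaling */
double efficiency(double t1, double tn, int n)
{
	return speedup(t1, tn) / n;
}
```

With the figures from the table, the 8-SPE run gives a speedup of about 3.7 (40.72 / 10.89), or an efficiency of roughly 47%.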
what's slowing us down?
A little investigation into the fractal generator will show that some SPE
threads are finishing long before others. This is due to the variability in
calculation time between pixels. In order to see if a point (ie, pixel) in the
fractal does not diverge towards infinity (and gets coloured blue),
we need to do the full i_max tests in render_fractal
(i_max is 10,000 in our sample fractal). Other pixels may
diverge much more quickly (possibly in under 10 iterations), and so are orders
of magnitude faster to calculate.
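The cost difference comes from the escape-time loop. The sketch below uses the standard Mandelbrot iteration z = z^2 + c with an escape radius of 2; this is an assumption, since the hackfest's render_fractal() may differ in detail. It returns the number of iterations performed, which is exactly what makes per-pixel cost so uneven:

```c
/*
 * Escape-time iteration for a single point c = (cr, ci) of the
 * standard Mandelbrot set (assumed here; not the hackfest code).
 * Points inside the set run all i_max iterations, while points far
 * outside escape after only a few.
 */
int escape_iterations(double cr, double ci, int i_max)
{
	double zr = 0.0, zi = 0.0;
	int i;

	for (i = 0; i < i_max; i++) {
		double zr2 = zr * zr, zi2 = zi * zi;

		/* |z| > 2: the point is guaranteed to diverge */
		if (zr2 + zi2 > 4.0)
			break;

		/* z = z^2 + c */
		zi = 2.0 * zr * zi + ci;
		zr = zr2 - zi2 + cr;
	}
	return i;
}
```

For example, escape_iterations(0.0, 0.0, 10000) runs all 10,000 iterations, while escape_iterations(2.0, 2.0, 10000) stops after one.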
We end up with slices that are very quick to calculate, and others that take longer. Of course, we have to wait for the longest-running SPE thread to complete, so our overall runtime will be that of the slowest thread.
So, let's take another approach to distributing the workload. Rather than dividing the fractal into contiguous slices, we can interleave the SPE work. For example, if we were to use 2 SPE threads, then thread 0 would render all the even chunks, and thread 1 would render all the odd chunks (where a "chunk" is a set of rows that fit into a single DMA). This should even-out the work between SPE threads.
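The assignment rule is simple enough to state directly: chunk k goes to thread k % n_threads. A sketch with hypothetical helper names, including how many chunks each thread ends up with:

```c
/* Hypothetical helper: which SPE thread renders the chunk starting at `row` */
int chunk_owner(int row, int rows_per_dma, int n_threads)
{
	return (row / rows_per_dma) % n_threads;
}

/*
 * Hypothetical helper: how many of n_chunks chunks thread idx receives.
 * No thread gets more than one chunk more than any other.
 */
int chunks_for_thread(int n_chunks, int n_threads, int idx)
{
	return n_chunks / n_threads + (idx < n_chunks % n_threads ? 1 : 0);
}
```

Balancing the number of chunks doesn't guarantee equal work, but because expensive and cheap regions are spread across all threads, each thread's share of the slow pixels should be roughly similar.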
This is just a matter of changing the SPE for-loop a little.
Rather than the current code, which divides the work into
n_threads contiguous chunks:
for (row = 0; row < rows_per_spe; row += rows_per_dma) {
render_fractal(&args.fractal, start_row + row,
rows_per_dma);
mfc_put(buf, ppe_buf + (start_row + row) * bytes_per_row,
bytes_per_row * rows_per_dma,
0, 0, 0);
/* Wait for the DMA to complete */
mfc_write_tag_mask(1 << 0);
mfc_read_tag_status_all();
}
We change this to render every n_threads-th
chunk, starting from thread_idx, which gives us the
interleaved pattern:
for (row = rows_per_dma * args.thread_idx;
row < args.fractal.rows;
row += rows_per_dma * args.n_threads) {
render_fractal(&args.fractal, row,
rows_per_dma);
mfc_put(buf, ppe_buf + row * bytes_per_row,
bytes_per_row * rows_per_dma,
0, 0, 0);
/* Wait for the DMA to complete */
mfc_write_tag_mask(1 << 0);
mfc_read_tag_status_all();
}
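The interleaved loop assumes args.fractal.rows is a multiple of rows_per_dma; if it isn't, the last chunk renders and DMAs rows beyond the end of the image buffer. A sketch of clamping the final chunk (chunk_rows is a hypothetical helper): the loop body would pass its result to render_fractal() and use it in the mfc_put() size.

```c
/*
 * Hypothetical helper: number of rows in the chunk starting at `row`.
 * Normally rows_per_dma, but shorter for the final chunk when rows
 * isn't a multiple of rows_per_dma.
 */
int chunk_rows(int row, int rows, int rows_per_dma)
{
	int remaining = rows - row;

	return remaining < rows_per_dma ? remaining : rows_per_dma;
}
```

For 10 rows in 4-row chunks over 2 threads, thread 0 gets chunks of 4 and 2 rows (starting at rows 0 and 8), and thread 1 gets 4 rows starting at row 4.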
An updated renderer is available in fractal.4.tar.gz.
Making this small change gives some better performance figures:
| SPE threads | Running time (sec) |
|---|---|
| 1 | 40.72 |
| 2 | 20.75 |
| 4 | 10.78 |
| 6 | 7.44 |
| 8 | 5.81 |
We're doing much better now, but we're still nowhere near the theoretical maximum performance of the SPEs. More optimisations in the next article...
qs22 released
The next revision of IBM Cell Broadband Engine machines has just been released - the QS22 blade. The QS22 has five times the double-precision floating point performance of the previous Cell blade (the QS21), but is instruction-set compatible. This is also the first Cell/B.E. machine to use DDR2 memory, and can hold up to 32GB.
In other powerpc news, Terra Soft Solutions have announced the PowerStation - a deskside development machine, based on 2 dual-core PowerPC 970MP CPUs. It comes with Yellow Dog Linux installed, including the Cell/B.E. SDK. Hugh (amongst many others) has been doing a lot of great work getting this machine out the door (he has a number of posts on his blog if you're keen to see the "making of" feature). Nice work Hugh!
linux.conf.au hackfest: the solution, part two
In the last article we finished with a SPE-based fractal renderer, but with a limited maximum fractal size of 64 × 64 pixels:
We'd like to generate full-size fractals, but the DMAs (which we use to transfer the fractal image out of the SPE) have a maximum size of 16kB. The solution is to perform multiple DMAs, each containing a subset of the image's rows.
Each invocation of render_fractal() should render a DMA-able
chunk of fractal data, then we perform the DMA. We do this until the SPE has
processed the entire image:
We just need to modify the spe-fractal code (spe-fractal.c) a
little. At present, we just render the whole fractal in one pass and DMA the
data in the main() function:
render_fractal(&args.fractal);
mfc_put(args.fractal.imgbuf, ppe_buf,
args.fractal.rows * args.fractal.cols * sizeof(struct pixel),
0, 0, 0);
/* Wait for the DMA to complete */
mfc_write_tag_mask(1 << 0);
mfc_read_tag_status_all();
First, we need to modify our render_fractal() function to take
a starting row, and a number of rows to render. This is the new prototype
of render_fractal():
static void render_fractal(struct fractal_params *params, int start_row, int n_rows)
In the SPE program's main(), we just need to set up some
convenience variables:
bytes_per_row = sizeof(*buf) * args.fractal.cols;
rows_per_dma = sizeof(buf) / bytes_per_row;
And do the rendering and DMAs in a loop:
for (row = 0; row < args.fractal.rows; row += rows_per_dma) {
render_fractal(&args.fractal, row, rows_per_dma);
mfc_put(buf, ppe_buf + row * bytes_per_row,
rows_per_dma * bytes_per_row,
0, 0, 0);
/* Wait for the DMA to complete */
mfc_write_tag_mask(1 << 0);
mfc_read_tag_status_all();
}
This loop will render as many image rows as will fit into a single DMA, then DMA the rendered data back to main memory.
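Two MFC constraints are worth checking when setting up rows_per_dma: a single DMA moves at most 16kB, and transfers larger than 8 bytes must be a multiple of 16 bytes. Also, if one row is wider than the buffer, rows_per_dma becomes 0 and the loop above never advances. A sanity-check sketch for the full-size chunk (the helper name is hypothetical; the 4-byte pixel used in the example below is an assumption consistent with 64 × 64 pixels fitting in 16kB):

```c
#include <stddef.h>

/* Largest single MFC DMA transfer: 16kB */
#define MFC_MAX_DMA_SIZE 16384

/*
 * Hypothetical helper: how many image rows fit into one DMA from a
 * buf_size-byte local buffer. Returns 0 if the layout can't work:
 * a row is larger than the buffer, or the transfer size isn't a
 * multiple of 16 bytes.
 */
int rows_per_dma_for(size_t buf_size, size_t bytes_per_row)
{
	size_t rows;

	if (buf_size > MFC_MAX_DMA_SIZE)
		buf_size = MFC_MAX_DMA_SIZE;
	if (bytes_per_row == 0)
		return 0;

	rows = buf_size / bytes_per_row;
	if (rows == 0 || (rows * bytes_per_row) % 16 != 0)
		return 0;

	return (int)rows;
}
```

With 4-byte pixels, an 800-column fractal has 3200-byte rows, so a 16kB buffer holds 5 rows (16000 bytes) per DMA.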
Now, we're able to render fractals larger than 64 × 64 pixels:
The source for the updated fractal renderer is available in fractal.2.tar.gz.
performance
Now that we can generate full-size fractals, we can compare the running times with the PPE-based fractal renderer. The following table shows running times with a standard fractal (using these fractal parameters).
| Implementation | Time (sec) |
|---|---|
| PPE | 55.7 |
| 1 SPE | 40.7 |
So, moving the fractal generation code to run on a SPE cuts the running time by 27%. We're still a way behind the optimal performance though, and benchmarking on other systems gives better times (for example, generating the same fractal on an Intel Core 2 Duo @ 2.4GHz takes 13.8 seconds).
We can improve the Cell performance by a large amount - stay tuned for the next article to see how.
linux.conf.au hackfest: the solution, part one
During linux.conf.au 2008, a bunch of us ozlabbers ran the hackfest - a programming competition for conference attendees. This year's task was to optimise a fractal generation program to run on the Cell Broadband Engine - the hackfest task description is still available if you want to take a squiz.
The next few articles here will take you through a solution to the hackfest task. This is only one approach, and there may be many others. If you have any comments or questions, feel free to mail me.
(If you're viewing this through a feed reader or planet, you may want to check out the original article, where you get much nicer code formatting.)
optimising
The task is a matter of optimising an existing program. We should take a leaf out of Knuth's book here:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
We'll start out with something simple, and work our way up from there.
starting out
As a starting point, it'd be a good idea to check out the simple-fractal example, to find out what sort of problem we're tackling here.
While we're at it, we can do a bit of profiling on the sample fractal generator to find out where the hot paths of the program are.
A quick way to do this is to run the simple-fractal program under callgrind:
[jk@pokey simple-fractal]$ callgrind --simulate-cache=yes --dump-instr=yes \
./simple-fractal fractal.data
Looking at the callgraph output (using kcachegrind), we can get a list of the functions taking the largest amount of CPU time:
The 'Self' column gives the estimated percentage of cycles spent in each
function. We can see that 99.2% of the CPU time is spent in
render_fractal(), 0.7% in various png-encoding functions, and
0.04% calculating the colour map for the fractal.
Now that we know what we need to optimise, we can work on offloading this
to the SPEs on the Cell Processor. Because the majority of the running time is
due to render_fractal(), we should offload that work to the
SPEs.
cell version
We can get a fractal generator working on the Cell pretty quickly, by using the simple-fractal sample code for the fractal side of things, along with the data-transfer example for a framework for getting code running on the SPEs.
To me, the most logical approach is to move render_fractal()
to the SPEs, then DMA the resulting fractal data to the PPE, which does the
PNG encoding. We should start with a simple single-SPE renderer:
This will require a few changes:
- On the SPE side:
  - We need some way of getting the fractal parameters (ie, a copy of struct fractal_params) to the SPE, so we should embed these into struct spe_args:
    struct spe_args {
        struct fractal_params fractal;
    } __attribute__((aligned(SPE_ALIGN)));
  - For the moment, we'll deal with fractals that can fit into a single SPE DMA (ie, 16kB, which we've defined as CHUNK_SIZE). So, we'll need a local buffer on the SPE to work with:
    struct pixel buf[CHUNK_SIZE / sizeof(struct pixel)] __attribute__((aligned(SPE_ALIGN)));
  - As in the data-transfer example, the SPE program starts by DMA-ing a copy of struct spe_args into local store, from the address (in main memory) provided in the argv argument to main(). We'll need to do this here too:
    /*
     * The argv argument will be populated with the address that the PPE provided,
     * from the 4th argument to spe_context_run()
     */
    int main(uint64_t speid, uint64_t argv, uint64_t envp)
    {
        struct spe_args args __attribute__((aligned(SPE_ALIGN)));
        uint64_t ppe_buf;

        /* DMA the spe_args struct into the SPE. The mfc_get function
         * takes the following arguments, in order:
         *
         * - The local buffer pointer to DMA into
         * - The remote address to DMA from
         * - A tag (0 to 15) to assign to this DMA transaction. The tag is
         *   later used to wait for this particular DMA to complete.
         * - The transfer class ID (don't worry about this one)
         * - The replacement class ID (don't worry about this one either)
         */
        mfc_get(&args, argv, sizeof(args), 0, 0, 0);

        /* Wait for the DMA to complete - we write the tag mask with
         * (1 << tag), where tag is 0 in this case */
        mfc_write_tag_mask(1 << 0);
        mfc_read_tag_status_all();
  - Since the fractal_params->imgbuf pointer is a reference to main memory, we need to do a bit of shuffling to put a valid local store address in there, for render_fractal to use. We still need to keep this pointer though, as we'll need it when we DMA our fractal (now in local store) back to main memory. So, replace the imgbuf pointer with our local buf array, and keep the PPE pointer in ppe_buf for later use:
        /* initialise our local buffer */
        ppe_buf = (uint64_t)(unsigned long)args.fractal.imgbuf;
        args.fractal.imgbuf = buf;
  - We can now call render_fractal() on the SPE:
        render_fractal(&args.fractal);
  - And finally, DMA-put our rendered fractal to main memory, at the original ppe_buf pointer, and return:
        mfc_put(args.fractal.imgbuf, ppe_buf,
                args.fractal.rows * args.fractal.cols * sizeof(struct pixel),
                0, 0, 0);

        /* Wait for the DMA to complete */
        mfc_write_tag_mask(1 << 0);
        mfc_read_tag_status_all();

        return 0;
    }
- On the PPE side:
  - Since we're creating a SPE thread, we need a function to do the spe_context_run(), to pass to pthread_create:
    void *spethread_fn(void *data)
    {
        struct spe_thread *spethread = data;
        uint32_t entry = SPE_DEFAULT_ENTRY;

        /* run the context, passing the address of our args structure to
         * the 'argv' argument to main() */
        spe_context_run(spethread->ctx, &entry, 0,
                &spethread->args, NULL, NULL);

        return NULL;
    }
  - We need to be careful with the alignment of the fractal buffer, as the SPE needs to DMA the fractal here. So, instead of using malloc, use memalign:
        /* allocate our image buffer */
        fractal->imgbuf = memalign(SPE_ALIGN,
                sizeof(*fractal->imgbuf) * fractal->rows * fractal->cols);
  - Copy the parsed fractal data into the spe_args structure:
        memcpy(&thread.args.fractal, fractal, sizeof(*fractal));
  - Rather than calling render_fractal on the PPE, we just create the SPE context, upload the SPE program and start it in a new thread:
        thread.ctx = spe_context_create(0, NULL);
        spe_program_load(thread.ctx, &spe_fractal);
        pthread_create(&thread.pthread, NULL, spethread_fn, &thread);
  - Now the SPE should be happily generating the fractal, and will DMA it back to our allocated buffer when it's complete. We just wait for the SPE thread to finish, and write the fractal out to a PNG file:
        pthread_join(thread.pthread, NULL);

        /* compress and write to outfile */
        if (write_png(outfile, fractal->rows, fractal->cols, fractal->imgbuf))
            return EXIT_FAILURE;

        return EXIT_SUCCESS;
    }
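memalign() is an obsolete, non-standard interface; posix_memalign() is the POSIX equivalent. A sketch of the same image allocation using it, assuming SPE_ALIGN is 128 (the MFC transfers 128-byte-aligned data most efficiently, but the hackfest headers define their own value) and a hypothetical 4-byte struct pixel layout; alloc_image is also a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption: the hackfest code defines its own SPE_ALIGN */
#define SPE_ALIGN 128

/* Hypothetical 4-byte pixel layout */
struct pixel {
	uint8_t r, g, b, a;
};

/*
 * Allocate a rows x cols image buffer aligned for SPE DMA, using
 * posix_memalign(). Returns NULL on failure; release with free().
 */
struct pixel *alloc_image(int rows, int cols)
{
	void *p;

	if (rows <= 0 || cols <= 0)
		return NULL;

	if (posix_memalign(&p, SPE_ALIGN,
			   sizeof(struct pixel) * (size_t)rows * (size_t)cols) != 0)
		return NULL;

	return p;
}
```

Note that posix_memalign() returns an error number rather than setting errno, which is why the result is compared against 0.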
If immediate gratification is more your style, here's one I prepared earlier.
After these changes (plus some general plumbing), you should have a working SPE-based fractal renderer!
However, we still have a few limitations:
- We can only generate fractals up to 16kB in total size - that's a maximum of 64 × 64 pixels;
- We've only started one SPE thread; and
- The generation is not significantly quicker on the SPE than on the PPE.
So, nothing too exciting yet. However, in the next part of this series, we'll be working on optimising our new program to use some of the neat features of the Cell architecture, and get around each of these limitations.
Stay tuned!
spufs git tree on kernel.org
After going through the magical approval process, I now have a spufs git tree published on kernel.org.
If you're looking to try out the latest work on spufs, just do a:
[jk@pokey ~]$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/jk/spufs.git
As with other git repositories on kernel.org, there's a gitweb interface to browse the tree.
Of course, if you have any bug reports/requests/comments about the code in the spufs.git tree, feel free to email me at jk@ozlabs.org, or the Cell/B.E. open source development list at cbe-oss-dev@ozlabs.org.
petitboot v0.2
The next version of petitboot - the graphical bootloader for the PlayStation 3 - is now out.
Some notable changes in the v0.2 build:
- PS3 controller support
- Improved bootloader config file parsing, should now recognise most setups "out of the box"
- OtherOS images are now based on OpenWRT, so we have a more complete linux environment
- UUID= and LABEL= device specifications are now supported
- Better monitor detection with the 2.6.24 kernel
See the petitboot project page for more details and downloads. I've also built an OtherOS image with remote access support, so it's now possible to ssh to your bootloader.
linux on cell page
Up until now, I had a bunch of Linux on Cell information scattered about my website - I've now organised this into a central Linux on Cell page. Some current items:
- Linux on Cell kernel status
- The spufs testsuite
- SPE toolchain info
- OpenWRT support
- Linux on Cell links
If you're interested in Linux on Cell development, take a look.
There's also a heap of more general Cell development resources on the IBM developerWorks site.
openwrt for ps3
I've just posted a series of patches to add PlayStation 3 support to the OpenWRT project.
It's still in development (I need to add a few packages, like kexec-tools), but we now have a basic installation of the OpenWRT distribution on the PS3. This will make it possible to have a fully configurable linux installation that boots from the PS3's flash area. Once we have kexec working, you could use it as a bootloader, but it's also possible to add an ssh server, http server, or any of the standard tools available in OpenWRT.
If you try these out, let me know how you go.
Update: these patches have been accepted to the main OpenWRT subversion repository, so any revision after r9413 should work on the ps3.