[K42-discussion] RE: Possible benefit of running K42 on Cell CBE ...
Elvis John Dowson
elvis_dowson at hotmail.com
Sat Dec 24 06:51:09 EST 2005
Hi,
My comments can be found below ..
> -----Original Message-----
> From: Jimi Xenidis [mailto:jimix at watson.ibm.com]
> Sent: Friday, December 23, 2005 4:57 PM
> To: Elvis John Dowson
> Cc: 'Andrew Baumann'; 'IBM Research K42 Discussion Forum'; 'Orran Krieger'
> Subject: Re: Possible benefit of running K42 on Cell CBE ...
>
> more indents, and some trimming
>
> On Dec 23, 2005, at 12:57 AM, Elvis John Dowson wrote:
>
> > Hi Jimi,
> [snip]
> > I'm new to this, so I was hoping I could get an answer from the K42
> > forum.
> > By sharing memory between cluster nodes, I mean in terms of pointer
> > accessibility to the data in the local stores of the individual
> > SPUs and to
> > be able to access memory using a sort of global address space.
>
> There are 5 scopes here:
> 1 the chip, a single CBE chip (single processor CBE Blade)
>
> 2 the SMP, 2 CBE chips (dual processor CBE Blade)
>
> 3 the cluster, several CBE Blades connected by some cluster
> interconnect like Myrinet or Infiniband
> with some MPI software
>
> 4 the DSM cluster, using the cluster scenario above with some DSM
> SW to make it look like a unified address space.
>
> 5 the large SMP with some "mythical" switch, several CBE blades
> interconnected with IBM SP or SGI CrayLink
> technology.
>
> 1-2 are both currently possible on Linux and k42 (after the CBE port
> in k42)
>
> 3 Popular in Linux, k42 is working
>
> 4 Used in Linux (see OpenMosix or Mosix projects) not even tried on k42.
>
> 5 Architecturally this is the same as 2, however K42 has many ccNUMA
> features (from its origins) that can take
> advantage of memory and processor locality.
>
> Which do you want?
I'm not sure yet at this stage. I need to understand more about the Cell
processor, its architecture, and programming applications for it first.
Working on the CBE UML Profile to support automatic code generation is one
way of getting to learn more about the SPUs.
I'm tempted to think that I would need to look at choices 2, 3 and 5, but
when porting existing applications to the new Cell platform, the first
thing to consider is minimizing the effort required to perform the initial
port and achieve an initial baseline release on the new target system. So,
for example, if an existing Linux cluster application uses MPI, then the
initial port might get done faster with choice 3.
>
> > I read
> > references to the keywords 'cache coherent', 'scalable', 'hot
> > swappable' ,
> > etc, when reading through some of the articles and present work
> > being done
> > on K42, and I got the impression that one of the features of the
> > K42 was
> > that it's modular and supports/or will support hot swapping of nodes
> > in and
> > out of the system.
> [snip]
>
> Sure K42 allows for these things, but (and I'm not the authority on
> this) the "hot swappable" mainly refers to swapping SW components,
> rather than HW components, and focuses on SMP rather than clusters.
>
Yes, I was also referring to software support. Hardware support would depend
on the actual hardware system.
> [snip]
> > If you have to build a cluster of say 64 nodes, wouldn't you
> > require a high
> > speed hardware interconnect? Older versions of SGI supercomputers were
> > configured into a hypercube topology using what they called a CrayLink
> > interconnect fabric, which I think was a hardware component.
>
> We are very familiar with ccNUMA, since k42 was born from such a
> system.
> However, few clusters use this type of technology. Most clusters
> these days try to take advantage of commodity parts and cheap node
> interconnects that communicate over MPI.
> see:
> http://www.bsc.org.es/index,en.html
> http://www.top500.org/
>
> Unless you have a bazillion dollars :)
:-) !! The last company I worked for, Snecma, develops aircraft engines
such as the CFM56, which comes from a 50-50 joint venture between GE
Aircraft Engines and Snecma. They were very careful about their investments
:-) !! Typically, though, all that computing power was never enough and we
needed more and more, but we were constrained by return-on-investment
factors versus industry growth and market factors.
> >
> > Here are some excerpts from some links I got while searching for "SGI
> > Hypercube
> > hardware"
> >
> > http://sc.jpl.nasa.gov/hardware/origin2000/using/
> here is something more up to date.
>
> http://www.cray.com/products/xt3/
> [snip]
> >
> > Yes, I plan to generate some SPU specific C/C++ code from the UML
> > models,
> > using UML Stereotypes and UML Stereotype Tags. These specific
> > extensions
> > will be stored in what is called a UML Profile. So, in effect, if I
> > draw a
> > class diagram called 'DisplayProcessor' and stereotype using
> > <<SPUThread>>,
> > then when I generate code, it should automatically use the SPE Thread
> > Library to use the SPUs.
> >
> > The question regarding the Memory Flow Controller (MFC) is quite
> > interesting. I have been toying with the idea of a <<MFCThread>> to
> > represent a conceptual thread of execution that maps to the
> > ability of the
> > DMA to execute background transfers. I also think, that <<MFCThread>>
> > objects would be used by both <<SPUThread>> and <<PPUThread>>
> > objects. It's
> > just an initial thought, I have to create a small working prototype
> > to see
> > how well it works and if it's conceptually correct from a modeling
> > perspective, etc.
>
> hmm, I don't see the MFC as a "thread" but as a resource (similar to
> a disk adapter), but the important part is that it is a shared
> resource where the accessors must have exclusive access when
> "programming" it tho it operates asynchronously.
Yes, maybe you're right. Let me develop the first prototype and then look at
how it all turns out and I'll send the model and examples across for review.
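To make the "resource, not a thread" view concrete, here is a minimal sketch of a shared resource that is programmed under exclusive access but completes its work asynchronously. It is plain Python, not Cell code: the class name `MfcResource`, the method `enqueue_transfer`, and the `demo` helper are all hypothetical names invented for illustration, and the background worker only stands in for the DMA engine.

```python
import queue
import threading


class MfcResource:
    """Toy stand-in for an MFC-like DMA engine (hypothetical, not a real Cell API)."""

    def __init__(self):
        self._program_lock = threading.Lock()  # exclusive access while "programming"
        self._commands = queue.Queue()         # transfers waiting to be performed
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def enqueue_transfer(self, src, dst, offset):
        """Program one transfer and return an Event that is set on completion."""
        done = threading.Event()
        # Queue.put is already thread-safe; the lock models a multi-step
        # programming sequence that only one accessor may perform at a time.
        with self._program_lock:
            self._commands.put((src, dst, offset, done))
        return done  # the caller carries on; the copy happens in the background

    def _run(self):
        while True:
            src, dst, offset, done = self._commands.get()
            dst[offset:offset + len(src)] = src  # the simulated DMA copy
            done.set()


def demo():
    """Two accessors (think PPU-side and SPU-side code) share one resource."""
    mfc = MfcResource()
    local_store = bytearray(8)
    events = []

    def accessor(data, offset):
        events.append(mfc.enqueue_transfer(data, local_store, offset))

    threads = [
        threading.Thread(target=accessor, args=(b"PPU!", 0)),
        threading.Thread(target=accessor, args=(b"SPU!", 4)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for e in events:
        e.wait(timeout=5)  # block until each background transfer has finished
    return bytes(local_store)
```

In UML terms this would suggest modeling the MFC as a shared passive resource used by both thread stereotypes, with completion signaled back to the caller, rather than as an active object of its own.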
> >
> > So, in essence, I'm attempting to make it easier to program for the
> > CBE
> > using conceptual UML models, and using automatic code generation,
> > ensure
> > that the implementation is in sync with the intended design.
>
> cool.
> [snip]
>
> > However, the existing framework libraries will require a lot of
> > adaptations,
> > to support execution on the SPU target. This is primarily due to the
> > constrained environment defined for the SPU. So, it's more of removing
> > support for features of the UML object execution framework for the
> > SPU, like
> > iostream, etc.
>
> The SPU instruction set can be pretty fat, but you may want to look at:
> http://cxx.uclibc.org/
> for some code.
> [snip]
>
>
> >> Oh, Dude! this smells _way_ wrong. From what I understand, all you
> >> want is for an app to have a character driven communication channel
> >> to the machine running the simulator. Correct?
> >> Yes, with HW network is best, but there are _far_ more efficient ways
> >> to do this with systemsim than simulating an entire network stack.
> [snip]
>
> > If I had access to a real cell-blade server then there wouldn't be
> > much of an
> > issue. For the system simulator, apparently, they plan on releasing a
> > version of systemsim for the cell, very soon. So, once that comes
> > out, it
> > should be possible to animate on the target. I read somewhere about
> > the
> > tcp/ip stack simulation and how its far more efficient to bypass it or
> > something, but I hope the tcp/ip functionality won't be affected.
>
> So you are tied to sockets? Why would the transport have anything to
> do with it?
> I cannot express how much faster the simulation will be using a non-
> network channel.
>
At this point in time, I'm just going by the default mechanism supported
by the UML modeling environment. Yes, the object execution framework uses
TCP/IP sockets at present. Hmmm, I can see what you are trying to suggest:
keep a socket implementation, but it need not necessarily use TCP/IP as the
underlying transport, sort of like the vse_subdivision workload example
supplied as part of the Cell simulator SDK. Wow! That's a great idea. It
would, however, require me to make changes to some of the existing
libraries. This is the first time I'm coming across sockets that use a
file rather than TCP/IP, so I need to read more about this :-)! I will
certainly make a note of this and will investigate it further if I notice
a severe performance degradation with the simulation, and I think, like you
said, it just might happen :-)! I remember reading somewhere about
bypassing the TCP/IP protocol stack to improve the speed of the simulation,
or something like that.
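For reference, a socket that uses a file rather than TCP/IP is what POSIX systems call a Unix domain socket. The sketch below is only a general illustration in Python: it keeps the familiar socket calls but swaps the address family to AF_UNIX, with the endpoint being a path on the filesystem. The function name `echo_over_unix_socket` and the file name `channel.sock` are made up for the example, and nothing here is taken from systemsim or from the vse_subdivision example; whether the simulator exposes such a channel is an assumption that would need checking.

```python
import os
import socket
import tempfile
import threading


def echo_over_unix_socket(message: bytes) -> bytes:
    """Send a message over a file-backed socket and return the server's reply."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "channel.sock")  # hypothetical socket file name

        # Server side: same API as TCP, but the address is a filesystem path.
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)

        def serve():
            conn, _ = server.accept()
            with conn:
                data = conn.recv(1024)
                conn.sendall(data.upper())  # reply with the upper-cased message

        t = threading.Thread(target=serve)
        t.start()

        # Client side: connect to the path instead of a (host, port) pair.
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)
            client.sendall(message)
            reply = client.recv(1024)

        t.join()
        server.close()
        return reply
```

Only the address family and the address change compared with a TCP version, so a library that hides socket creation behind one function might need a fairly small change to switch transports.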
> -JX
> --
> "I got an idea, an idea so smart my head would explode if I even
> began to know what I was talking about." -- Peter Griffin (Family
> Guy)
>
>