[K42-discussion] Re: Possible benefit of running K42 on Cell CBE ...
Jimi Xenidis
jimix at watson.ibm.com
Fri Dec 23 22:27:16 EST 2005
more indents, and some trimming
On Dec 23, 2005, at 12:57 AM, Elvis John Dowson wrote:
> Hi Jimi,
[snip]
> I'm new to this, so I was hoping I could get an answer from the K42
> forum.
> By sharing memory between cluster nodes, I mean in terms of pointer
> accessibility to the data in the local stores of the individual SPUs, and
> to be able to access memory using a sort of global address space.
There are 5 scopes here:
1 the chip, a single CBE chip (single processor CBE Blade)
2 the SMP, 2 CBE chips (dual processor CBE Blade)
3 the cluster, several CBE Blades connected by some cluster
interconnect like Myrinet or Infiniband
with some MPI software
4 the DSM cluster, using the cluster scenario above with some DSM
SW to make it look like a unified address space.
5 the large SMP with some "mythical" switch, several CBE blades
interconnected with IBM SP or SGI CrayLink
technology.
1-2 are both currently possible on Linux and k42 (after the CPE port
in k42).
3 Popular on Linux; k42 is working.
4 Used on Linux (see the OpenMosix or Mosix projects); not even tried on k42.
5 Architecturally this is the same as 2, however K42 has many ccNUMA
features (from its origins) that can take
advantage of memory and processor locality.
Which do you want?
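As a rough illustration of what a "global address space" over SPU local stores would mean for scope 4, here is a minimal sketch. It assumes a hypothetical flat layout that a DSM layer might present. The 256 KiB local store and 8 SPEs per chip match the CBE. The 2 chips per blade follows scope 2 above. The function names are made up for illustration.

```python
# Hypothetical layout: a DSM layer presenting every SPU local store in a
# cluster of CBE blades as one flat range of addresses (an assumption).
LOCAL_STORE_SIZE = 256 * 1024   # 256 KiB of local store per SPE
SPES_PER_CHIP = 8               # SPEs on one CBE chip
CHIPS_PER_BLADE = 2             # dual-processor blade (scope 2); an assumption

SPAN_PER_CHIP = LOCAL_STORE_SIZE * SPES_PER_CHIP
SPAN_PER_BLADE = SPAN_PER_CHIP * CHIPS_PER_BLADE


def decompose(global_addr: int) -> tuple[int, int, int, int]:
    """Split a global address into (blade, chip, spe, local-store offset)."""
    if global_addr < 0:
        raise ValueError("global address must be non-negative")
    blade, rest = divmod(global_addr, SPAN_PER_BLADE)
    chip, rest = divmod(rest, SPAN_PER_CHIP)
    spe, offset = divmod(rest, LOCAL_STORE_SIZE)
    return blade, chip, spe, offset


def compose(blade: int, chip: int, spe: int, offset: int) -> int:
    """Inverse of decompose(); rejects out-of-range fields."""
    if not (blade >= 0 and 0 <= chip < CHIPS_PER_BLADE
            and 0 <= spe < SPES_PER_CHIP and 0 <= offset < LOCAL_STORE_SIZE):
        raise ValueError("field out of range")
    return (blade * SPAN_PER_BLADE + chip * SPAN_PER_CHIP
            + spe * LOCAL_STORE_SIZE + offset)


def is_remote(global_addr: int, my_blade: int) -> bool:
    """True if an access to global_addr must cross the cluster interconnect."""
    return decompose(global_addr)[0] != my_blade
```

The pointer arithmetic is the easy part. Every access for which is_remote() is true has to become a message over something like Myrinet or InfiniBand. That is the work DSM software (scope 4) takes on, and why the "unified" address space is not uniform in cost.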
> I read references to the keywords 'cache coherent', 'scalable', 'hot
> swappable', etc., when reading through some of the articles and present
> work being done on K42, and I got the impression that one of the features
> of K42 was that it's modular and supports (or will support) hot swapping
> of nodes in and out of the system.
[snip]
Sure, K42 allows for these things, but (and I'm not the authority on
this) "hot swappable" mainly refers to swapping SW components rather
than HW components, and focuses on SMP rather than clusters.
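To make "swapping SW components" concrete, here is a minimal sketch of the general idea. Clients reach a component only through an indirection. The system waits until no call is in flight, transfers the old object's state to the new implementation, and redirects the reference. This is not K42's actual mechanism, which is more involved; the single lock here is a deliberately crude stand-in for its quiescence handling. All class and method names are hypothetical.

```python
import threading


class Counter:
    """Original implementation of some component (hypothetical)."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1

    def export_state(self):
        return {"value": self.value}


class LoggingCounter:
    """Replacement implementation that also keeps a history."""

    def __init__(self, state):
        self.value = state["value"]   # take over the old object's state
        self.history = []

    def increment(self):
        self.value += 1
        self.history.append(self.value)

    def export_state(self):
        return {"value": self.value}


class SwappableRef:
    """The only path clients use to reach the component."""

    def __init__(self, impl):
        self._impl = impl
        self._lock = threading.Lock()

    def call(self, method, *args):
        with self._lock:
            return getattr(self._impl, method)(*args)

    def hot_swap(self, new_factory):
        # Holding the lock means no call is in flight (a crude quiescent
        # point); then transfer state and redirect the reference.
        with self._lock:
            state = self._impl.export_state()
            self._impl = new_factory(state)
            return self._impl
```

Clients keep calling `ref.call("increment")` before and after `ref.hot_swap(LoggingCounter)` and never notice the change. That is the sense in which the component is "hot swappable", with no hardware involved.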
[snip]
> If you have to build a cluster of, say, 64 nodes, wouldn't you require a
> high-speed hardware interconnect? Older versions of SGI supercomputers were
> configured into a hypercube topology using what they called a CrayLink
> interconnect fabric, which I think was a hardware component.
We are very familiar with ccNUMA, since k42 was born from such a
system.
However, few clusters use this type of technology. Most clusters
these days try to take advantage of commodity parts and cheap node
interconnects that communicate over MPI.
see:
http://www.bsc.org.es/index,en.html
http://www.top500.org/
Unless you have a bazillion dollars :)
>
> Here are some excerpts from some links I got while searching for "SGI
> Hypercube hardware"
>
> http://sc.jpl.nasa.gov/hardware/origin2000/using/
Here is something more up to date:
http://www.cray.com/products/xt3/
[snip]
>
> Yes, I plan to generate some SPU-specific C/C++ code from the UML models,
> using UML Stereotypes and UML Stereotype Tags. These specific extensions
> will be stored in what is called a UML Profile. So, in effect, if I draw a
> class diagram called 'DisplayProcessor' and stereotype it using
> <<SPUThread>>, then when I generate code, it should automatically use the
> SPE Thread Library to use the SPUs.
>
> The question regarding the Memory Flow Controller (MFC) is quite
> interesting. I have been toying with the idea of an <<MFCThread>> to
> represent a conceptual thread of execution that maps to the ability of the
> DMA to execute background transfers. I also think that <<MFCThread>>
> objects would be used by both <<SPUThread>> and <<PPUThread>> objects. It's
> just an initial thought; I have to create a small working prototype to see
> how well it works and whether it's conceptually correct from a modeling
> perspective, etc.
Hmm, I don't see the MFC as a "thread" but as a resource (similar to
a disk adapter). The important part is that it is a shared resource:
accessors must have exclusive access while "programming" it, even
though it operates asynchronously.
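The "resource, not thread" view can be sketched as a toy DMA engine. Programming a transfer is a multi-step register sequence, so it is done under a lock. The copy itself happens later on the engine's own worker, and callers wait on a tag when they need the result. This is a minimal model under stated assumptions. `DmaEngine`, `enqueue`, and `wait_tag` are hypothetical names, not the real MFC or `spu_mfcio.h` interface.

```python
import queue
import threading


class DmaEngine:
    """Toy model of a DMA controller shared by several threads.
    Hypothetical interface, not the real MFC API."""

    def __init__(self):
        self._program_lock = threading.Lock()  # held only while "programming"
        self._regs = {}                         # pretend command registers
        self._commands = queue.Queue()
        self._cond = threading.Condition()
        self._pending = {}                      # tag -> transfers in flight
        threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, tag, src, dst, size):
        """Program one transfer; returns before the copy happens."""
        with self._program_lock:
            # A multi-step register sequence: another accessor must not
            # interleave its writes between these, hence the lock.
            self._regs["src"] = src
            self._regs["dst"] = dst
            self._regs["size"] = size
            self._regs["tag"] = tag
            with self._cond:
                self._pending[tag] = self._pending.get(tag, 0) + 1
            self._commands.put(dict(self._regs))  # "issue" the command

    def wait_tag(self, tag, timeout=None):
        """Block until every transfer issued with `tag` has completed."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._pending.get(tag, 0) == 0, timeout)

    def _run(self):
        # The engine works through issued commands on its own, asynchronously.
        while True:
            cmd = self._commands.get()
            cmd["dst"][:cmd["size"]] = cmd["src"][:cmd["size"]]
            with self._cond:
                self._pending[cmd["tag"]] -= 1
                self._cond.notify_all()
```

The lock covers only the programming sequence, not the transfer. A caller that has issued its command is free to do other work and check the tag later. That is closer to a shared adapter with a command interface than to a thread of execution.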
>
> So, in essence, I'm attempting to make it easier to program for the CBE
> using conceptual UML models, and, using automatic code generation, to ensure
> that the implementation is in sync with the intended design.
cool.
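The stereotype-driven generation described above can be sketched as a lookup from stereotype to code template. A class stereotyped <<SPUThread>> gets SPU launch code, and a <<PPUThread>> class gets ordinary thread code. This is a minimal sketch only. The templates, `ModelClass`, and `launch_on_spu` are hypothetical placeholders, not real SPE Thread Library calls or UML tool APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ModelClass:
    """A class from the UML model plus its applied stereotypes."""
    name: str
    stereotypes: set[str] = field(default_factory=set)


# Hypothetical templates; launch_on_spu is a placeholder, not a real
# SPE Thread Library function.
TEMPLATES = {
    "SPUThread": (
        "/* {name}: generated for an SPU */\n"
        "extern void launch_on_spu(const char *image);  /* placeholder */\n"
        "void {name}_start(void) {{ launch_on_spu(\"{name}_spu\"); }}\n"
    ),
    "PPUThread": (
        "/* {name}: generated for the PPU */\n"
        "void {name}_start(void) {{ /* ordinary thread creation here */ }}\n"
    ),
}


def generate(cls: ModelClass) -> str:
    """Emit C source for a class carrying exactly one thread stereotype."""
    matches = sorted(cls.stereotypes & set(TEMPLATES))
    if len(matches) != 1:
        raise ValueError(
            f"{cls.name}: expected exactly one thread stereotype, got {matches}")
    return TEMPLATES[matches[0]].format(name=cls.name)
```

For example, `generate(ModelClass("DisplayProcessor", {"SPUThread"}))` produces a `DisplayProcessor_start()` stub that hands an SPU image to the placeholder launcher. Swapping the real library calls into the templates would not change the model.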
[snip]
> However, the existing framework libraries will require a lot of
> adaptation to support execution on the SPU target. This is primarily due
> to the constrained environment defined for the SPU. So, it's more a matter
> of removing support for features of the UML object execution framework for
> the SPU, like iostream, etc.
The SPU instruction set can be pretty fat, but you may want to look at:
http://cxx.uclibc.org/
for some code.
[snip]
>> Oh, Dude! this smells _way_ wrong. From what I understand, all you
>> want is for an app to have a character driven communication channel
>> to the machine running the simulator. Correct?
>> Yes, with HW a network is best, but there are _far_ more efficient ways
>> to do this with systemsim than simulating an entire network stack.
[snip]
> If I had access to a real Cell blade server then there wouldn't be much of
> an issue. For the system simulator, apparently, they plan on releasing a
> version of systemsim for the Cell very soon. So, once that comes out, it
> should be possible to animate on the target. I read somewhere about the
> TCP/IP stack simulation and how it's far more efficient to bypass it or
> something, but I hope the TCP/IP functionality won't be affected.
So you are tied to sockets? Why would the transport have anything to
do with it?
I cannot express how much faster the simulation will be using a
non-network channel.
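The point about the transport can be sketched as follows. If the generated animation code talks to the host only through a small byte-stream interface, a socket is just one implementation. A console or pipe that the simulator exposes can be dropped in without touching application code. This is a minimal sketch: all names are hypothetical, and no particular systemsim facility is assumed.

```python
import abc
import io
import socket


class CharChannel(abc.ABC):
    """All the animation side needs: a byte stream to the host."""

    @abc.abstractmethod
    def write(self, data: bytes) -> None: ...

    @abc.abstractmethod
    def read(self, n: int) -> bytes: ...


class SocketChannel(CharChannel):
    """Transport over a socket (e.g. real hardware with a network)."""

    def __init__(self, sock: socket.socket):
        self._sock = sock

    def write(self, data: bytes) -> None:
        self._sock.sendall(data)

    def read(self, n: int) -> bytes:
        return self._sock.recv(n)


class StreamChannel(CharChannel):
    """Transport over any binary file-like pair, e.g. a console or pipe
    a simulator might expose (hypothetical; no simulator API assumed)."""

    def __init__(self, reader: io.BufferedIOBase, writer: io.BufferedIOBase):
        self._reader = reader
        self._writer = writer

    def write(self, data: bytes) -> None:
        self._writer.write(data)
        self._writer.flush()

    def read(self, n: int) -> bytes:
        return self._reader.read(n)


def send_state_change(channel: CharChannel, obj: str, state: str) -> None:
    """Application code: one line per event, no knowledge of the transport."""
    channel.write(f"{obj} -> {state}\n".encode("utf-8"))
```

`send_state_change()` behaves the same over either channel. Being "tied to sockets" then reduces to which `CharChannel` gets constructed at startup.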
-JX
--
"I got an idea, an idea so smart my head would explode if I even
began to know what I was talking about." -- Peter Griffin (Family
Guy)