mpiexec architecture questions...
Pete Wyckoff
pw at osc.edu
Mon Feb 7 16:36:19 EST 2005
raysonho at eseenet.com wrote on Mon, 07 Feb 2005 13:06 -0800:
> I am trying to see how difficult it is to create a real TM library for
> Gridengine (SGE), so that SGE can also use mpiexec. I am reading the
> mpiexec code and have some questions:
>
> 1) In the mpiexec diagram (generated from the source), there's something
> called the "listener". What is it, and what is it for?
[The diagram is in mpiexec/proc-relations.fig; run "make proc-relations.ps"
or similar to get PostScript.]
This "listener" thing exists only in the mpich1/p4 (i.e. sockets on Ethernet)
version. It's the very unpleasant way they chose to implement multiple
tasks on a single node, i.e. for multi-processor machines. I would
suggest you concentrate only on mpich2 (with its sockets, shm, or other
device) and not struggle with the mpich1/p4 listener stuff.
> 2) And also, why is there a difference between an MPICH/GM job and an
> MPICH/P4 job? ("code M*p+0" is the root of code M*p+1...n in the P4
> case.)
Same answer, more or less. The authors of the GM device in mpich1
implemented spawning correctly on SMP nodes, making your job as an
mpiexec designer much easier. There is no need for an M*p+0 process at the
top with the real worker processes (and a listener) beneath. Just qrsh each
of the two tasks, one at a time, to the same node.
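A minimal sketch in C of the per-task approach described above: one qrsh per task, even when two tasks land on the same node, with no M*p+0 parent or listener. It assumes SGE's "qrsh -inherit <host> <command>" form for tight integration; the function names host_for_task, build_qrsh_argv and print_spawn_plan are hypothetical, not part of mpiexec, and the code only builds and prints command lines rather than running them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: which node runs task `rank` when tasks are
 * packed `ppn` per node (ranks 0..ppn-1 on nodes[0], and so on). */
const char *host_for_task(const char *const nodes[], int nnodes,
                          int ppn, int rank)
{
    assert(ppn > 0 && rank >= 0 && rank / ppn < nnodes);
    return nodes[rank / ppn];
}

/* Hypothetical helper: fill argv for ONE task, assuming SGE's
 * "qrsh -inherit <host> <command>" form.  Every task gets its own
 * qrsh; there is no per-node master process or listener.
 * Returns argc (the NULL terminator is not counted). */
int build_qrsh_argv(const char *argv[5], const char *host, const char *prog)
{
    argv[0] = "qrsh";
    argv[1] = "-inherit";
    argv[2] = host;
    argv[3] = prog;
    argv[4] = NULL;   /* execvp() expects a NULL-terminated vector */
    return 4;
}

/* Print the command each task would be started with, instead of
 * forking and exec'ing it. */
void print_spawn_plan(const char *const nodes[], int nnodes, int ppn,
                      int ntasks, const char *prog)
{
    for (int rank = 0; rank < ntasks; rank++) {
        const char *av[5];
        int ac = build_qrsh_argv(av,
                                 host_for_task(nodes, nnodes, ppn, rank),
                                 prog);
        printf("rank %d:", rank);
        for (int i = 0; i < ac; i++)
            printf(" %s", av[i]);
        printf("\n");
    }
}
```

With two nodes and ppn = 2, ranks 0 and 1 both get their own "qrsh -inherit" to the first node, one at a time. A real spawner would fork() and execvp() each argument vector and track the child, rather than printing it.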
Good luck! If you come up with a nice design that can be enabled as a
configure-time option, let me know if you want to toss it into the
mpiexec source proper.
-- Pete