patch for lam version 6.5.8

Jeff Squyres jsquyres at lam-mpi.org
Thu Mar 27 22:11:06 EST 2003


On Thu, 27 Mar 2003, Pete Wyckoff wrote:

> Great.  I hope the authors of that know they are welcome to plunder any
> bits from mpiexec that might be helpful, particularly how to deal with
> the recalcitrant TM layer and ideas on handling stdio streams.  Let me
> know when you think that the LAM-PBS interface is solid and I'll add
> some comments about that in the documentation.

<lurk mode: off>

Thanks.

We've actually looked at mpiexec both for TM ideas and for mpiexec ideas,
both of which will be included in the forthcoming LAM 7.0.  If all goes
well, LAM 7.0 should be stable around May or so.

As for buggy TM code -- woof -- we're well aware of that.  :-)  LAM
actually already handles all of the forwarding of IO streams through the
LAM daemons.  File descriptor passing is probably one of the most
un-portable things in POSIX; what a pain!  Indeed, LAM 6.5.9 was largely
driven by updates in the fd-passing code for AMD/Hammer and POSIX.1g
support (IIRC, things broke horribly for BSD-style passing in 64 bit
environments).
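
For anyone who hasn't had the pleasure, here is a minimal sketch of fd
passing over a Unix-domain socket using ancillary data (SCM_RIGHTS).  It is
Python rather than C for brevity, and it is not LAM's code; send_fd,
recv_fd, and demo are names made up for this illustration.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one open file descriptor over a Unix-domain socket."""
    fds = array.array("i", [fd])
    # At least one byte of ordinary data must accompany the ancillary data.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one file descriptor that was sent with send_fd()."""
    fds = array.array("i")
    _data, ancdata, _flags, _addr = sock.recvmsg(
        1, socket.CMSG_SPACE(fds.itemsize)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Keep only whole ints; drop any trailing padding bytes.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo():
    """Pass the write end of a pipe across a socketpair and use it."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    try:
        send_fd(left, w)
        w2 = recv_fd(right)      # a new descriptor for the same pipe
        os.write(w2, b"hello")
        os.close(w2)
        return os.read(r, 5)
    finally:
        os.close(r)
        os.close(w)
        left.close()
        right.close()
```

The demo keeps both ends in one process only to stay self-contained; the
same two helpers work between separate processes sharing the socket.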

It's still only in CVS, so it's not perfect yet :-).  For example, we
still have two known issues: 1) the rsh module is currently broken in TM
environments (i.e., if you choose to force the use of rsh in a PBS job
instead of TM), and 2) PBSPro makes a $TMPDIR for the job, but it
apparently only does this on the mother superior's node, not all nodes in
the job -- we still have a few issues with how to handle that correctly.

That being said, we'd love to have any of you give LAM's TM interface a
whirl; nightly CVS snapshots are available from
http://www.lam-mpi.org/cvs/, or you can directly get an anonymous CVS
checkout (although it's a little harder to build).  Just to note: we stuck
with the lamboot/mpirun/lamhalt model -- lamboot itself (and friends)
became TM-aware.  Actually, what really happened is that the back-end of
lamboot (and friends) became modular so that adapting lamboot to a new
environment simply means writing a module to a specific API; the decision
of which module to use can be made at run time.  For example, in CVS, we
currently have 3 boot modules: rsh/ssh, tm, and bproc.  See lamssi_boot(7)
for details.
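
To make the idea concrete, here is a rough sketch of run-time module
selection.  It is not the LAM SSI API: the class names, the priority
numbers, and the use of PBS_JOBID to detect a PBS job are all assumptions
made for the illustration, and bproc is left out because I don't want to
guess at how it would be detected.

```python
class RshBoot:
    """Hypothetical rsh/ssh boot module: usable anywhere, lowest priority."""
    name = "rsh"
    priority = 10

    def available(self, env):
        return True


class TmBoot:
    """Hypothetical TM boot module: only usable inside a PBS job."""
    name = "tm"
    priority = 50

    def available(self, env):
        # Assumption: the presence of PBS_JOBID means we are in a PBS job.
        return "PBS_JOBID" in env


MODULES = [RshBoot(), TmBoot()]


def select_boot_module(modules, env, forced=None):
    """Pick a boot module at run time.

    If `forced` names a module, use it regardless of the environment;
    otherwise take the highest-priority module that reports itself usable.
    """
    if forced is not None:
        for module in modules:
            if module.name == forced:
                return module
        raise ValueError("unknown boot module: %s" % forced)
    usable = [m for m in modules if m.available(env)]
    if not usable:
        raise RuntimeError("no boot module is usable in this environment")
    return max(usable, key=lambda m: m.priority)
```

Outside a PBS job this picks rsh; inside one it picks tm unless rsh is
forced, e.g. select_boot_module(MODULES, env, forced="rsh").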

Nothing in our mpiexec implementation has been committed to CVS yet, but
it might well have a "one shot" kind of option that will implicitly do a
lamboot, mpirun, and lamhalt.  We'll see how it plays out; we have a code
review of mpiexec next week, so it might show up in CVS in the near
future.
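
Purely as an illustration of what "one shot" means here (this is not the
mpiexec code, which hasn't been committed), a wrapper could look roughly
like the sketch below.  The function name one_shot and its parameters are
hypothetical; only the three commands themselves come from LAM.

```python
import subprocess


def one_shot(mpirun_args, boot_schema=None, run=subprocess.call):
    """Run lamboot, then mpirun, then lamhalt, returning mpirun's status.

    `run` takes an argv list and returns an exit status; it is a parameter
    only so the sequence can be exercised without LAM installed.
    """
    boot_cmd = ["lamboot"]
    if boot_schema is not None:
        boot_cmd.append(boot_schema)
    status = run(boot_cmd)
    if status != 0:
        # Nothing was booted, so there is nothing to run or halt.
        return status
    try:
        return run(["mpirun"] + list(mpirun_args))
    finally:
        # Always tear the LAM universe down, even if mpirun fails.
        run(["lamhalt"])
```

For example, one_shot(["-np", "4", "./a.out"], boot_schema="hostfile")
would boot, run, and halt in a single call.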

<lurk mode: on>

-- 
{+} Jeff Squyres
{+} jsquyres at lam-mpi.org
{+} http://www.lam-mpi.org/


