Why does mpiexec run incorrectly on SMP?
Ben Webb
ben at bellatrix.pcl.ox.ac.uk
Tue Jun 11 08:06:38 EDT 2002
On Mon, Jun 10, 2002 at 05:03:03PM -0700, Brooks Davis wrote:
> I really wish mpiexec could support a stupid mode that used sockets for
> all communication.
If you want to use LAM/MPI, I'm not stopping you! As far as I
know, this is an MPICH limitation, not an mpiexec one; mpiexec just
starts up the MPICH job.
> The shared mode is rather flaky on FreeBSD
On Linux, too, if by "flaky" you mean it leaves shared memory
segments lying around after a crash. The default P4_GLOBMEMSIZE is also
set rather too low for our typical calculations, so MPICH jobs keep
running out of shared memory, but that's easily remedied by raising it.
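For what it's worth, here is a minimal sketch of both remedies in Python. It is only an illustration under stated assumptions: the helper names job_env and orphaned_segments are made up, P4_GLOBMEMSIZE is assumed to be given in bytes, and the parser assumes the column layout printed by "ipcs -m" on Linux (key, shmid, owner, perms, bytes, nattch), which differs on FreeBSD.

```python
import os


def job_env(globmem_bytes, base=None):
    """Return a copy of the environment with P4_GLOBMEMSIZE raised.

    Assumption: MPICH's p4 device reads P4_GLOBMEMSIZE as a byte count.
    """
    env = dict(os.environ if base is None else base)
    env["P4_GLOBMEMSIZE"] = str(globmem_bytes)
    return env


def orphaned_segments(ipcs_output, owner):
    """Pick out shmids owned by `owner` that have no attached processes.

    Assumption: `ipcs_output` is the text printed by `ipcs -m` on Linux,
    where data rows start with a hex key and nattch is the sixth column.
    """
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Skip banners, headers, and blank lines.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        shmid, seg_owner, nattch = fields[1], fields[2], fields[5]
        if seg_owner == owner and nattch == "0":
            ids.append(int(shmid))
    return ids
```

Each ID returned could then be handed to "ipcrm -m <shmid>", after checking that no job of yours is still starting up on that node.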
> and since it uses SysV shared memory it does a really bad job of cleaning
> up after a crash.
LAM/MPI also uses SysV shared memory, but cleans up after itself
nicely, since it writes all of the necessary resource IDs into a state
file. You can find patches to do something similar for MPICH at
http://bellatrix.pcl.ox.ac.uk/~ben/pbs/.
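The state-file idea can be sketched roughly as follows. This is not LAM/MPI's actual file format, nor what the patches above do; the file layout ("kind id" per line) and the function names are hypothetical, and the ipcrm flags assume the "-m"/"-s" syntax.

```python
def record_resource(state_path, kind, ipc_id):
    """Append one SysV resource ID to the state file as it is created.

    `kind` is "shm" for a shared memory segment or "sem" for a
    semaphore set (a hypothetical convention for this sketch).
    """
    with open(state_path, "a") as f:
        f.write(f"{kind} {ipc_id}\n")


def cleanup_commands(state_path):
    """Build the ipcrm commands needed to free everything recorded.

    The commands are returned rather than executed, so a caller can
    inspect them before running them.
    """
    flags = {"shm": "-m", "sem": "-s"}
    commands = []
    with open(state_path) as f:
        for line in f:
            parts = line.split()
            # Ignore blank or malformed lines instead of failing cleanup.
            if len(parts) != 2 or parts[0] not in flags:
                continue
            commands.append(["ipcrm", flags[parts[0]], parts[1]])
    return commands
```

The point is that cleanup after a crash only has to read the file; it does not need the dead processes to tell it what they allocated.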
> At this point I want to sue mpiexec
You can't, because it's GPL, and comes with no warranty. Oh. ;)
> but I can't get mpich with p4 and comm=shared working reliably enough to
> inflict it on users.
That's understandable. We had a lot of teething problems with
MPICH+mpiexec on our cluster too, but I haven't seen a job crash with
bizarre MPICH errors for a month or more now.
Ben
--
ben at bellatrix.pcl.ox.ac.uk http://bellatrix.pcl.ox.ac.uk/~ben/
"God runs electromagnetics by wave theory on Monday, Wednesday, and
Friday, and the Devil runs them by quantum theory on Tuesday, Thursday,
and Saturday."
- Sir William Bragg