Why does mpiexec run incorrectly on SMP?

Brooks Davis brooks at aero.org
Tue Jun 11 18:33:55 EDT 2002


On Tue, Jun 11, 2002 at 05:06:38AM -0700, Ben Webb wrote:
> On Mon, Jun 10, 2002 at 05:03:03PM -0700, Brooks Davis wrote:
> > I really wish mpiexec could support a stupid mode that used sockets for
> > all communication.
> 
> 	If you want to use LAM/MPI, I'm not stopping you! As far as I
> know, this is an MPICH limitation, not an mpiexec one; mpiexec just
> starts up the MPICH job.

I'm pretty sure it's not an MPICH limitation, because mpirun works
fine on a single machine without comm=shared.

> > The shared mode is rather flaky on FreeBSD
> 
> 	On Linux, too, if by "flaky" you mean it leaves shared memory
> segments lying around after a crash... and the default P4_GLOBMEMSIZE is
> set rather too low for our typical calculations (so MPICH jobs keep
> running out of shared memory) but that's easily remedied.

Yeah, that's it.  The diagnostics are totally useless when something goes
wrong.
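
For reference, a minimal sketch of the two remedies mentioned above. It
assumes P4_GLOBMEMSIZE is read in bytes and that the leftover segments are
System V shared memory; the 64 MB figure is only an illustrative value, and
<shmid> is a placeholder for an id taken from the ipcs listing.

```shell
# Assumption: P4_GLOBMEMSIZE is given in bytes; 64 MB is illustrative only.
P4_GLOBMEMSIZE=$((64 * 1024 * 1024))
export P4_GLOBMEMSIZE

# After a crash, list any shared memory segments left behind:
#   ipcs -m
# and remove a stale one by the id shown in that listing:
#   ipcrm -m <shmid>
```

The variable has to be set in the environment the MPICH processes actually
start in, so it may need to go in the job script rather than an interactive
shell.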

> > but I can't get mpich with p4 and comm=shared working reliably enough to
> > inflict it on users.
> 
> 	That's understandable. We had a lot of teething problems with
> MPICH+mpiexec on our cluster too, but I haven't seen a job crash with
> bizarre MPICH errors for a month or more now.

That's good to hear.  I'll try again next week.  I think part of the issue
is that, since I'm just getting started, there are too many variables
when testing.

-- Brooks


