why mpiexec run uncorrectly on smp?
Ben Webb
ben at bellatrix.pcl.ox.ac.uk
Wed Jun 12 06:37:30 EDT 2002
On Tue, Jun 11, 2002 at 03:33:55PM -0700, Brooks Davis wrote:
> I'm pretty sure it's not an mpich limitation because mpirun works
> fine on one machine without comm=shared.
Ah, but as I understand it, mpiexec uses MPICH's execer
interface to start the processes, and mpirun doesn't. I guess the
limitation is in there somewhere.
> > On Linux, too, if by "flaky" you mean it leaves shared memory
> > segments lying around after a crash... and the default P4_GLOBMEMSIZE is
> > set rather too low for our typical calculations (so MPICH jobs keep
> > running out of shared memory) but that's easily remedied.
>
> Yah, that's it. The diagnostics are totally useless when something goes
> wrong.
The two main errors we used to see were
"xx_shmalloc: returning NULL" which generally means P4_GLOBMEMSIZE is
set too low... and
"OOPS: shmat failed" which usually means it's set too high.
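Since those two messages are about all the diagnostics you get, a small log scanner can save some head-scratching. This is only a rough sketch: the function name and the hint wording are made up for illustration, and the mapping is just the rule of thumb given above, not anything MPICH documents.

```python
# Hypothetical helper: maps the two messages quoted above to their
# usual cause. The hints paraphrase this post; they are rules of thumb.
HINTS = {
    "xx_shmalloc: returning NULL": "P4_GLOBMEMSIZE is probably set too low",
    "OOPS: shmat failed": "P4_GLOBMEMSIZE is probably set too high",
}


def diagnose(log_text):
    """Return a hint for every known message that appears in log_text."""
    return [hint for message, hint in HINTS.items() if message in log_text]


# Example: feed it the captured stderr of a failed job.
sample = "xx_shmalloc: returning NULL"
for hint in diagnose(sample):
    print(hint)
```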
In the end I changed the default allocation of shared memory from 4MB to
16MB, which suffices for all the calculations that we run here. (The
change is to the #define of GLOBMEMSIZE in mpid/ch_p4/p4/lib/p4_MD.h, at
around line 436.) You can also override this on a job-by-job basis by
setting the P4_GLOBMEMSIZE environment variable, but this is kind of
fiddly.
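For the per-job override, something like the following wrapper takes some of the fiddliness out. It is a minimal sketch, assuming P4_GLOBMEMSIZE is given in bytes; the helper names and the example command are hypothetical, so adjust them to however you launch jobs.

```python
import os
import subprocess


def p4_env(megabytes, base=None):
    """Copy an environment and set P4_GLOBMEMSIZE (assumed to be in bytes)."""
    env = dict(os.environ if base is None else base)
    env["P4_GLOBMEMSIZE"] = str(megabytes * 1024 * 1024)
    return env


def run_job(cmd, megabytes):
    """Run cmd with the shared memory size overridden for this job only."""
    return subprocess.run(cmd, env=p4_env(megabytes))


# Hypothetical usage (the program name is a placeholder):
# run_job(["mpirun", "-np", "2", "./a.out"], 16)
```

Because the variable is set only in the copied environment, other jobs keep whatever default was compiled in.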
Ben
--
ben at bellatrix.pcl.ox.ac.uk http://bellatrix.pcl.ox.ac.uk/~ben/
"I believe we are on an irreversible trend toward more freedom and
democracy - but that could change."
- Vice President Dan Quayle, 5/22/89