core dump with gm-shared memory
Pete Wyckoff
pw at osc.edu
Tue Jul 1 15:47:18 EDT 2003
beaneg at umcs.maine.edu said on Thu, 26 Jun 2003 10:06 -0400:
> if I build mpiexec to use gm-shmem on SMP nodes, mpiexec causes a
> segmentation fault, but it is always after my MPI program has finished
> properly, so it seems to be when mpiexec is cleaning up.
>
> If I build mpiexec without gm-shmem there are no problems.
>
> gm-shmem has been changed slightly on my system. After discussing some
> problems with Myricom we decided to change the default location of the
> shared memory file on our system (done by editing gmpi_smppriv.c and
> mpirun.ch_gm.pl). Since /tmp was NFS mounted, we were having problems
> with a large number of nodes writing shared memory files to /tmp. The
> shared memory file is now located in a ramdisk (the location of the
> shared memory file will likely be a configurable option in the next
> MPICH/GM release).
>
> This setup works fine with mpirun.ch_gm, but has been causing
> segmentation faults with mpiexec which don't seem to affect the actual
> MPI program.
>
> Since mpirun.ch_gm.pl references the temp file, I was wondering whether
> mpiexec did as well, but looking quickly through the source code I
> didn't find any reference to it.
>
>
> Does anyone know what might be causing the problem? Other than the
> inability to use gm-shmem, we really like mpiexec so far.
I'm a bit confused by this. Releases 0.72 and earlier of mpiexec did
have a configure option, "--disable-gm-shmem", which controlled whether
the command-line option "-no-shmem" was available. All "-no-shmem" did
was add "GMPI_SHMEM=0" to the environment.
This was removed since it is just as easy to put something like

    export GMPI_SHMEM=0
    mpiexec a.out

in your batch script to get the same effect. There are plenty of
other GMPI_ variables that can be set this way too.
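If the setting should apply to only one launch rather than to the rest of the batch script, the standard env utility can add the variable for a single command. A minimal sketch, assuming (as described above) that mpiexec passes its environment on to the MPI tasks; the helper name run_no_shmem is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: run one command with MPICH/GM shared memory
# disabled. env(1) adds GMPI_SHMEM=0 to that command's environment only,
# so later commands in the batch script do not inherit it.
run_no_shmem() {
    env GMPI_SHMEM=0 "$@"
}

# In a PBS batch script this would be used as:
#   run_no_shmem mpiexec a.out
# The job-wide equivalent is simply:
#   export GMPI_SHMEM=0
#   mpiexec a.out
```

Exporting the variable is simpler when every launch in the job should run without shared memory; the per-command form just keeps the setting from leaking into anything else the script starts.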
Before August 2002, mpiexec did have to know the path to the mpich/gm
shared memory file, but that too is now handled entirely by the mpich
library. Mpiexec does not choose a location for the shared memory file
or get involved in the process at all. In fact, I don't think that
mpiexec ever touches /tmp unless you told it your executable is there.
I can't guess at what would cause mpiexec itself to SEGV, then, since
all it talks to is PBS through the TM interface. It is not linked with
any MPICH or GM code. If you can run mpiexec under gdb and get it to
segv, I'd definitely like to see what caused it to die.
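One way to collect that information is sketched below. It assumes gdb is installed on the node where mpiexec runs and that the commands go into the PBS batch script itself, since mpiexec talks to PBS through the TM interface and so has to run inside the job. The helper name mpiexec_under_gdb and the program a.out are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: run mpiexec under gdb in batch mode. gdb starts
# the program with "run"; if mpiexec dies (for example with SIGSEGV),
# "bt" then prints the stack at the point of the crash.
mpiexec_under_gdb() {
    gdb --batch -ex run -ex bt --args mpiexec "$@"
}

# Usage, from inside the PBS batch script:
#   mpiexec_under_gdb a.out
#
# Alternative: allow a core file, then inspect it after the job ends.
#   ulimit -c unlimited
#   mpiexec a.out
#   gdb "$(command -v mpiexec)" core   # core file name varies by system
```

The batch-mode form is handy because the backtrace lands in the job's output file without anyone needing an interactive session on the compute node.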
-- Pete