Mpiexec fails
Jaime E Combariza
jaime.e.combariza at Dartmouth.EDU
Mon Mar 10 12:51:57 EDT 2008
Hi.
I am trying to run a code that uses about 2 GB RAM per process. I am using
44 processors.
1 - When I use mpiexec (0.82), the code aborts after a few minutes with:
mpiexec: Warning: tasks 0-43 died with signal 11 (Segmentation fault).
2 - I compiled v 0.83 and tried to run with it, but I got several messages like these:
open: Read-only file system
[38] Abort: [38] smpi_init:error in opening shared memory file
</tmp/ib_shmem-117072-compute-1-18.local-583.tmp>: 29
at line 754 in file mpid/vapi/mpid_smpi.c
mpiexec: Warning: accept_abort_conn: MPI_Abort from IP 172.18.128.237, rank
38, killing all.
open: Read-only file system
[39] Abort: [39] smpi_init:error in opening shared memory file
</tmp/ib_shmem-117072-compute-1-18.local-583.tmp>: 29
at line 754 in file mpid/vapi/mpid_smpi.c
Note: mpiexec does run other codes, and even this same code when its
memory demands are lower.
3 - I am running over IB. If I use mpirun_ssh (or rsh), the code runs fine.
Output of mpiexec -h:
Version 0.82, configure options: '--prefix=/software/mpiexec/0.82'
'--with-default-comm=ib' '--with-pbs=/usr/local/torque/current
4 - We are seeing similar problems when the code is run over GigE.
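To compare the environment each task gets under mpiexec with the one it gets under mpirun_ssh, a small per-node probe could be launched both ways. The following is only a sketch under stated assumptions: Python is available on the compute nodes, the shared memory file is created under /tmp as the error message suggests, and the names check_writable and describe_limits are hypothetical. That resource limits play any role in the segmentation fault is a guess, not a diagnosis.

```python
import errno
import os
import resource
import socket
import tempfile


def check_writable(directory):
    """Try to create and remove a scratch file; return (ok, message)."""
    try:
        fd, path = tempfile.mkstemp(prefix="shmem_probe_", dir=directory)
    except OSError as exc:
        # Report the symbolic errno name, e.g. EROFS for a read-only file system.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    os.remove(path)
    return True, "ok"


def describe_limits():
    """Return the (soft, hard) limits this process inherited from its launcher."""
    names = {
        "stack": resource.RLIMIT_STACK,
        "data": resource.RLIMIT_DATA,
        "address_space": resource.RLIMIT_AS,
    }
    return {label: resource.getrlimit(limit) for label, limit in names.items()}


if __name__ == "__main__":
    host = socket.gethostname()
    # Assumption: /tmp is where the MPI library places its shared memory file.
    ok, message = check_writable("/tmp")
    print(f"{host}: /tmp writable={ok} ({message})")
    for label, (soft, hard) in describe_limits().items():
        # A value of resource.RLIM_INFINITY means the limit is unlimited.
        print(f"{host}: {label} soft={soft} hard={hard}")
```

Running the same script once under mpiexec and once under mpirun_ssh, then comparing the output line by line, would show whether the two launchers hand the tasks a different /tmp or different limits.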
Any ideas?
Thanks
--
Jaime E Combariza, Ph.D.
Associate Director of Research Computing
Baker/Berry 179H
(603) 646-1506