Mpiexec fails

Jaime E Combariza jaime.e.combariza at Dartmouth.EDU
Mon Mar 10 12:51:57 EDT 2008


Hi.

I am trying to run a code that uses about 2 GB of RAM per process, on
44 processors.


1 - With mpiexec (0.82), the code aborts after a few minutes with:

mpiexec: Warning: tasks 0-43 died with signal 11 (Segmentation fault).

2 - I compiled v 0.83 and tried to run with it, and got several messages like these:

open: Read-only file system
[38] Abort: [38] smpi_init:error in opening shared memory file
</tmp/ib_shmem-117072-compute-1-18.local-583.tmp>: 29
 at line 754 in file mpid/vapi/mpid_smpi.c
mpiexec: Warning: accept_abort_conn: MPI_Abort from IP 172.18.128.237, rank
38, killing all.
open: Read-only file system
[39] Abort: [39] smpi_init:error in opening shared memory file
</tmp/ib_shmem-117072-compute-1-18.local-583.tmp>: 29
 at line 754 in file mpid/vapi/mpid_smpi.c
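The "open: Read-only file system" lines suggest that the shared memory file could not be created under /tmp on that node. A minimal sketch for checking this directly, assuming Python is available on the compute nodes; the helper name can_create_file is hypothetical and not part of mpiexec or the MPI library:

```python
import errno
import os
import socket
import tempfile


def can_create_file(directory):
    """Return (True, None) if a file can be created in directory,
    otherwise (False, symbolic errno name)."""
    try:
        # Create and immediately remove a scratch file, similar in spirit
        # to what a library does when it opens a file under /tmp.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="shmem_check-"):
            pass
        return True, None
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    ok, reason = can_create_file("/tmp")
    print(f"{socket.gethostname()}: /tmp writable={ok} reason={reason}")
```

Running this on the node named in the error (compute-1-18 here) would show whether /tmp is writable there at all, independent of which launcher starts the job.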


Note: mpiexec does run with other codes, and even with the same code when
its memory demands are lower.

3 - I am running over IB. If I use mpirun_ssh (or rsh) instead of mpiexec, the code runs fine.



Output of mpiexec -h:

Version 0.82, configure options: '--prefix=/software/mpiexec/0.82'
'--with-default-comm=ib' '--with-pbs=/usr/local/torque/current

4 - We are seeing similar problems when the code is run over GigE.
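Since the code runs under mpirun_ssh but fails under mpiexec, and only at the higher memory demand, one thing that may be worth comparing is the resource limits a process inherits under each launcher. This is a possible line of investigation, not a confirmed cause. A minimal sketch, assuming Python is available on the compute nodes; the choice of limits and the helper names are assumptions:

```python
import resource
import socket

# Limits that seem most relevant for a process of about 2 GB;
# this selection is an assumption, not taken from the report above.
LIMIT_NAMES = ["RLIMIT_STACK", "RLIMIT_DATA", "RLIMIT_AS", "RLIMIT_MEMLOCK"]


def format_limit(value):
    """Render a limit value, or 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def collect_limits(names=LIMIT_NAMES):
    """Return {name: (soft, hard)} for each limit this platform defines."""
    limits = {}
    for name in names:
        code = getattr(resource, name, None)
        if code is None:
            continue  # this limit is not defined on this platform
        soft, hard = resource.getrlimit(code)
        limits[name] = (format_limit(soft), format_limit(hard))
    return limits


if __name__ == "__main__":
    host = socket.gethostname()
    for name, (soft, hard) in collect_limits().items():
        print(f"{host} {name} soft={soft} hard={hard}")
```

Running the script once under each launcher and comparing the output per host would show whether the two start processes with different limits. How a non-MPI script is started under each launcher depends on the local setup.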

Any ideas?

Thanks




-- 
Jaime E Combariza, Ph.D.
Associate Director of Research Computing
Baker/Berry 179H
(603) 646-1506


 



