mpiexec on a heterogeneous cluster
Ken Borrelli
dumboken at gmail.com
Wed Apr 27 12:18:28 EDT 2005
I am running mpiexec 0.79 along with mpich-1.2.5 and TORQUE PBS on a
cluster containing both dual-processor 32-bit Intel machines and
single-processor 64-bit AMD machines. The same executable can be used
on all machines, but with mpiexec I have problems whenever a job uses
only one processor of a dual-processor machine, or uses any of the
single-processor machines. The processes simply die because they
cannot complete their initialization routines. Since the program
being run is not memory intensive and spends well under 1% of its time
on communication, mixing nodes like this is often useful for taking
full advantage of our cluster. This setup does work with mpirun, but
I'd rather use mpiexec. I have tried compiling our programs both with
and without shared memory support, using the compiler flags given in
the mailing list archives, without success. Any help would be appreciated.
Thanks in advance,
Ken
Error message reported by mpiexec
OUTPUT:
mpiexec: resolve_exe: using absolute exe "/home/ken/plop_par/plop-mpi".
mpiexec: read_p4_master_port: waiting for port from master.
mpiexec: read_p4_master_port: got port 39315.
mpiexec: All 2 tasks started.
mpiexec: wait_tasks: numspawned = 2, got evt 5 for tid 3 host compute-0-11.local status 1.
mpiexec: wait_tasks: numspawned = 1, got evt 3 for tid 2 host compute-0-15.local status 1.
mpiexec: Warning: tasks 0-1 exited with status 1.
ERROR
compute-0-15.local
Wed Apr 27 11:16:04 CDT 2005
rm_2504: p4_error: semget failed for setnum: 0
p0_3619: (0.300263) net_recv failed for fd = 5
p0_3619: p4_error: net_recv read, errno = : 104
p0_3619: (2.326237) net_send: could not write to fd=4, errno = 32
Wed Apr 27 11:16:07 CDT 2005