mpiexec on a heterogeneous cluster
Pete Wyckoff
pw at osc.edu
Wed Apr 27 14:51:20 EDT 2005
dumboken at gmail.com wrote on Wed, 27 Apr 2005 11:18 -0500:
> I am running mpiexec 0.79 along with mpich-1.2.5 and TORQUE pbs on a
> cluster containing both dual-processor 32-bit Intel machines and
> single-processor 64-bit AMD machines. The same executable can be used
> for all machines, but with mpiexec I have problems when a job uses only
> one processor of a dual-processor machine, or includes any of the
> single-processor machines. The process simply dies because it cannot
> complete its initialization routines. Since the program being run is
> not memory intensive and spends <<1% of its time on communication, such
> a setup is often useful to take full advantage of our cluster. It does
> work with mpirun, but I'd rather use mpiexec. I have tried compiling
> our programs both with and without shared memory support, using the
> compiler flags given in the mailing list archives, without success.
> Any help would be appreciated.
On reflection, this sounds like it should work: both machines can run
the 32-bit executables just fine, as you point out. The problem could
be in either the mpich startup code or mpiexec or pbs. Can you get
mpiexec to work on homogeneous sets of either 32-bit or 64-bit machines,
with the same plop-mpi executable? Hopefully that works, and we can use
it as a starting point.
Can you send me the mpiexec output and error messages when you add the
command-line flags "-v -v -v"? That may point to where the problem lies.
I will read through the mpich/p4 code and see if anything relies on
sizeof(void*) being the same on all machines.
-- Pete