Problem with Torque/MPICH2/Mpiexec

David Golden dgolden at cp.dias.ie
Thu Jun 29 10:14:17 EDT 2006


On 2006-06-29 15:38:37 +0200, Richard de Jong wrote:
> Hi list,
> 
> I just subscribed to this list because I have a problem.
> 
> I am trying to start an MPI application compiled with mpich2-1.0.3 using
> mpiexec, to use the TM interface of Torque. It gives a strange error.
> Please see my config below.
> 
> For mpiexec I have tried versions 0.80 and 0.81
> For Torque I have tried versions 1.0.1p6 1.2.0p3 and 2.0.0p4
> 

> Any idea what goes wrong?
>

Not really, but for the record, here are some compile flags we use
for mpich2 1.0.3 (with mpiexec 0.81, torque 2.1.0p0, gcc 3, RHEL4).
The pm / pmi settings are the relevant part: osc-mpiexec emulates a
particular mpich2 mpd+pmi setup. If you're not using the
pm + pmi settings as below, I expect you'd see funny problems, though
I have no idea whether _your_ problems correspond.

/usr/local/src/mpich2-1.0.3/configure --enable-fast --with-thread-package \
--enable-f77 --disable-f90 --enable-cxx --enable-romio --enable-pmiport \
--with-mpe --with-pm=mpd --with-pmi=simple

- then disable mpich2's mpiexec (and probably un-suid-root or disable their
mpdroot binary if you installed as root; it's unnecessary, and the less
suid root stuff hanging about the better!) and use osc-mpiexec
instead, obviously.
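
A rough sketch of that step, in case it helps. The function name, the
.mpich2 suffix and the example prefix are made up for illustration, and
the bin/mpiexec and bin/mpdroot locations are assumptions about where
your install put things, so check your own tree first:

```shell
#!/bin/sh
# Sketch only: disable_mpich2_launcher is a hypothetical helper, and the
# bin/mpiexec and bin/mpdroot paths are assumed, not taken from a real install.
disable_mpich2_launcher() {
    prefix="$1"

    # Move mpich2's own mpiexec aside so that osc-mpiexec is the one found in PATH.
    if [ -e "$prefix/bin/mpiexec" ]; then
        mv "$prefix/bin/mpiexec" "$prefix/bin/mpiexec.mpich2"
    fi

    # Clear the setuid bit on mpdroot; it is not needed when osc-mpiexec
    # starts the processes through Torque's TM interface.
    if [ -e "$prefix/bin/mpdroot" ]; then
        chmod u-s "$prefix/bin/mpdroot"
    fi
}

# Example call (the prefix is an assumption, adjust to your install):
#   disable_mpich2_launcher /usr/local/mpich2-1.0.3
```

Renaming rather than deleting keeps the original launcher around if you
ever need to go back to it.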

Other points: Try torque 2.0.0p8 if 2.1.x still seems a bit bleeding
edge; the early 2.0.0 releases were a bit dodgy IIRC. 2.0.0p8 worked
fine for us for a while, though 2.1.x's new build system and
rationalised pbs-config / libtorque library setup are much nicer.


