Problem with Torque/MPICH2/Mpiexec
Richard de Jong
rjong at os3.nl
Thu Jun 29 09:38:37 EDT 2006
Hi list,
I just subscribed to this list because I have a problem.
I am trying to start an MPI application compiled with mpich2-1.0.3 using
mpiexec, so that it is launched through the TM interface of Torque. It
fails with a strange error.
Please see my config below.
For mpiexec I have tried versions 0.80 and 0.81.
For Torque I have tried versions 1.0.1p6, 1.2.0p3 and 2.0.0p4.
Any idea what goes wrong?
This setup does work when I use mpirun from MPICH2 instead of mpiexec.
It also works correctly when I compile with mpich-1.2.7p1 and run with
mpiexec.
My PBS job looks like:
------------------------------------------------------------------------
#!/bin/bash
echo "
#!/bin/bash
# PBS job wrapper generated by `basename $0`
# on `/bin/date`
#
# PBS directives:
#PBS -S /bin/bash
#PBS -q dteam
#PBS -l nodes=3
#PBS -W stagein=mpich2-pbs-wrapper.sh@lxb1405.cern.ch:/home/dteam013/mpich2-pbs-wrapper.sh
#PBS -m n
#PBS -V
~/mpich2-pbs-wrapper.sh MPItest
" | qsub
------------------------------------------------------------------------
mpich2-pbs-wrapper.sh is a script that compiles the MPItest.c program,
copies it to all the nodes in $PBS_NODEFILE and then executes the
program with "mpiexec -verbose `pwd`/MPItest"
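The wrapper itself is not reproduced here. As a rough sketch of the three
steps just described, it does something along these lines. The function
names unique_nodes and run_mpi_job are made up for illustration, and the
sketch assumes that mpicc, scp and mpiexec are on the PATH, that scp works
without a password between the nodes, and that the working directory
exists under the same path on every node.

```shell
#!/bin/bash
# Hypothetical sketch only: the real mpich2-pbs-wrapper.sh is not shown
# in this post, so names and details here are assumptions.

# Print each host from a Torque node file once.
# (The node file lists a host once per allocated slot.)
unique_nodes() {
    sort -u "$1"
}

# Compile <prog>.c, copy the binary to every allocated node, then start it.
run_mpi_job() {
    local prog="$1"
    local node

    # Step 1: compile with the MPI compiler wrapper.
    mpicc -o "$prog" "$prog.c" || return 1

    # Step 2: copy the binary to the same path on each node.
    # Assumption: the nodes do not share this directory.
    for node in $(unique_nodes "$PBS_NODEFILE"); do
        # Skip the node we are running on; it already has the binary.
        [ "$node" = "$(hostname)" ] && continue
        scp "$prog" "$node:$PWD/$prog" || return 1
    done

    # Step 3: launch through mpiexec, as in the command quoted above.
    mpiexec -verbose "$PWD/$prog"
}
```

With these definitions, the last line of the wrapper would be
run_mpi_job "$1", so that "~/mpich2-pbs-wrapper.sh MPItest" builds and
runs MPItest.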
MPItest.c is
------------------------------------------------------------------------
/* hello.c
 *
 * Simple "Hello World" program in MPI.
 *
 */
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
    int numprocs; /* Number of processors */
    int procnum;  /* Processor number */
    /* Initialize MPI */
    MPI_Init(&argc, &argv);
    /* Find this processor number */
    MPI_Comm_rank(MPI_COMM_WORLD, &procnum);
    /* Find the number of processors */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    printf("Hello world! from processor %d out of %d\n", procnum, numprocs);
    /* Shut down MPI */
    MPI_Finalize();
    return 0;
}
------------------------------------------------------------------------
Standard Output gives:
------------------------------------------------------------------------
Hello world! from processor 0 out of 1
------------------------------------------------------------------------
Standard Error gives:
------------------------------------------------------------------------
mpiexec: resolve_exe: using absolute path "/home/dteam013/MPItest".
mpiexec: process_start_event: evt 2 task 0 on lxb1929.cern.ch.
mpiexec: read_p4_master_port: waiting for port from master.
mpiexec: process_obit_event: evt 3 task 0 on lxb1929.cern.ch stat 0.
mpiexec: read_p4_master_port: got port -1.
mpiexec: kill_tasks: killing all tasks.
mpiexec: Warning: task 0 exited before completing MPI startup.
mpiexec: Warning: task 1 died with signal 1701668980 (Unknown signal 1701668980).
mpiexec: Warning: task 2 exited oddly---report bug: status 0 done 0.
------------------------------------------------------------------------
I would have expected the standard error to be empty, and standard
output to contain:
------------------------------------------------------------------------
Hello world! from processor 0 out of 3
Hello world! from processor 1 out of 3
Hello world! from processor 2 out of 3
------------------------------------------------------------------------
When compiled against mpich-1.2.7, the standard error using -verbose
gives what looks like normal output:
------------------------------------------------------------------------
mpiexec: resolve_exe: using absolute path "/home/dteam013/MPItest".
mpiexec: process_start_event: evt 2 task 0 on lxb1929.cern.ch.
mpiexec: read_p4_master_port: waiting for port from master.
mpiexec: read_p4_master_port: got port 38496.
mpiexec: process_start_event: evt 4 task 1 on lxb1929.cern.ch.
mpiexec: process_start_event: evt 5 task 2 on lxb1405.cern.ch.
mpiexec: All 3 tasks (spawn 0) started.
mpiexec: wait_tasks: waiting for lxb1929.cern.ch and 2 others.
mpiexec: process_obit_event: evt 3 task 0 on lxb1929.cern.ch stat 0.
mpiexec: process_obit_event: evt 6 task 1 on lxb1929.cern.ch stat 0.
mpiexec: wait_tasks: waiting for lxb1405.cern.ch.
mpiexec: process_obit_event: evt 7 task 2 on lxb1405.cern.ch stat 0.
------------------------------------------------------------------------
And standard output shows the desired result.
Thanks in advance,
Richard de Jong