Problem with Torque/MPICH2/Mpiexec

Richard de Jong rjong at os3.nl
Thu Jun 29 09:38:37 EDT 2006


Hi list,

I just subscribed to this list because I have a problem.

I am trying to start an MPI application, compiled with mpich2-1.0.3,
using mpiexec so that it is launched through the TM interface of Torque.
It fails with a strange error. Please see my configuration below.

For mpiexec I have tried versions 0.80 and 0.81.
For Torque I have tried versions 1.0.1p6, 1.2.0p3 and 2.0.0p4.

Any idea what is going wrong?

This setup does work when I use mpirun from MPICH2 instead of mpiexec.
It also works correctly when I compile with mpich-1.2.7p1 and run with
mpiexec.


My PBS job looks like:
------------------------------------------------------------------------
#!/bin/bash

echo "
#!/bin/bash
# PBS job wrapper generated by `basename $0`
# on `/bin/date`
#
# PBS directives:
#PBS -S /bin/bash
#PBS -q dteam
#PBS -l nodes=3
#PBS -W stagein=mpich2-pbs-wrapper.sh@lxb1405.cern.ch:/home/dteam013/mpich2-pbs-wrapper.sh
#PBS -m n
#PBS -V

~/mpich2-pbs-wrapper.sh MPItest
" | qsub
------------------------------------------------------------------------


mpich2-pbs-wrapper.sh is a script that compiles the MPItest.c program,
copies the resulting binary to all the nodes in $PBS_NODEFILE and then
executes it with "mpiexec -verbose `pwd`/MPItest".
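
The wrapper script itself is not included in this message. As a rough
illustration of the three steps described above, it could look something
like the sketch below. Everything in it apart from the final mpiexec
command line is an assumption: the function name wrapper_main, the
DRY_RUN variable, the use of mpicc and scp, and the check that skips the
local node are all hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch of mpich2-pbs-wrapper.sh (the real script was not posted).
# Set DRY_RUN=echo to print the commands instead of running them.

wrapper_main() {
    local prog="$1"             # program name without ".c", e.g. MPItest
    local run="${DRY_RUN:-}"    # empty: execute commands; "echo": only print them
    local dir node
    dir="$(pwd)"

    # 1. Compile with the MPICH2 compiler wrapper (assumed to be on PATH).
    $run mpicc -o "$prog" "$prog.c" || return 1

    # 2. Copy the binary to every distinct node listed in $PBS_NODEFILE.
    #    Torque lists a node once per assigned processor, hence sort -u.
    for node in $(sort -u "$PBS_NODEFILE"); do
        # Skip the local node so the binary is not copied onto itself.
        # Assumes the node file uses the same name form as hostname(1).
        [ "$node" = "$(hostname)" ] && continue
        $run scp -q "$prog" "$node:$dir/$prog" || return 1
    done

    # 3. Launch through mpiexec, as quoted in the description above.
    $run mpiexec -verbose "$dir/$prog"
}

# Run only when a program name is given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    wrapper_main "$@"
fi
```

Running it as "DRY_RUN=echo ./mpich2-pbs-wrapper.sh MPItest" inside a job
only prints the mpicc, scp and mpiexec commands, which is a cheap way to
check the node list before anything is actually started.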


MPItest.c is:
------------------------------------------------------------------------
/*  hello.c
 *
 *  Simple "Hello World" program in MPI.
 *
 */

#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
  int numprocs;  /* Number of processors */
  int procnum;   /* Processor number */
  /* Initialize MPI */
  MPI_Init(&argc, &argv);
  /* Find this processor number */
  MPI_Comm_rank(MPI_COMM_WORLD, &procnum);
  /* Find the number of processors */
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  printf ("Hello world! from processor %d out of %d\n", procnum, numprocs);
  /* Shut down MPI */
  MPI_Finalize();
  return 0;
}
------------------------------------------------------------------------



Standard Output gives:
------------------------------------------------------------------------
Hello world! from processor 0 out of 1
------------------------------------------------------------------------


Standard Error gives:
------------------------------------------------------------------------
mpiexec: resolve_exe: using absolute path "/home/dteam013/MPItest".
mpiexec: process_start_event: evt 2 task 0 on lxb1929.cern.ch.
mpiexec: read_p4_master_port: waiting for port from master.
mpiexec: process_obit_event: evt 3 task 0 on lxb1929.cern.ch stat 0.
mpiexec: read_p4_master_port: got port -1.
mpiexec: kill_tasks: killing all tasks.
mpiexec: Warning: task 0 exited before completing MPI startup.
mpiexec: Warning: task 1 died with signal 1701668980 (Unknown signal 1701668980).
mpiexec: Warning: task 2 exited oddly---report bug: status 0 done 0.
------------------------------------------------------------------------


I would have expected the standard error to be empty, and standard
output to contain:

------------------------------------------------------------------------
Hello world! from processor 0 out of 3
Hello world! from processor 1 out of 3
Hello world! from processor 2 out of 3
------------------------------------------------------------------------



When compiled against mpich-1.2.7p1, the standard error using -verbose
gives what seems to be normal output:
------------------------------------------------------------------------
mpiexec: resolve_exe: using absolute path "/home/dteam013/MPItest".
mpiexec: process_start_event: evt 2 task 0 on lxb1929.cern.ch.
mpiexec: read_p4_master_port: waiting for port from master.
mpiexec: read_p4_master_port: got port 38496.
mpiexec: process_start_event: evt 4 task 1 on lxb1929.cern.ch.
mpiexec: process_start_event: evt 5 task 2 on lxb1405.cern.ch.
mpiexec: All 3 tasks (spawn 0) started.
mpiexec: wait_tasks: waiting for lxb1929.cern.ch and 2 others.
mpiexec: process_obit_event: evt 3 task 0 on lxb1929.cern.ch stat 0.
mpiexec: process_obit_event: evt 6 task 1 on lxb1929.cern.ch stat 0.
mpiexec: wait_tasks: waiting for lxb1405.cern.ch.
mpiexec: process_obit_event: evt 7 task 2 on lxb1405.cern.ch stat 0.
------------------------------------------------------------------------
And standard output shows the desired result.


Thanks in advance,

Richard de Jong



