Problems with mpiexec
Brent Clements
bclem at rice.edu
Thu May 8 09:41:19 EDT 2003
Hi Pete and everyone.
I'm trying to determine if I have a problem with OpenPBS or mpiexec.
We are using the latest mpiexec version with the latest mpich-gm from
Myrinet (1.2.5..10).
Here is our situation.
We start an interactive job under PBS with the command
qsub -I -l nodes=50:ppn=2
which gives us 100 processors to work with. (Our cluster actually has
100 dual-processor nodes; this is just an example.)
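For anyone trying to reproduce this, a quick way to confirm what PBS actually allocated is to inspect the node file from inside the job. This is only a sketch: it assumes PBS_NODEFILE is set in the job environment and lists one hostname per processor slot, and summarize_allocation is a hypothetical helper name, not part of PBS or mpiexec.

```shell
#!/bin/sh
# Sketch: summarize the allocation PBS handed to this job.
# Assumption: the node file lists one hostname per allocated processor
# slot, so a host with ppn=2 appears twice.
# summarize_allocation is a hypothetical helper, not a PBS command.
summarize_allocation() {
    nodefile=$1
    # Total lines = processor slots; unique lines = distinct hosts.
    slots=$(wc -l < "$nodefile" | tr -d ' ')
    hosts=$(sort -u "$nodefile" | wc -l | tr -d ' ')
    echo "slots=$slots hosts=$hosts"
}

# Inside a PBS job PBS_NODEFILE is set; outside one this does nothing.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_allocation "$PBS_NODEFILE"
fi
```

For the request above one would expect 100 slots on 50 hosts; anything else would suggest the problem lies with the allocation rather than with mpiexec.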
I then run the command ./mpiexec ./hello (using the hello program from
the mpiexec source).
It pauses for a while and then finally gives the following output:
mpiexec: Warning: main: task 0 died with signal 9.
mpiexec: Warning: main: task 1 died with signal 9.
mpiexec: Warning: main: task 2 died with signal 9.
mpiexec: Warning: main: task 3 died with signal 9.
mpiexec: Warning: main: task 99 died with signal 9.
I've also run mpiexec with the -n parameter, like this:
./mpiexec -n 40 ./hello (it works)
./mpiexec -n 80 ./hello (it works)
but when I run ./mpiexec -n 98, it dies.
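To narrow down where between 80 and 98 the failure starts, the runs could be scripted. The following is a rough sketch, not something I have run on the cluster: first_failing_count is a hypothetical helper, and it assumes a failure shows up either as a non-zero exit status or as a "died with signal" line in the output.

```shell
#!/bin/sh
# Sketch: try a launcher at several process counts and print the first
# count that fails. first_failing_count is a hypothetical helper.
# Assumption: a failed run either exits non-zero or prints a line
# containing "died with signal", as in the warnings quoted above.
first_failing_count() {
    launcher=$1
    program=$2
    shift 2
    for n in "$@"; do
        # Capture stdout and stderr together, then the exit status.
        out=$("$launcher" -n "$n" "$program" 2>&1)
        rc=$?
        if [ "$rc" -ne 0 ] || printf '%s\n' "$out" | grep -q 'died with signal'; then
            echo "$n"
            return 0
        fi
    done
    echo "none"
}

# Example invocation inside the interactive job (counts are arbitrary):
#   first_failing_count ./mpiexec ./hello 80 84 88 92 96 98 100
```

Knowing the smallest failing count should make it easier to tell whether the limit is tied to a particular node or to the total number of tasks.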
Here is the output, using the -v flag, from that command at the point where it dies:
wait_tasks: numspawned = 99, got evt 199 for tid 1225 host n40 status 267
wait_tasks: task 98 tid 1225 stray obit 0 while waiting for kill 199
Can anyone help?
Thanks,
Brent Clements
Rice University