Problems with mpiexec

Brent Clements bclem at rice.edu
Thu May 8 09:41:19 EDT 2003


Hi Pete and everyone.

I'm trying to determine if I have a problem with OpenPBS or mpiexec.

We are using the latest mpiexec release with the latest mpich-gm for
Myrinet (1.2.5..10).

Here is our situation.

We start an interactive job under PBS with the command

qsub -I -l nodes=50:ppn=2

which gives us 100 processors to work with. (Our cluster actually has
100 dual-processor nodes; this is just an example.)

I then run the command ./mpiexec ./hello (using the hello program from
the mpiexec source).

It pauses for a while and then finally gives the following output:

mpiexec: Warning: main: task 0 died with signal 9.
mpiexec: Warning: main: task 1 died with signal 9.
mpiexec: Warning: main: task 2 died with signal 9.
mpiexec: Warning: main: task 3 died with signal 9.
...
mpiexec: Warning: main: task 99 died with signal 9.


I've also run mpiexec with the -n parameter:

./mpiexec -n 40 ./hello (it works)
./mpiexec -n 80 ./hello (it works)

but when I run ./mpiexec -n 98, it dies.

Here is the output from the failing command when run with the -v flag:

wait_tasks: numspawned = 99, got evt 199 for tid 1225 host n40 status 267
wait_tasks: task 98 tid 1225 stray obit 0 while waiting for kill 199
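
The manual narrowing above (-n 80 works, -n 98 dies) could be automated
with a small bisection loop. This is only a rough sketch: the names
find_first_failure, LAUNCH and PROG are hypothetical and not part of
mpiexec, and it assumes that a failing run exits with a non-zero status
(rather than hanging forever) and that every count above the threshold
also fails.

```shell
#!/bin/sh
# Sketch: bisect the process count between a known-good and a known-bad value.
# LAUNCH and PROG are hypothetical variables; the defaults mirror the commands
# used in this post.
LAUNCH=${LAUNCH:-./mpiexec}
PROG=${PROG:-./hello}

# find_first_failure GOOD BAD
# Prints the smallest count in (GOOD, BAD] for which the launch fails,
# assuming GOOD succeeds, BAD fails, and failures are monotonic in the count.
find_first_failure() {
    lo=$1   # largest count known to work
    hi=$2   # smallest count known to fail
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        # A zero exit status is treated as success (an assumption).
        if $LAUNCH -n "$mid" $PROG >/dev/null 2>&1; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$hi"
}
```

Run from inside the interactive job, find_first_failure 80 98 would
print the first failing count under those assumptions, which may help
show whether the limit tracks a particular node or a fixed number of
tasks.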


Can anyone help?

Thanks,
Brent Clements
Rice University






