mpiexec problem.
Steve Young
slyoung at hamilton.edu
Wed Jun 20 16:46:14 EDT 2007
Hello,
I am having an issue with mpiexec version 0.82 (the OSC version),
which I am using together with MPICH2 version 1.0.5. If I use the
mpiexec or mpirun programs that ship with MPICH2, my applications work
as expected. However, I am also using torque-2.0.0p7, and to get Torque
to work properly with MPICH2 it was suggested that I try the OSC
version of mpiexec. When I do, the processes are spawned on the proper
nodes and in the proper number. However, it appears that the MPI
processes are not talking to one another: for an 8-CPU job, 8 processes
are spawned, but each one runs on its own as a separate single-process
job (every process reports "Process 0 of 1"). Here is an example using
the MPICH2 cpi program with the OSC mpiexec:
% mpiexec -np 8 ./cpi
Process 0 of 1 is on node0038
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000107
Process 0 of 1 is on node0038
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000101
Process 0 of 1 is on node0038
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000118
Process 0 of 1 is on node0038
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000107
Process 0 of 1 is on node0037
Process 0 of 1 is on node0037
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000108
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000126
Process 0 of 1 is on node0037
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000102
Process 0 of 1 is on node0037
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000101
mpiexec: Warning: tasks 0-7 exited before completing MPI startup.
Now, when I start up an mpd ring on the same two nodes and run the same
program using the MPICH2 mpirun, I get:
% mpirun -np 8 ./cpi
Process 0 of 8 is on node0037
Process 1 of 8 is on node0038
Process 2 of 8 is on node0038
Process 3 of 8 is on node0038
Process 5 of 8 is on node0037
Process 4 of 8 is on node0038
Process 6 of 8 is on node0038
Process 7 of 8 is on node0037
pi is approximately 3.1415926544231247, Error is 0.0000000008333316
wall clock time = 0.010308
[clutest at herc0037 ~/cpi_test]%
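The difference between the two runs above can also be checked mechanically. The following is only a rough sketch, assuming the cpi banner always has the form "Process R of N is on HOST" as in the pasted output; the names `summarize_ranks` and `looks_like_one_job` are made up for illustration and are not part of mpiexec, MPICH2, or Torque. It reports whether the captured output comes from one N-process job or from several independent single-process jobs.

```python
import re
from collections import Counter

# Matches the banner printed by the cpi example, e.g.
# "Process 3 of 8 is on node0038". The format is assumed from the output above.
BANNER = re.compile(r"^Process (\d+) of (\d+) is on (\S+)$")


def summarize_ranks(output):
    """Return (ranks, sizes, hosts) parsed from captured cpi output."""
    ranks = []          # rank reported by each process, in order of appearance
    sizes = set()       # distinct communicator sizes reported
    hosts = Counter()   # number of processes seen per host
    for line in output.splitlines():
        m = BANNER.match(line.strip())
        if m:
            ranks.append(int(m.group(1)))
            sizes.add(int(m.group(2)))
            hosts[m.group(3)] += 1
    return ranks, sizes, hosts


def looks_like_one_job(output):
    """True if all processes agree on the size and ranks 0..size-1 each appear once."""
    ranks, sizes, _ = summarize_ranks(output)
    if len(sizes) != 1:
        return False
    size = next(iter(sizes))
    return sorted(ranks) == list(range(size))
```

Fed the first output above (eight lines of "Process 0 of 1"), `looks_like_one_job` returns False, because rank 0 appears eight times in a communicator of size 1. Fed the mpirun output, it returns True, because ranks 0 through 7 each appear once with size 8.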
The OSC mpiexec spawns 8 separate processes, while mpirun runs them as
one parallel 8-process job. Any ideas on what I could try to remedy
this? Thanks,
-Steve