Frustrating problem
Brent M. Clements
bclem at rice.edu
Fri Dec 17 06:32:47 EST 2004
Hi Pete et al,
We are having an issue right now that occurs every time we run MPI jobs
using mpiexec 0.77. We are using mpich-1.2.6 with the mpich-p4 comm.
The first time we run the MPI program using mpiexec, we get the following
errors (the processes/nodes that fail are random each time it happens on
the first run):
Process 26 of 50 on n96.rtc
p26_1111: p4_error: Timeout in establishing connection to remote process: 0
Process 10 of 50 on n118.rtc
p10_1393: p4_error: Timeout in establishing connection to remote process: 0
Process 42 of 50 on n75.rtc
During the same job session, if we run the exact same command again right
after the first attempt, it runs fine.
It's very weird, and our users are starting to complain. At this point I
don't know what's causing the problem. Our systems analyst keeps pointing
to mpiexec (that's why I'm emailing the list).
Thanks,
Brent