mpiexec/PBS + net_send: could not write to fd=...
Pete Wyckoff
pw at osc.edu
Thu Aug 9 10:21:47 EDT 2007
bfp at purdue.edu wrote on Tue, 07 Aug 2007 14:07 -0400:
> I'm using mpiexec-0.82 with PBSPro. When running a simple "hello,world"
> program, everything usually works fine, but occasionally (about 1 in 20
> runs) I'll see the message:
>
> hamlet-632 775% mpiexec -np 4 ./hellof
> node 0 : Hello world!
> node 1 : Hello world!
> node 2 : Hello world!
> node 3 : Hello world!
> rm_l_2_15688: p4_error: interrupt SIGx: 13
> rm_l_2_15688: (0.035361) net_send: could not write to fd=7, errno = 32
> rm_l_2_15688: (0.035426) net_send: could not write to fd=9, errno = 32
> rm_l_2_15688: (0.035445) net_send: could not write to fd=10, errno = 32
> rm_l_2_15688: (0.035463) net_send: could not write to fd=11, errno = 32
> rm_l_2_15688: (0.035480) net_send: could not write to fd=7, errno = 32
> forrtl: error (69): process interrupted (SIGINT)
> Image PC Routine Line Source
> hellof 080C4C43 Unknown Unknown Unknown
> hellof 080C4263 Unknown Unknown Unknown
> hellof 080A16FE Unknown Unknown Unknown
> hellof 08093B4C Unknown Unknown Unknown
> hellof 08080491 Unknown Unknown Unknown
> hellof 080737A0 Unknown Unknown Unknown
> hellof 0805F2F0 Unknown Unknown Unknown
> Unknown 00000001 Unknown Unknown Unknown
>
> Stack trace terminated abnormally.
>
>
>
> I never see the error when using "mpirun" as in
>
> mpirun -np 4 -machinefile $PBS_NODEFILE ./hellof
>
>
> Does anyone know what this would be? Is it an mpiexec problem?
These p4 errors are from mpich. One of the mpich processes cannot
talk to the others. It could be that your hellof doesn't shut down
cleanly with MPI_Finalize. A possible reason why the situation
looks different with mpiexec vs. mpirun is that mpiexec does its
business faster and the tasks end up more synchronized, so any race
conditions show up differently.
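
For reference, on Linux errno 32 is EPIPE and signal 13 is SIGPIPE: both
mean a process wrote to a socket or pipe whose other end had already been
closed. That fits a peer exiting while rm_l_2_15688 was still trying to
send to it. The Python sketch below reproduces the same errno outside of
MPI. It is only an illustration; the function name is made up here and is
not part of mpich or mpiexec, and a local socket pair stands in for the
connection between two MPI processes.

```python
import errno
import socket


def write_after_peer_exit():
    """Return the errno seen when sending to a peer that has already closed."""
    # A connected pair of stream sockets stands in for two MPI processes
    # (an assumption for illustration; p4 uses TCP sockets between nodes).
    ours, peer = socket.socketpair()
    try:
        peer.close()  # the peer "exits" before we are done talking to it
        try:
            ours.send(b"hello")
        except OSError as exc:
            # Python ignores SIGPIPE at startup, so the failure surfaces
            # as an exception carrying the errno instead of killing us.
            return exc.errno
        return None
    finally:
        ours.close()


if __name__ == "__main__":
    code = write_after_peer_exit()
    print(code, errno.errorcode.get(code))
```

A C or Fortran program that leaves SIGPIPE at its default action would be
killed by the signal instead of seeing the error code, which is consistent
with the "interrupt SIGx: 13" line in the output above.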
You can run "mpiexec -v -v ..." to slow it down and see more of its
debugging output. mpich also has debug flags, so try "...
./hellof -p4dbg 99" to see what it is doing. (Do look that flag
up; I may not remember it correctly.)
-- Pete