mpiexec/PBS + net_send: could not write to fd=...

Pete Wyckoff pw at osc.edu
Thu Aug 9 10:21:47 EDT 2007


bfp at purdue.edu wrote on Tue, 07 Aug 2007 14:07 -0400:
> I'm using mpiexec-0.82 with PBSPro. When running a simple "hello,world" 
> program, everything usually works fine, but occasionally (about 1 in 20 
> runs) I'll see the message:
> 
> hamlet-632 775% mpiexec -np 4 ./hellof
>  node           0 : Hello world!
>  node           1 : Hello world!
>  node           2 : Hello world!
>  node           3 : Hello world!
> rm_l_2_15688:  p4_error: interrupt SIGx: 13
> rm_l_2_15688: (0.035361) net_send: could not write to fd=7, errno = 32
> rm_l_2_15688: (0.035426) net_send: could not write to fd=9, errno = 32
> rm_l_2_15688: (0.035445) net_send: could not write to fd=10, errno = 32
> rm_l_2_15688: (0.035463) net_send: could not write to fd=11, errno = 32
> rm_l_2_15688: (0.035480) net_send: could not write to fd=7, errno = 32
> forrtl: error (69): process interrupted (SIGINT)
> Image              PC        Routine            Line        Source             
> hellof             080C4C43  Unknown               Unknown  Unknown
> hellof             080C4263  Unknown               Unknown  Unknown
> hellof             080A16FE  Unknown               Unknown  Unknown
> hellof             08093B4C  Unknown               Unknown  Unknown
> hellof             08080491  Unknown               Unknown  Unknown
> hellof             080737A0  Unknown               Unknown  Unknown
> hellof             0805F2F0  Unknown               Unknown  Unknown
> Unknown            00000001  Unknown               Unknown  Unknown
> 
> Stack trace terminated abnormally.
> 
> 
> 
> I never see the error when using "mpirun" as in
> 
> mpirun -np 4 -machinefile $PBS_NODEFILE ./hellof
> 
> 
> Does anyone know what this would be? Is it an mpiexec problem?

These p4 errors are from mpich.  One of the mpich processes cannot
talk to the others: errno 32 is EPIPE and signal 13 is SIGPIPE on
Linux, so it was writing to sockets whose other ends had already
closed.  It could be that your hellof doesn't shut down cleanly with
MPI_Finalize.  A possible reason why the situation looks different
with mpiexec vs mpirun is that mpiexec does its business faster and
the tasks end up more synchronized, so any race conditions show up
differently.
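
For illustration only, the "errno = 32" and "SIGx: 13" values in the
quoted output can be reproduced without MPI at all.  The sketch below
is not mpich code; it assumes Linux and uses a plain pipe in place of
the p4 sockets, and the function name write_to_closed_pipe is made up
for this example.  It closes the reading end first (standing in for a
peer that has already exited) and then writes to the other end.

```python
import errno
import os
import signal


def write_to_closed_pipe():
    """Write to a pipe whose read end is closed; return the errno seen."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # stand-in for a peer process that already exited
    try:
        # Python ignores SIGPIPE by default, so the failed write surfaces
        # as BrokenPipeError (EPIPE) instead of killing the process.
        os.write(write_fd, b"hello")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return 0  # not reached when the read end is closed


if __name__ == "__main__":
    err = write_to_closed_pipe()
    print("errno =", err, errno.errorcode.get(err))
    print("SIGPIPE =", int(signal.SIGPIPE))
```

A program that does not ignore SIGPIPE gets the signal as well as the
failed write, which would be consistent with the p4_error line
appearing alongside the net_send messages.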

You can do "mpiexec -v -v ..." to slow it down and see more of its
debugging output.  There are also debug flags to mpich itself, so
try "... ./hellof -p4dbg 99" to see what it is doing.  (Actually look
that up; I may not remember correctly.)

		-- Pete

