mpiexec patch for very large jobs

Alex korobka at nankai.edu.cn
Mon May 3 07:53:00 EDT 2004


Hi,

I ran into two cases in which mpiexec would not work properly:

1. The number of file descriptors exceeded FD_SETSIZE.
2. write_full() in scatter_gm_startup_ports() returned -1 with errno
   of EAGAIN after a write to the connected nonblocking socket.

The first problem could be fixed either by recompiling the kernel and
reinstalling it on all nodes, or by replacing select() with poll() in the
mpiexec source code. The second problem clearly needed better error handling
in the xxx_full() routines. Here is a patch for both problems. It worked here,
but it may need a bit more polishing.
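
As a rough illustration of the two ideas (this is not the code from the
attached patch), a full-write routine can wait with poll() whenever a
nonblocking socket reports EAGAIN. Because poll() takes an array of
struct pollfd rather than an fd_set, it is not limited by FD_SETSIZE. The
name write_full_sketch() and its return convention are assumptions made for
this example; the real write_full() in mpiexec may differ.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from mpiexec: write all len bytes to fd.
 * When a nonblocking descriptor reports EAGAIN, wait with poll() until it
 * is writable again instead of failing.
 * Returns len on success, or -1 with errno set on a real error.
 */
static ssize_t write_full_sketch(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n >= 0) {
            /* Partial write: advance and try to send the remainder. */
            p += n;
            left -= (size_t) n;
            continue;
        }
        if (errno == EINTR)
            continue;               /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Socket buffer is full: block here until fd is writable. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;                  /* any other error is fatal */
    }
    return (ssize_t) len;
}
```

A caller would use it in place of a bare write(), for example
write_full_sketch(sock, buf, buflen), and treat a return of -1 as a real
failure rather than a transient one.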

Thanks for mpiexec,
Alex Korobka


-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpiexec-0.76.poll.patch.gz
Type: application/x-gzip-compressed
Size: 4916 bytes
Desc: not available
Url : http://email.osc.edu/pipermail/mpiexec/attachments/20040503/4b9cf42a/mpiexec-0.76.poll.patch.bin

