mpiexec patch for very large jobs
Pete Wyckoff
pw at osc.edu
Mon May 3 15:19:39 EDT 2004
korobka at nankai.edu.cn wrote on Mon, 03 May 2004 19:53 +0800:
> I encountered a problem where mpiexec would not work properly when
>
> 1. The number of file descriptors exceeded FD_SETSIZE.
> 2. write_full() in scatter_gm_startup_ports() returned -1 with errno
> of EAGAIN after a write to the connected nonblocking socket.
>
> The first problem could be fixed either by recompiling the kernel and
> reinstalling it on all nodes or by replacing select() with poll() in the
> mpiexec source code. The second problem clearly needed better error handling
> in the xxx_full() routines. Here is a patch for both problems. It worked here
> but it may need a bit more polishing.
Thanks much for this patch. I'll definitely include something like it
in the next release. A few questions for you, though, if you'll help
me to understand some of it.
Was it really necessary to grow the listen() backlog? System defaults
tend to be around 128, so unless you had to change this systemwide (e.g. via
/proc/sys/net/core/somaxconn on Linux), 4096 should give the same
behavior as 1024. I can make that the default with a note about the
system limit if you think it makes sense.
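To make the system-limit point concrete: on Linux, listen() silently
truncates a backlog larger than the value in /proc/sys/net/core/somaxconn.
A minimal sketch of checking what a requested backlog would be clamped to
is below. The names read_somaxconn() and effective_backlog() are made up
for illustration and are not part of mpiexec, and the /proc path is
Linux-specific.

```c
#include <stdio.h>

/* Hypothetical helper: read the Linux system-wide listen() backlog cap.
 * Returns the cap, or -1 if it cannot be read (e.g. not on Linux). */
int read_somaxconn(void)
{
    FILE *f = fopen("/proc/sys/net/core/somaxconn", "r");
    int v = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Hypothetical helper: the backlog listen() would effectively use.
 * If the cap is unknown, the requested value is returned unchanged. */
int effective_backlog(int requested)
{
    int limit = read_somaxconn();

    if (limit > 0 && requested > limit)
        return limit;
    return requested;
}
```

With a cap of 128, effective_backlog(1024) and effective_backlog(4096)
would both come back as 128, which is why the two values behave the same
unless the cap has been raised.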
I need to make sure poll() exists on most machines, then I will gut any
remaining select() use.
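For reference, the reason poll() avoids the FD_SETSIZE problem is that it
takes an array of descriptors rather than a fixed-size bitmask, so the
numeric value of the fd does not matter. A minimal sketch of a single-fd
wait is below. wait_readable() is a hypothetical name, not a function in
the mpiexec source.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: wait until fd is readable.
 * Returns 1 if readable (or at EOF), 0 on timeout, -1 on error.
 * Unlike FD_SET(), this works for fd values >= FD_SETSIZE. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Retry if a signal interrupts the wait; note that this
     * restarts the full timeout each time. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc <= 0)
        return rc;              /* 0 = timeout, -1 = poll() failed */
    if (pfd.revents & (POLLERR | POLLNVAL))
        return -1;              /* socket error or invalid descriptor */
    return 1;
}
```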
The second part of your patch is obviously the right thing to do. Sorry
I didn't deal with this correctly in the first place. It doesn't look
necessary to check EAGAIN in read_full(), though, since we only ever read
blocking sockets. And I'm tempted just to switch the fd to blocking
before the call to write_full(), maybe wrapped with an alarm() to avoid
the hang-on-dead-node scenario, instead of the EAGAIN-checking code you
wrote.
Then I should do this to all the devices that need it, for completeness,
maybe abstracted out with some helper function for the asynchronous
connect() part.
Thanks again,
-- Pete