mpiexec patch for very large jobs
Maestas, Christopher Daniel
cdmaest at sandia.gov
Thu Sep 16 20:30:10 EDT 2004
Hello,
What is the current status of integrating this patch?
Regards,
- Chris
-----Original Message-----
From: Pete Wyckoff [mailto:pw at osc.edu]
Sent: Monday, May 03, 2004 1:20 PM
To: Alex
Cc: mpiexec at osc.edu
Subject: Re: mpiexec patch for very large jobs
korobka at nankai.edu.cn wrote on Mon, 03 May 2004 19:53 +0800:
> I encountered a problem where mpiexec would not work properly when
>
> 1. The number of file descriptors exceeded FD_SETSIZE.
> 2. write_full() in scatter_gm_startup_ports() returned -1 with errno
> of EAGAIN after a write to the connected nonblocking socket.
>
> The first problem could be fixed either by recompiling the kernel and
> reinstalling it on all nodes or by replacing select() with poll() in
> the mpiexec source code. The second problem clearly needed better
> error handling in the xxx_full() routines. Here is a patch for both
> problems. It worked here but it may need a bit more polishing.
Thanks much for this patch. I'll definitely include something like it in
the next release. A few questions for you, though, if you'll help me to
understand some of it.
Was it really necessary to grow the listen() backlog? System defaults tend
to be around 128, so unless you had to change this systemwide (e.g. via
/proc/sys/net/core/somaxconn on Linux), 4096 should give the same behavior
as 1024, because the kernel caps the requested backlog at the system limit.
I can make that the default with a note about the system limit if you think
it makes sense.
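
To illustrate the backlog point above, here is a minimal sketch in C. It is
not code from mpiexec or from the patch; the names effective_backlog() and
read_somaxconn() are hypothetical. It assumes Linux, where the system-wide
limit is readable from /proc/sys/net/core/somaxconn and a larger value passed
to listen() is silently reduced to that limit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the backlog the kernel would actually use, given the
 * value passed to listen() and the system-wide limit. A limit <= 0 means
 * "unknown", in which case the requested value is returned unchanged. */
int effective_backlog(int requested, int somaxconn)
{
    assert(requested >= 0);
    if (somaxconn > 0 && requested > somaxconn)
        return somaxconn;
    return requested;
}

/* Hypothetical helper: read the Linux limit, or return -1 if the file is
 * missing or unreadable (for example on a non-Linux system). */
int read_somaxconn(void)
{
    FILE *f = fopen("/proc/sys/net/core/somaxconn", "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

With a limit of 128, effective_backlog(1024, 128) and
effective_backlog(4096, 128) both give 128, which is why the two settings
behave the same unless the system limit has been raised.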
I need to make sure poll() exists on most machines, then I will gut any
remaining select() use.
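
As a rough sketch of what the select()-to-poll() replacement looks like (again
not the patch itself; wait_readable() is a hypothetical name), the following
waits on an arbitrary number of descriptors. Unlike an fd_set, the pollfd
array is sized by the caller, so descriptor values at or above FD_SETSIZE are
not a problem.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>

/* Hypothetical helper: wait up to timeout_ms for any of nfds descriptors to
 * become readable (or to report hangup/error). Returns the index of the
 * first such descriptor in fds[], or -1 on timeout, interruption, or error.
 * Negative entries in fds[] are ignored by poll(). */
int wait_readable(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd *pfds;
    int i, n, ready = -1;

    assert(fds != NULL && nfds > 0);

    pfds = calloc((size_t)nfds, sizeof(*pfds));
    if (pfds == NULL)
        return -1;

    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }

    n = poll(pfds, (nfds_t)nfds, timeout_ms);
    if (n > 0) {
        for (i = 0; i < nfds; i++) {
            if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
                ready = i;
                break;
            }
        }
    }

    free(pfds);
    return ready;
}
```

A real event loop would likely keep the pollfd array around between calls
rather than rebuilding it each time; it is allocated here only to keep the
sketch self-contained.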
The second part of your patch is obviously the right thing to do. Sorry I
didn't deal with this correctly in the first place. It doesn't look
necessary to check EAGAIN in read_full(), though, since we only ever read
blocking sockets. And I'm tempted just to switch the fd to blocking before
the call to write_full(), maybe wrapped with an alarm() to avoid the
hang-on-dead-node scenario, instead of the EAGAIN-checking code you wrote.
Then I should do this to all the devices that need it, for completeness,
maybe abstracted out with some helper function for the asynchronous
connect() part.
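
For reference, the EAGAIN-handling variant discussed above could look roughly
like the sketch below. This is neither the submitted patch nor mpiexec's
actual write_full(); write_full_nb() and set_nonblocking() are hypothetical
names. It also assumes a poll() timeout is an acceptable substitute for the
alarm() idea: if the peer never drains its socket, the wait gives up instead
of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into nonblocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical stand-in for write_full(): write all count bytes to fd, which
 * may be nonblocking. On EAGAIN, wait up to timeout_ms for the descriptor to
 * become writable and retry. Returns 0 on success, -1 on error; a wait that
 * times out sets errno to ETIMEDOUT. */
int write_full_nb(int fd, const void *buf, size_t count, int timeout_ms)
{
    const char *p = buf;
    size_t left = count;

    assert(buf != NULL || count == 0);

    while (left > 0) {
        ssize_t n = write(fd, p, left);

        if (n > 0) {                    /* partial or complete write */
            p += n;
            left -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)    /* interrupted, just retry */
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int r = poll(&pfd, 1, timeout_ms);

            if (r > 0 || (r < 0 && errno == EINTR))
                continue;               /* writable again, or interrupted */
            if (r == 0)
                errno = ETIMEDOUT;      /* peer never drained the socket */
            return -1;
        }
        return -1;                      /* any other failure */
    }
    return 0;
}
```

Note that the timeout applies to each wait, not to the whole transfer, and
that a caller writing to sockets would normally also ignore SIGPIPE so that a
closed peer shows up as an error return rather than killing the process.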
Thanks again,
-- Pete