mpiexec patch for very large jobs

Alex korobka at nankai.edu.cn
Thu Sep 16 23:19:44 EDT 2004


I have an update to this that fixes a corner case. I will send it next week
after I get back to the office.

On the other hand, is there anything in the works to support the OSU MVAPICH
stack? If not, I'll have a go at it next week.

Alex

In your message you wrote:
>From: "Maestas, Christopher Daniel" <cdmaest at sandia.gov>
>Reply-To: 
>To: "'Pete Wyckoff'" <pw at osc.edu>, Alex <korobka at nankai.edu.cn>
>Subject: RE: mpiexec patch for very large jobs
>Date: Thu, 16 Sep 2004 18:30:10 -0600
>
>Hello,
> 
> What is the current status of integrating this patch?
> 
> Regards,
> - Chris
> 
> 
>korobka at nankai.edu.cn wrote on Mon, 03 May 2004 19:53 +0800:
>> I encountered a problem where mpiexec would not work properly when
>> 
>> 1. The number of file descriptors exceeded FD_SETSIZE.
>> 2. write_full() in scatter_gm_startup_ports() returned -1 with errno
>>    of EAGAIN after a write to the connected nonblocking socket.
>> 
>> The first problem could be fixed either by recompiling the kernel and
>> reinstalling it on all nodes or by replacing select() with poll() in
>> the mpiexec source code. The second problem clearly needed better
>> error handling in the xxx_full() routines. Here is a patch for both
>> problems. It worked here but it may need a bit more polishing.
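As an illustration of the second fix described above, here is a minimal
sketch of a write loop that tolerates EAGAIN on a nonblocking socket. It is
not the code from the patch: the name write_full_retry() and the timeout_ms
parameter are hypothetical, and the real write_full() in mpiexec may have a
different signature.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for write_full(): write all len bytes to fd.
 * When a nonblocking descriptor reports EAGAIN, wait with poll() until
 * it is writable again instead of failing.
 * Returns 0 on success, -1 on error or timeout (errno is set).
 */
static int write_full_retry(int fd, const void *buf, size_t len, int timeout_ms)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n > 0) {                /* partial or complete write */
            p += n;
            len -= (size_t) n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted by a signal, retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int r = poll(&pfd, 1, timeout_ms);

            if (r > 0 || (r < 0 && errno == EINTR))
                continue;           /* ready (or interrupted), try again */
            if (r == 0)
                errno = ETIMEDOUT;  /* peer never drained the socket */
            return -1;
        }
        return -1;                  /* n == 0 or a hard error */
    }
    return 0;
}
```

The timeout keeps a dead node from stalling startup indefinitely; a caller
that prefers to wait forever could pass -1.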
>
>Thanks much for this patch.  I'll definitely include something like it in
>the next release.  A few questions for you, though, if you'll help me to
>understand some of it.
>
>Was it really necessary to grow the listen() backlog?  System defaults tend
>to be around 128, so unless you had to change this systemwide (e.g. via
>/proc/sys/net/core/somaxconn on linux), 4096 should give the same behavior
>as 1024.  I can make that the default with a note about the system limit if
>you think it makes sense.
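To make the point about the system limit concrete: on Linux, a backlog
larger than the value in /proc/sys/net/core/somaxconn is silently capped to
that value. The sketch below computes the backlog that listen() would
effectively use. The function name effective_backlog() is hypothetical, and
the path is passed in as a parameter only so the logic can be exercised
against any file.

```c
#include <stdio.h>

/*
 * Return the backlog that would take effect for a requested value,
 * given a file holding the system limit (on Linux,
 * /proc/sys/net/core/somaxconn). If the limit cannot be read, assume
 * the requested value is used unchanged.
 */
static int effective_backlog(int requested, const char *limit_path)
{
    int limit;
    FILE *f = fopen(limit_path, "r");

    if (!f)
        return requested;           /* limit unknown, e.g. not Linux */
    if (fscanf(f, "%d", &limit) != 1)
        limit = requested;          /* unparsable, treat as no limit */
    fclose(f);

    return requested < limit ? requested : limit;
}
```

With a limit of 128, both effective_backlog(1024, ...) and
effective_backlog(4096, ...) return 128, which is why the two settings
behave the same unless the limit is raised systemwide.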
>
>I need to make sure poll() exists on most machines, then I will gut any
>remaining select() use.
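For reference, a minimal sketch of waiting on many descriptors with poll()
rather than select(). Because poll() takes an array of descriptors instead
of a fixed-size bit set, descriptor values above FD_SETSIZE are not a
problem. The helper name wait_readable() is hypothetical and is not taken
from the mpiexec source.

```c
#include <poll.h>
#include <stdlib.h>

/*
 * Wait up to timeout_ms for any of the nfds descriptors in fds[] to
 * become readable (or to report hangup/error, which a read would also
 * reveal). Returns the index of the first such descriptor, or -1 on
 * timeout, error, or an empty list.
 */
static int wait_readable(const int *fds, size_t nfds, int timeout_ms)
{
    struct pollfd *pfds;
    int ready = -1;
    size_t i;

    if (nfds == 0)
        return -1;
    pfds = calloc(nfds, sizeof(*pfds));
    if (!pfds)
        return -1;

    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];        /* negative fds are ignored by poll() */
        pfds[i].events = POLLIN;
    }

    if (poll(pfds, (nfds_t) nfds, timeout_ms) > 0) {
        for (i = 0; i < nfds; i++) {
            if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
                ready = (int) i;
                break;
            }
        }
    }

    free(pfds);
    return ready;
}
```

A long-running event loop would keep the pollfd array around between calls
rather than rebuilding it each time; it is allocated here only to keep the
sketch short.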
>
>The second part of your patch is obviously the right thing to do.  Sorry I
>didn't deal with this correctly in the first place.  It doesn't look
>necessary to check EAGAIN in read_full(), though, since we only ever read
>blocking sockets.  And I'm tempted just to switch the fd to blocking before
>the call to write_full(), maybe wrapped with an alarm() to avoid the
>hang-on-dead-node scenario instead of the EAGAIN checking code you did.
>
>Then I should do this to all the devices that need it, for completeness,
>maybe abstracted out with some helper function for the asynchronous
>connect() part.
>
>Thanks again,
>
>		-- Pete