mpiexec patch for very large jobs

Maestas, Christopher Daniel cdmaest at sandia.gov
Fri Sep 17 11:26:07 EDT 2004


This is the patch I hacked together to allow job launching on Voltaire
InfiniBand fabrics.
It simply sets the version to 1 until Voltaire updates the MVAPICH
release in their software package to support versions in the protocol.

Regards,

-----Original Message-----
From: Alex [mailto:korobka at nankai.edu.cn] 
Sent: Thursday, September 16, 2004 9:20 PM
To: cdmaest at sandia.gov; pw at osc.edu
Cc: mpiexec at osc.edu
Subject: RE: mpiexec patch for very large jobs


I have an update to this, it fixes a corner case. I will send it next week
after I get back to the office.

On the other hand, is there anything in the works to support OSU MVAPICH
stack? If not then I'll have a go at it next week.

Alex

In your message you wrote:
>From: "Maestas, Christopher Daniel" <cdmaest at sandia.gov>
>Reply-To:
>To: "'Pete Wyckoff'" <pw at osc.edu>, Alex <korobka at nankai.edu.cn>
>Subject: RE: mpiexec patch for very large jobs
>Date:Thu, 16 Sep 2004 18:30:10 -0600
>
>Hello,
> 
> What is the current status of integrating this patch?
> 
> Regards,
> - Chris
> 
> 
>korobka at nankai.edu.cn wrote on Mon, 03 May 2004 19:53 +0800:
>> I encountered a problem where mpiexec would not work properly when
>> 
>> 1. The number of file descriptors exceeded FD_SETSIZE.
>> 2. write_full() in scatter_gm_startup_ports() returned -1 with errno
>>    of EAGAIN after a write to the connected nonblocking socket.
>> 
>> The first problem could be fixed either by recompiling the kernel and
>> reinstalling it on all nodes or by replacing select() with poll() in 
>> the mpiexec source code. The second problem clearly needed better 
>> error handling in the xxx_full() routines. Here is a patch for both 
>> problems. It worked here but it may need a bit more polishing.
>
>Thanks much for this patch.  I'll definitely include something like it 
>in the next release.  A few questions for you, though, if you'll help 
>me to understand some of it.
>
>Was it really necessary to grow the listen() backlog?  System defaults 
>tend to be around 128, so unless you had to change this systemwide 
>(e.g. via /proc/sys/net/core/somaxconn on linux), 4096 should give the 
>same behavior as 1024.  I can make that the default with a note about 
>the system limit if you think it makes sense.
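The point about the system limit can be made concrete with a small sketch.
On Linux the kernel silently truncates the backlog passed to listen() to the
value of somaxconn, so a larger request changes nothing unless the sysctl is
raised as well. The helper names effective_backlog() and read_somaxconn()
are hypothetical and are not part of mpiexec.

```c
#include <assert.h>
#include <stdio.h>

/* Backlog the kernel will actually use: the request capped at somaxconn. */
static int effective_backlog(int requested, int somaxconn)
{
    assert(somaxconn > 0);
    return requested < somaxconn ? requested : somaxconn;
}

/* Read the Linux system-wide limit; returns -1 if it cannot be read. */
static int read_somaxconn(void)
{
    FILE *f = fopen("/proc/sys/net/core/somaxconn", "r");
    int value = -1;

    if (f == NULL)
        return -1;                  /* not Linux, or /proc not mounted */
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

With a limit of 128, effective_backlog(1024, 128) and
effective_backlog(4096, 128) both yield 128, which is why the two requests
behave identically on an untuned system.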
>
>I need to make sure poll() exists on most machines then will gut any 
>remaining select() use.
>
>The second part of your patch is obviously the right thing to do.  
>Sorry I didn't deal with this correctly in the first place.  It doesn't 
>look necessary to check EAGAIN in read_full(), though, since we only 
>ever read blocking sockets.  And I'm tempted just to switch the fd to 
>blocking before the call to write_full(), maybe wrapped with an alarm() 
>to avoid the hang-on-dead-node scenario instead of the EAGAIN checking 
>code you did.
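The alternative suggested above, switching the descriptor to blocking mode
and guarding the write with alarm(), might look roughly like this. It is a
sketch under assumptions: set_blocking() and write_full_timed() are
hypothetical names, and the SIGALRM handler is installed without SA_RESTART
so that a blocked write() returns with EINTR when the timer fires.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt a blocked write(). */
static void on_alarm(int sig) { (void) sig; }

/* Clear O_NONBLOCK on fd.  Returns 0 on success, -1 on error. */
static int set_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Write exactly len bytes to fd in blocking mode, giving up after roughly
 * secs seconds (0 means no timeout).  Returns 0 on success, -1 otherwise.
 * Caveat: any signal, not only SIGALRM, aborts the write, and an alarm that
 * fires between two write() calls is not noticed by this simple loop.
 */
static int write_full_timed(int fd, const void *buf, size_t len, unsigned secs)
{
    struct sigaction sa, old;
    const char *p = buf;
    int ret = 0;

    assert(buf != NULL || len == 0);
    if (set_blocking(fd) < 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                /* no SA_RESTART: write() gets EINTR */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    alarm(secs);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {               /* error, or interrupted by the alarm */
            ret = -1;
            break;
        }
        p += n;
        len -= (size_t) n;
    }
    alarm(0);                       /* cancel any pending alarm */
    sigaction(SIGALRM, &old, NULL); /* restore the previous handler */
    return ret;
}
```

Compared with checking EAGAIN, this keeps the write loop simple at the cost
of process-wide signal state, which matters if other code also relies on
SIGALRM.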
>
>Then I should do this to all the devices that need it, for 
>completeness, maybe abstracted out with some helper function for the 
>asynchronous connect() part.
>
>Thanks again,
>
>		-- Pete
>_______________________________________________
>mpiexec mailing list
>mpiexec at osc.edu
>http://email.osc.edu/mailman/listinfo/mpiexec
>




-------------- next part --------------
A non-text attachment was scrubbed...
Name: voltaire_ib_patch_0.76
Type: application/octet-stream
Size: 363 bytes
Desc: not available
Url : http://email.osc.edu/pipermail/mpiexec/attachments/20040917/e10cac75/voltaire_ib_patch_0.obj

