mpiexec error on OS X

Pete Wyckoff pw at osc.edu
Thu Apr 28 13:27:32 EDT 2005


beaneg at umcs.maine.edu wrote on Thu, 28 Apr 2005 12:38 -0400:
> I get this error occasionally (and seemingly randomly) on OS X:
> 
> mpiexec: Error: read_gm_startup_ports: read gmpi_port#1 iter 56: 
> Resource temporarily unavailable.

That's EAGAIN according to the Mac sys/errno.h.  It is supposed to
happen only on non-blocking sockets, but I intended that fd to be
blocking.

> Pete, can you give me a brief description on what mpiexec is trying to 
> do at this point, so I can provide more detailed information to 
> myricom?

For each of the processes, mpiexec waits in accept() for a connection
from a process.  Then it reads this new fd, one character at a time
using read(), until it gets the full "<<<....>>>" string from the GM
startup code.
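
As a rough illustration of that read loop, here is a minimal sketch.
The name read_until_sketch and its four-argument signature are
hypothetical; this is not the read_until() from gm.c, whose exact
behavior and fifth argument are not shown here.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not mpiexec's read_until(): read one byte at a
 * time from fd into buf until the bytes read so far end with term.
 * Returns the number of bytes stored (buf is NUL-terminated), or -1 on
 * a read error, on EOF before term is seen, or if buf fills up first.
 */
ssize_t read_until_sketch(int fd, char *buf, size_t len, const char *term)
{
    size_t n = 0;
    size_t tlen = strlen(term);

    if (len == 0)
        return -1;
    while (n + 1 < len) {
        ssize_t cc = read(fd, buf + n, 1);
        if (cc < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;          /* EAGAIN ends up here if fd is non-blocking */
        }
        if (cc == 0)
            return -1;          /* peer closed before sending term */
        ++n;
        if (n >= tlen && memcmp(buf + n - tlen, term, tlen) == 0) {
            buf[n] = '\0';
            return (ssize_t) n;
        }
    }
    return -1;                  /* buffer full, terminator never seen */
}
```

With a blocking fd each read() simply waits for the next byte.  With a
non-blocking fd the same loop can fail with EAGAIN whenever the sender
has not yet written the next byte, which would match an error that
shows up only occasionally.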

Reading the mac manpage for accept() I see it says it "creates a new
socket with the same properties of s".  Contrast this to the linux man
page that says explicitly "Note that any per file descriptor flags
(everything that can be set with the F_SETFL fcntl, like non blocking or
async state) are not inherited across an accept."

I don't know which is the correct behavior.  But you might change the
code in gm.c around line 139 to look something like this:

    }
    /* explicitly turn off nonblocking for mac accept behavior */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
	error_errno("%s: get new socket flags", __func__);
    if (fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0)
	error_errno("%s: set new socket blocking", __func__);
    cc = read_until(fd, s, sizeof(s), ">>>", 0);
    if (cc < 0)
	error_errno("%s: read gmpi_port#1 iter %d", __func__, i);

and test a bit to see if the problem goes away.  Then we'll know for
sure.

I'm curious what standards have to say about this issue:  should sockets
generated by accept() inherit the non-blocking nature of the listening
socket?
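
Whatever the standards say, the behavior of a particular platform can
be checked directly.  The sketch below is a hypothetical standalone
probe, not part of mpiexec: the function name accepted_inherits_nonblock
is made up for illustration.  It marks a loopback listening socket
O_NONBLOCK, accepts one connection, and reports whether the accepted
socket carries the flag.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical probe (not part of mpiexec): returns 1 if a socket
 * returned by accept() inherits O_NONBLOCK from a non-blocking
 * listener, 0 if it does not, and -1 if any step fails.
 */
int accepted_inherits_nonblock(void)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct pollfd pfd;
    int lfd = -1, cfd = -1, afd = -1, flags, result = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        goto out;

    /* Bind to an ephemeral port on the loopback interface. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        goto out;
    if (listen(lfd, 1) < 0)
        goto out;
    if (getsockname(lfd, (struct sockaddr *) &addr, &alen) < 0)
        goto out;

    /* Make the listening socket non-blocking. */
    flags = fcntl(lfd, F_GETFL);
    if (flags < 0 || fcntl(lfd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto out;

    /* Connect a blocking client to the listener. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        goto out;
    if (connect(cfd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        goto out;

    /* Wait until the connection is queued so accept() cannot see EAGAIN. */
    pfd.fd = lfd;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, 5000) <= 0)
        goto out;
    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    /* Inspect the file status flags of the accepted socket. */
    flags = fcntl(afd, F_GETFL);
    if (flags >= 0)
        result = (flags & O_NONBLOCK) ? 1 : 0;
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```

Calling this from a small main() and printing the result on both the
mac and a linux box should show whether the two really differ.  Either
way, setting the flags explicitly on the fd returned by accept() does
not depend on the answer.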

		-- Pete


