mpiexec scalability improved!
garrick
garrick at usc.edu
Wed Apr 12 22:42:54 EDT 2006
On Wed, Apr 12, 2006 at 05:38:45PM -0700, garrick alleged:
> On Wed, Apr 12, 2006 at 03:24:44PM -0700, garrick alleged:
> > On Wed, Apr 12, 2006 at 12:49:53PM -0400, Pete Wyckoff alleged:
> > > pw at osc.edu wrote on Tue, 11 Apr 2006 10:29 -0400:
> > > > Hang onto your patch. I'll take a crack at converting gm.c to
> > > > do periodic servicing without fork and you can see how you like
> > > > that.
> > >
> > > Are you willing to test my vision for GM async? Here's a patch.
> > > It works here on 4 GM nodes on ia64, and the debug statements
> > > appear to show it's doing the right things, but you may run
> > > into issues at scale. I am curious to know if it is as fast
> > > as your fork() version or the mpich-gm perl script.
> >
> > It does the job in 10-20 seconds, but still failed once in about 10
> > runs.
>
> This is definitely failing regularly. With -v -v, about all I get is
> this:
> mpiexec: Error: do_child: input on unexpected fd 10.
fd 10 is the pipe from the parent, and the handler isn't clearing it
from the read fd list.
A one-line patch to do_child() and it is behaving very, very nicely:
    /* message from parent */
    if (poll_isset(pipe_with_stdio, &rfsd)) {
        poll_clr(pipe_with_stdio, &rfsd);   /* <-- missing */
        --n;
        stdio_msg_listener_read();
    }
I've run dozens of tests on 1800 nodes without any failures.
Will we see this in a release or svn soon?
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California
More information about the mpiexec mailing list