mpiexec scalability improved!
Pete Wyckoff
pw at osc.edu
Tue Apr 18 14:48:52 EDT 2006
garrick at usc.edu wrote on Wed, 12 Apr 2006 19:42 -0700:
> On Wed, Apr 12, 2006 at 05:38:45PM -0700, garrick alleged:
> > On Wed, Apr 12, 2006 at 03:24:44PM -0700, garrick alleged:
[..]
> > This is definitely failing regularly. With -v -v, about all I get is
> > this:
> > mpiexec: Error: do_child: input on unexpected fd 10.
>
> fd 10 is the pipe from the parent and the handler isn't clearing it
> from the read fd list.
>
> One line patch to do_child() and it is behaving very very nicely:
>     /* message from parent */
>     if (poll_isset(pipe_with_stdio, &rfsd)) {
>         poll_clr(pipe_with_stdio, &rfsd);    <-- missing
>         --n;
>         stdio_msg_listener_read();
>     }
>
> I've run dozens of tests on 1800 nodes without any failures.
>
> Will we see this in a release or svn soon?
You are absolutely right. I'm surprised anything worked without that
line. Thanks for fixing it and testing. I'm running tests on a
bunch of machines and will likely check in these GM changes soon.
Hopefully I'll roll a release too.
-- Pete
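
For readers following the thread, here is a minimal sketch of the failure
mode described above. It is not mpiexec's code: it uses plain select() and
the FD_* macros in place of the poll_isset()/poll_clr() wrappers, and the
names dispatch_ready, run_demo and clear_handled are hypothetical. It also
assumes the dispatcher ends with a catch-all that complains about any
descriptor still marked ready, which is one plausible way to arrive at an
"input on unexpected fd" message.

```c
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical dispatcher: handle the descriptors select() marked ready
 * and return how many ready descriptors no handler claimed.
 * parent_fd stands in for the pipe from the parent.
 */
int dispatch_ready(int parent_fd, int maxfd, fd_set *rfds, int clear_handled)
{
    int unexpected = 0;
    char buf[64];

    /* message from parent */
    if (FD_ISSET(parent_fd, rfds)) {
        if (clear_handled)
            FD_CLR(parent_fd, rfds);    /* the step the patch adds */
        if (read(parent_fd, buf, sizeof(buf)) < 0)
            perror("read");
    }

    /* assumed catch-all: anything still set looks like stray input */
    for (int fd = 0; fd <= maxfd; fd++) {
        if (FD_ISSET(fd, rfds)) {
            fprintf(stderr, "input on unexpected fd %d\n", fd);
            unexpected++;
        }
    }
    return unexpected;
}

/*
 * Make a pipe readable, select() on it, then dispatch.
 * Returns the number of unclaimed descriptors, or -1 on setup failure.
 */
int run_demo(int clear_handled)
{
    int p[2];
    fd_set rfds;
    int unexpected = -1;

    if (pipe(p) < 0)
        return -1;

    if (write(p[1], "x", 1) == 1) {
        FD_ZERO(&rfds);
        FD_SET(p[0], &rfds);
        /* the pipe already holds one byte, so this returns at once */
        if (select(p[0] + 1, &rfds, NULL, NULL, NULL) > 0)
            unexpected = dispatch_ready(p[0], p[0], &rfds, clear_handled);
    }

    close(p[0]);
    close(p[1]);
    return unexpected;
}
```

Calling run_demo(1) clears the parent pipe once it has been handled, so
the catch-all finds nothing. Calling run_demo(0) leaves the bit set, and
the same descriptor is reported a second time as unexpected input.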