Minor Mpiexec p4 bug (and patch!)

Pete Wyckoff pw at osc.edu
Mon Jan 21 12:05:19 EST 2002


ben at bellatrix.pcl.ox.ac.uk said:
> 1. You include a patch against mpich-1.2.3-alpha. Unfortunately, 
>    "1.2.3-alpha" appears to be a moving target; I downloaded it today and 
>    found that most of your patch had to be applied manually. (About half 
>    of it failed because the MPICH guys had already applied your fixes, 
>    while most of the rest failed because of context changes.) You might
>    consider making a snapshot of mpich available for download, if you 
>    have the bandwidth, to save on this kind of agony. ;) Alternatively,
>    you're welcome to the patch I made today (although I've messed up your 
>    formatting in loads of places, as I use spaces rather than tabs).

Ouch.  I did not realize they would do this, but alas.  Fortunately they
did integrate everything, more or less as I had done it.  I'll add some
warnings saying you have to pay attention to the release date.  It looks
like they at least have a file, include/patchlevel.h, against which we
will be able to track the version.  I'm running tests against a stock
mpich-1.2.3-alpha-020118 and they seem to be working, except that
leaving USE_NONBLOCKING_LISTENER_SOCKETS undefined causes some jobs to
hang as they're ending.  Maybe just a teensy patch then.

> 2. I discovered that my jobs didn't work properly when started via
>    mpiexec and the p4 device. Although the slave processes ran in the 
>    correct directory, process 0 ran instead in the directory where my 
>    executable was located, and since I don't often keep my input files in 
>    /usr/bin, this caused my jobs to fail. A simple
>    "mpiexec --comm=none pwd" test did not show this behaviour (it worked 
>    as expected) so I guess it's a problem with MPICH. Anyway, the attached
>    patch adds the p4 option "-p4wd" to the command line to set the working 
>    directory properly for MPICH, and this seems to fix the problem. I'm 
>    not, however, very familiar with the internals of MPICH, so I'm not 
>    100% on this; if you have any better ideas I'd be most interested to 
>    hear them.

I'd not noticed this.  Weird.  Looking at the mpich code, one can see
that if -p4wd is not specified but the executable name has a "/" in
it, then process 0 changes to that directory.  I guess if you had not
fully qualified the name of the code it might have worked, but it is
silly that the behavior differs.  It looks like the other
processes should have done the same thing as well.  Rather than fight
MPICH to change this, I'll apply your patch to mpiexec and issue a new
version real soon now.  Thanks!
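
For the archive, here is a rough sketch of the working-directory rule as
described above.  This is not the MPICH source; the function name
choose_workdir and its signature are made up for illustration.  It only
models the decision: an explicit -p4wd wins, otherwise the directory part
of a "/"-qualified executable name is used, otherwise the process stays
where it was started.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of MPICH or mpiexec): return the directory
 * process 0 would change to, or NULL if it would stay in the current one.
 * buf/buflen is caller-supplied scratch space for the directory part of
 * argv[0].
 */
static const char *choose_workdir(int argc, char **argv,
                                  char *buf, size_t buflen)
{
    const char *slash;
    size_t len;
    int i;

    assert(argc >= 1 && argv != NULL && argv[0] != NULL);

    /* An explicit "-p4wd <dir>" on the command line takes priority. */
    for (i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-p4wd") == 0)
            return argv[i + 1];
    }

    /* No -p4wd: fall back to the directory part of the executable name. */
    slash = strrchr(argv[0], '/');
    if (slash == NULL)
        return NULL;            /* bare name such as "prog": stay put */

    len = (size_t)(slash - argv[0]);
    if (len == 0)
        len = 1;                /* "/prog" lives in the root directory */
    if (len >= buflen)
        return NULL;            /* scratch buffer too small; give up */

    memcpy(buf, argv[0], len);
    buf[len] = '\0';
    return buf;
}
```

With argv[0] set to "/usr/bin/prog" and no -p4wd, this yields "/usr/bin",
which matches the failure Ben saw; adding "-p4wd" followed by the job's
directory overrides it, which is what his patch does from the mpiexec side.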

		-- Pete

P.S.  Is it okay if I bounce both these messages to the mpiexec list for
archival purposes?


