Minor Mpiexec p4 bug (and patch!)
Ben Webb
ben at bellatrix.pcl.ox.ac.uk
Sat Jan 19 21:56:54 EST 2002
I've just been playing with Mpiexec, as I've been looking for a
way to get mpirun to play nicely with our OpenPBS system. (I came across
Mpiexec some time ago, but without MPICH/P4 support, it would have been
super-tricky for us to use at the time.) Well, I have to say I'm very
pleased with it; it's a very strong argument against the evil that is rsh.
Anyway:-
1. You include a patch against mpich-1.2.3-alpha. Unfortunately,
"1.2.3-alpha" appears to be a moving target; I downloaded it today and
found that most of your patch had to be applied manually. (About half
of it failed because the MPICH guys had already applied your fixes,
while most of the rest failed because of context changes.) You might
consider making a snapshot of mpich available for download, if you
have the bandwidth, to save on this kind of agony. ;) Alternatively,
you're welcome to the patch I made today (although I've messed up your
formatting in loads of places, as I use spaces rather than tabs).
2. I discovered that my jobs didn't work properly when started via
mpiexec and the p4 device. Although the slave processes ran in the
correct directory, process 0 ran instead in the directory where my
executable was located, and since I don't often keep my input files in
/usr/bin, this caused my jobs to fail. A simple
"mpiexec --comm=none pwd" test did not show this behaviour (it worked
as expected) so I guess it's a problem with MPICH. Anyway, the attached
patch adds the p4 option "-p4wd" to the command line to set the working
directory properly for MPICH, and this seems to fix the problem. I'm
not, however, very familiar with the internals of MPICH, so I'm not
100% sure of this; if you have any better ideas I'd be most interested to
hear them.
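
To make the intent concrete, here is a rough standalone sketch of the
command line the patch ends up building for each task. It is not the
mpiexec code itself: build_task_cmd is a made-up name, and it uses plain
snprintf rather than mpiexec's own strlcpy/strlcat helpers. It also
assumes the paths contain nothing that needs shell quoting.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of mpiexec): build the shell command for
 * one task. Every task gets "cd <cwd>; exec <exe>". When use_p4 is set,
 * " -p4wd <cwd>" is appended so that ch_p4 is also told the working
 * directory, which is what the patch does for COMM_MPICH_P4.
 * Returns 0 on success, -1 if the result would not fit in buf.
 */
static int build_task_cmd(char *buf, size_t len, const char *cwd,
                          const char *exe, int use_p4)
{
    int n;

    if (use_p4)
        n = snprintf(buf, len, "cd %s; exec %s -p4wd %s", cwd, exe, cwd);
    else
        n = snprintf(buf, len, "cd %s; exec %s", cwd, exe);

    /* snprintf returns the length it needed; >= len means truncation */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The point of keeping the bare path separate from the "cd ...; exec "
prefix is that the same string can be reused for the -p4wd argument,
which is why the patch splits the old pwd buffer into pwd and cmd.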
Ben
--
ben at bellatrix.pcl.ox.ac.uk http://bellatrix.pcl.ox.ac.uk/~ben/
"640K ought to be enough for anybody."
- Bill Gates, 1981
-------------- next part --------------
diff -Nur mpiexec-0.64/start_tasks.c mpiexec-0.64-patched/start_tasks.c
--- mpiexec-0.64/start_tasks.c Wed Jan 9 20:19:15 2002
+++ mpiexec-0.64-patched/start_tasks.c Sun Jan 20 01:58:10 2002
@@ -196,7 +196,8 @@
char buf[PATH_MAX];
int i, err, wait;
char *nargv[3];
- char pwd[PATH_MAX+11]; /* "cd " + "; exec \0" */
+ char pwd[PATH_MAX];
+ char cmd[PATH_MAX+11]; /* "cd " + "; exec \0" */
char *cp;
int conns[3]; /* expected connections to the stdio process */
int master_port = 0;
@@ -204,11 +205,13 @@
/*
* get the pwd
*/
- strlcpy(pwd, sizeof(pwd), "cd ");
- if (!getcwd(pwd+3, sizeof(pwd)-3))
+ if (!getcwd(pwd, sizeof(pwd)))
error("start_tasks: no current working directory");
pwd[sizeof(pwd)-1] = '\0';
- strlcat(pwd, sizeof(pwd), "; exec ");
+
+ strlcpy(cmd, sizeof(cmd), "cd ");
+ strlcat(cmd, sizeof(cmd), pwd);
+ strlcat(cmd, sizeof(cmd), "; exec ");
/*
* Rewrite argv to go through user's shell, just like rsh.
@@ -340,7 +343,7 @@
env_terminate();
/* build proc-specific command line */
- strlcpy(nargv[2], NARGV_LEN, pwd); /* "cd <path>; exec " */
+ strlcpy(nargv[2], NARGV_LEN, cmd); /* "cd <path>; exec " */
if (cl_args->tview) {
if (i == 0) {
strlcat(nargv[2], NARGV_LEN, "totalview ");
@@ -354,6 +357,10 @@
strlcat(nargv[2], NARGV_LEN, tasks[i].conf->exe);
}
if (cl_args->comm == COMM_MPICH_P4) {
+ /* Pass the pwd to ch_p4 */
+ strlcat(nargv[2], NARGV_LEN, " -p4wd ");
+ strlcat(nargv[2], NARGV_LEN, pwd);
+
/* the actual flag names are just for debugging; they're not used
* but the order is important */
strlcat(nargv[2], NARGV_LEN, " -execer_id mpiexec -master_host ");