Minor Mpiexec p4 bug (and patch!)

Ben Webb ben at bellatrix.pcl.ox.ac.uk
Sat Jan 19 21:56:54 EST 2002


	I've just been playing with Mpiexec, as I've been looking for a 
way to get mpirun to play nicely with our OpenPBS system. (I came across 
Mpiexec some time ago, but without MPICH/P4 support, it would have been
super-tricky for us to use at the time.) Well, I have to say I'm very 
pleased with it; it's a very strong argument against the evil that is rsh.

Anyway:-

1. You include a patch against mpich-1.2.3-alpha. Unfortunately, 
   "1.2.3-alpha" appears to be a moving target; I downloaded it today and 
   found that most of your patch had to be applied manually. (About half 
   of it failed because the MPICH guys had already applied your fixes, 
   while most of the rest failed because of context changes.) You might
   consider making a snapshot of mpich available for download, if you 
   have the bandwidth, to save on this kind of agony. ;) Alternatively,
   you're welcome to the patch I made today (although I've messed up your 
   formatting in loads of places, as I use spaces rather than tabs).

2. I discovered that my jobs didn't work properly when started via 
   mpiexec and the p4 device. Although the slave processes ran in the 
   correct directory, process 0 ran instead in the directory where my 
   executable was located, and since I don't often keep my input files in 
   /usr/bin, this caused my jobs to fail. A simple
   "mpiexec --comm=none pwd" test did not show this behaviour (it worked 
   as expected) so I guess it's a problem with MPICH. Anyway, the attached
   patch adds the p4 option "-p4wd" to the command line to set the working 
   directory properly for MPICH, and this seems to fix the problem. I'm 
   not, however, very familiar with the internals of MPICH, so I'm not 
   100% sure about this; if you have any better ideas I'd be most 
   interested to hear them.
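
   For illustration, here is a rough standalone sketch of the command
   line the patch ends up building for the p4 device. It is not the
   mpiexec code itself: the helper name build_p4_command is made up, and
   it uses plain snprintf rather than mpiexec's own strlcpy/strlcat
   wrappers. Like the patch, it does no shell quoting, so it assumes the
   paths contain no spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of mpiexec): write
 *   "cd <wd>; exec <exe> -p4wd <wd>"
 * into out. Returns 0 on success, -1 if the result does not fit.
 * The working directory appears twice: once for the shell's "cd",
 * and once as the ch_p4 "-p4wd" argument.
 */
static int build_p4_command(char *out, size_t outlen,
                            const char *wd, const char *exe)
{
    size_t need;

    assert(out != NULL && wd != NULL && exe != NULL);

    /* total length including the terminating NUL */
    need = strlen("cd ") + strlen(wd) + strlen("; exec ") + strlen(exe)
         + strlen(" -p4wd ") + strlen(wd) + 1;
    if (need > outlen)
        return -1;

    snprintf(out, outlen, "cd %s; exec %s -p4wd %s", wd, exe, wd);
    return 0;
}
```

   In the real code the directory would come from getcwd(), and the
   -execer_id and related flags are appended after this point.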

	Ben
-- 
ben at bellatrix.pcl.ox.ac.uk           http://bellatrix.pcl.ox.ac.uk/~ben/
"640K ought to be enough for anybody."
	- Bill Gates, 1981
-------------- next part --------------
diff -Nur mpiexec-0.64/start_tasks.c mpiexec-0.64-patched/start_tasks.c
--- mpiexec-0.64/start_tasks.c	Wed Jan  9 20:19:15 2002
+++ mpiexec-0.64-patched/start_tasks.c	Sun Jan 20 01:58:10 2002
@@ -196,7 +196,8 @@
     char buf[PATH_MAX];
     int i, err, wait;
     char *nargv[3];
-    char pwd[PATH_MAX+11];  /* "cd " + "; exec \0" */
+    char pwd[PATH_MAX];
+    char cmd[PATH_MAX+11];  /* "cd " + "; exec \0" */
     char *cp;
     int conns[3];  /* expected connections to the stdio process */
     int master_port = 0;
@@ -204,11 +205,13 @@
     /*
      * get the pwd
      */
-    strlcpy(pwd, sizeof(pwd), "cd ");
-    if (!getcwd(pwd+3, sizeof(pwd)-3))
+    if (!getcwd(pwd, sizeof(pwd)))
 	error("start_tasks: no current working directory");
     pwd[sizeof(pwd)-1] = '\0';
-    strlcat(pwd, sizeof(pwd), "; exec ");
+
+    strlcpy(cmd, sizeof(cmd), "cd ");
+    strlcat(cmd, sizeof(cmd), pwd);
+    strlcat(cmd, sizeof(cmd), "; exec ");
 
     /*
      * Rewrite argv to go through user's shell, just like rsh.
@@ -340,7 +343,7 @@
 	env_terminate();
 
 	/* build proc-specific command line */
-	strlcpy(nargv[2], NARGV_LEN, pwd);  /* "cd <path>; exec " */
+	strlcpy(nargv[2], NARGV_LEN, cmd);  /* "cd <path>; exec " */
 	if (cl_args->tview) {
 	    if (i == 0) {
 		strlcat(nargv[2], NARGV_LEN, "totalview ");
@@ -354,6 +357,10 @@
 	    strlcat(nargv[2], NARGV_LEN, tasks[i].conf->exe);
 	}
 	if (cl_args->comm == COMM_MPICH_P4) {
+	    /* Pass the pwd to ch_p4 */
+	    strlcat(nargv[2], NARGV_LEN, " -p4wd ");
+	    strlcat(nargv[2], NARGV_LEN, pwd);
+
 	    /* the actual flag names are just for debugging; they're not used
 	     * but the order is important */
 	    strlcat(nargv[2], NARGV_LEN, " -execer_id mpiexec -master_host ");

