Intel-MPI 3.0beta sends cmd=get_ranks2hosts

Thomas Zeiser thomas.zeiser at rrze.uni-erlangen.de
Mon Nov 13 12:33:39 EST 2006


Dear All,

Intel released - as promised - some very short documentation of the
additional PMI command they introduced with Intel MPI 3.0 (or
perhaps already in some 2.x version).

Please find attached a first patch for pmi.c which adds the
required functionality to mpiexec. (It is relative to mpiexec-0.81.)

Thanks to Pete for pointing me to the correct part of the code and
for already providing a wrapper. As I did not dig too deeply into the
details of mpiexec, there might be a nicer way to implement this, but
at least it worked in my initial tests (and should even take care of
hostname transformations) ...
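
The reply format described in the patch comment can be illustrated
with a minimal standalone sketch in C of the same grouping logic. It
is not the patch itself: build_ranks2hosts(), r2h_append(), the fixed
buffer limits and the host names in the usage note are hypothetical
and exist only for illustration. It also assumes that MSGLEN is the
string length of the second line (including its trailing newline)
plus one, which is how the patch comment reads.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define R2H_MAX_TASKS 1024  /* illustrative limit, not taken from mpiexec */
#define R2H_MAX_BODY  8192  /* illustrative limit, not taken from mpiexec */

/* Append formatted text at buf + *used; return -1 if it does not fit. */
static int r2h_append(char *buf, size_t buflen, size_t *used,
                      const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *used, buflen - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= buflen - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/*
 * Build "put_ranks2hosts MSGLEN NUM_OF_HOSTS\n" followed by
 * "HNLEN HOSTNAME RANK1,...,RANKN, ...\n" in out.
 * task_node[i] is the index into hostnames[] of the node running rank i.
 * Returns the number of characters written, or -1 on overflow.
 */
int build_ranks2hosts(const char *const *hostnames, const int *task_node,
                      int numtasks, char *out, size_t outlen)
{
    char body[R2H_MAX_BODY];
    int done[R2H_MAX_TASKS] = {0};  /* 1 once a rank has been emitted */
    size_t used = 0;
    int nhosts = 0;
    int n;

    assert(hostnames != NULL && task_node != NULL && out != NULL);
    if (numtasks < 0 || numtasks > R2H_MAX_TASKS)
        return -1;

    for (int i = 0; i < numtasks; i++) {
        if (done[i])
            continue;               /* host already handled via an earlier rank */
        const char *name = hostnames[task_node[i]];
        nhosts++;
        if (r2h_append(body, sizeof body, &used, "%d %s ",
                       (int)strlen(name), name) < 0)
            return -1;
        /* list rank i and every later rank placed on the same node */
        for (int j = i; j < numtasks; j++) {
            if (task_node[j] != task_node[i])
                continue;
            done[j] = 1;
            if (r2h_append(body, sizeof body, &used, "%d,", j) < 0)
                return -1;
        }
        /* a space follows every rank list, including the last one */
        if (r2h_append(body, sizeof body, &used, " ") < 0)
            return -1;
    }
    if (r2h_append(body, sizeof body, &used, "\n") < 0)
        return -1;

    /* the header goes first on the wire, but it needs the body length */
    n = snprintf(out, outlen, "put_ranks2hosts %d %d\n%s",
                 (int)used + 1, nhosts, body);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return n;
}
```

For two hypothetical hosts node01 and node02 and a task-to-node map
of {0, 1, 0, 1}, this should yield "put_ranks2hosts 30 2" on the
first line and "6 node01 0,2, 6 node02 1,3, " on the second.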


Regards,

thomas
 
On Thu, Sep 21, 2006 at 11:22:54AM +0200, Thomas Zeiser wrote:
> Dear All,
> 
> it seems that Intel implemented some (new?) extensions of the PMI
> protocol in their latest version of Intel MPI (3.0beta).
> 
> According to Intel's release notes, pmi_version=1 and
> pmi_subversion=1 should be used (unchanged since version 2.0.1).
> 
> When I try to start an Intel MPI 3.0beta executable with mpiexec
> (either 0.80 or 0.81) I get the following message:
> 
>   % mpiexec -comm pmi -n 4 ./a.out
>   mpiexec: Error: handle_pmi: unknown cmd get_ranks2hosts.
>   mpiexec: Warning: tasks 0-3 exited with status 174.
> 
> 
> "cmd=get_ranks2hosts" comes from
> /opt/intel/ict/3.0b/mpi/3.0b/lib64/libmpi.so.3.1 and I did not see
> any match in pmi.c.
> 
> 
> Any ideas how to handle/fix this issue?
> 
> 
> Kind regards,
> 
> Thomas Zeiser
> 
> -- 
> Dipl.-Ing. Thomas ZEISER
> Regionales Rechenzentrum Erlangen
> Martensstr. 1, 91058 Erlangen, GERMANY
-------------- next part --------------
--- pmi.c.orig	2006-04-20 05:53:26.000000000 +0200
+++ pmi.c	2006-11-11 20:38:27.000000000 +0100
@@ -554,6 +554,86 @@
 	    error_errno("%s: response %s", __func__, g->s);
 	growstr_free(g);
 
+    } else if (!strcmp(kv->val[0], "get_ranks2hosts")) {
+       /*
+        * PMI_Get_ranks2hosts: PMI-API extension in Intel MPI 3.0
+        */
+
+       /* request: cmd=get_ranks2hosts */
+       if (kv->num != 1)
+           error("%s: in cmd=%s, expecting 1 keyval, got %d",
+                  __func__, kv->val[0], kv->num);
+       debug(2, "%s: cmd=%s", __func__, kv->val[0]);
+
+       /* response: cmd=put_ranks2hosts MSGLEN NUM_OF_HOSTS\n
+	        HNLEN HOSTNAME RANK1,...,RANKN, HNLEN HOSTNAME RANK1,...,RANKN,\n
+
+         MSGLEN: number of characters in the next message+1
+         NUM_OF_HOSTS: total number of non-recurring host names
+	  HNLEN: number of characters in the next hostname field
+	  HOSTNAME: node name
+	  RANK1,...,RANKN,: comma separated list of ranks executed on
+	                    the node; if the list is the last in the
+			    response message it must be followed by a space
+       */
+
+       int i, j, *r2h_tasks, r2h_hosts;
+       growstr_t *msg;
+
+       /* allocate an integer array and duplicate the tasks[].node values 
+	  whenever a new hostname is found, all tasks running on that node
+	  will be processed and marked accordingly with -1 */
+       r2h_tasks = Malloc(sizeof(int)*numtasks);
+       for (i=0; i<numtasks; i++)
+	 r2h_tasks[i] = tasks[i].node;
+
+       /* number of unique hosts found */
+       r2h_hosts = 0;
+
+       /* construct the second part of the response first as we need its 
+	  length later on ... */
+       msg =  growstr_init();
+       for (i=0; i<numtasks; i++)
+	 {
+	   if ( r2h_tasks[i] != -1 )
+	     {
+	       r2h_hosts++;
+	       /* add the length of the mpi-hostname, the name itself and
+		  the current tasknumber */
+	       growstr_printf(msg, "%d %s %d,",
+			      (int)strlen(nodes[r2h_tasks[i]].mpname),
+			      nodes[r2h_tasks[i]].mpname, i);
+	       /* check if any other not yet processed task is running
+		  on the same node */
+	       for (j=i+1; j<numtasks; j++)
+		 {
+		   if ( r2h_tasks[j] == r2h_tasks[i] )
+		     {
+		       growstr_printf(msg, "%d,", j); /* append task */
+		       r2h_tasks[j] = -1; /* mark as processed */
+		     }
+		 }
+	       growstr_printf(msg, " "); /* add space as separator */
+	     }
+	 }
+       growstr_printf(msg, "\n"); /* add new line as EOF marker */
+       free(r2h_tasks);
+
+ 
+       /* construct 1st part of response */
+       g = growstr_init();
+       growstr_printf(g, "put_ranks2hosts %d %d\n", msg->len+1, r2h_hosts);
+
+       /* append 2nd part to 1st part */
+       growstr_printf(g, "%s", msg->s);
+
+       debug(2, "%s: ranks2hosts reply: %s", __func__, g->s);
+
+       if (write_full(pmi_fds[rank], g->s, g->len) < 0)
+           error_errno("%s: response %s", __func__, g->s);
+       growstr_free(g);
+       growstr_free(msg);
+
     } else if (!strcmp(kv->val[0], "get_universe_size")) {
 	/*
 	 * PMI_Get_universe_size used by sock but not shm.  Says how

