mpiexec patch for InfiniPath MPI

Pete Wyckoff pw at osc.edu
Sat Mar 15 15:21:56 EDT 2008


christian.bell at qlogic.com wrote on Tue, 11 Mar 2008 08:47 -0700:
> Seems like forever, but here's the updated patch.  The customer who
> really desired mpiexec support has not given me any input on whether
> this newer mpiexec works well for him.  After 6 months, I just
> decided to install Torque on 17 nodes and tested the implementation.
> 
> This patch applies to the latest svn revision so you should be ready
> for a new release (hint hint!).  We're trying to release our 2.2
> software soon and it would be nice if we could reference an mpiexec
> release number but that's only desirable -- I can call it 0.84 as
> soon as you accept the patch.
> 
> I've addressed most of the points mentioned in your e-mail and
> rewrote parts of the nastier pointer handling -- it's better but not
> perfect.  I didn't want to infect such simple code with pragma
> packed structs and such.

Looks good.  Applied.  Some comments that may suggest minor fixup
patches later, or not.

This bit in stdio.c:

+	if (message_len > 8) { /* extra error message attached */
+	    err_extra = Malloc(message_len); /* 8 extra chars we need */
+	    snprintf(err_extra, message_len-1, " (%s) ", errdata+8);
+	    err_extra[message_len-1] = '\0';
+	}
+
+	warning("%s: %s%sfrom rank %d. Killing all",
+		__func__, err_reason, err_extra ? err_extra : " ",
+		mpi_rank);

You already read the whole message.  Why allocate again and copy it
with snprintf?  Can't you just print the string directly later?
Something like

	if (message_len > 8)
	    warning("%s: %s (%s) from rank %d. Killing all",
		    __func__, err_reason, errdata + 8, mpi_rank);
	else
	    warning("%s: %s from rank %d. Killing all",
		    __func__, err_reason, mpi_rank);

In psm.c:

+    u64ptr = (uint64_t *) (data + 3 * sizeof (uint32_t));
+    epids[rank] = ntohu64 (*u64ptr);

This u64ptr will be unaligned on 64-bit architectures: it points
3 * sizeof(uint32_t) = 12 bytes past data.  While you can get away
with it on Opteron, it will cause complaints on ia64 and errors on
MIPS.  I suggest you memcpy into a uint64_t and then call ntohu64
on it.
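
A minimal sketch of that memcpy approach, assuming the 64-bit value
sits in network byte order 12 bytes into the buffer.  The names
read_epid and sketch_ntohu64 are made up for illustration;
sketch_ntohu64 only stands in for the patch's ntohu64, whose
definition is not shown here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the patch's ntohu64(): treat the bytes of
 * `net` as big-endian and build the host-order value from them.  This
 * gives the same result on little- and big-endian hosts. */
static uint64_t sketch_ntohu64(uint64_t net)
{
    unsigned char b[8];
    uint64_t host = 0;
    size_t i;

    memcpy(b, &net, sizeof(b));
    for (i = 0; i < sizeof(b); i++)
        host = (host << 8) | b[i];
    return host;
}

/* Read the 64-bit field that follows three uint32_t fields.  The bytes
 * are copied into an aligned local first, so no uint64_t pointer into
 * the (possibly unaligned) buffer is ever dereferenced. */
static uint64_t read_epid(const char *data)
{
    uint64_t tmp;

    memcpy(&tmp, data + 3 * sizeof(uint32_t), sizeof(tmp));
    return sketch_ntohu64(tmp);
}
```

Compilers generally turn a fixed-size memcpy like this into a plain
load on machines that tolerate unaligned access, so the portable form
should cost little or nothing on Opteron.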

Same deal here.  This is at offset 20 in a malloced block, so also
unaligned.  I'm not particularly worked up about it, but you may
run into trouble eventually.

+    einfo_epids = (uint64_t *) (einfo_hdr + 4);
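
One way to sidestep the unaligned array, sketched under assumptions:
copy the ids out of the block into a separately allocated uint64_t
array and index that instead.  The function name, the byte offset
parameter, and the count parameter are all hypothetical, and whether
the values also need a byte-order conversion is not covered here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy `count` 64-bit ids starting at byte `offset` of `block` into a
 * freshly malloced array, which is suitably aligned for uint64_t.
 * Returns NULL if the allocation fails; the caller frees the result. */
static uint64_t *copy_ids_aligned(const void *block, size_t offset,
                                  size_t count)
{
    uint64_t *ids = malloc(count * sizeof(*ids));

    if (ids == NULL)
        return NULL;
    memcpy(ids, (const unsigned char *) block + offset,
           count * sizeof(*ids));
    return ids;
}
```

This costs one extra allocation and copy, which may or may not be
worth it given how rarely this path runs.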

As far as release, I'd like to hold off until we get this mvapich
situation figured out.  Hopefully not too long.  Then there will be
two bright features to talk about.  If you're desperate, I will
silently bump to 0.84 and you can ship that, but I'd rather not have
two different 0.84s out there, so let me know.

		-- Pete

