Intel-MPI 3.0beta sends cmd=get_ranks2hosts
Pete Wyckoff
pw at osc.edu
Thu Sep 21 10:37:35 EDT 2006
thomas.zeiser at rrze.uni-erlangen.de wrote on Thu, 21 Sep 2006 11:22 +0200:
> it seems that Intel implemented some (new?) extensions of the PMI
> protocol in their latest version of Intel MPI (3.0beta).
>
> According to Intel's release notes, pmi_version=1 and
> pmi_subversion=1 should be used (unchanged since version 2.0.1).
Would have been nice of them to bump at least one of these to
indicate that they changed the PMI command set. There's no official
PMI specification so we can't complain they're not following it, but
at least using the version numbers seems an obvious thing they could
do.
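To make the version-number point concrete, here is a minimal sketch of the kind of check a launcher could do at startup. It is not taken from mpiexec; the function name, the SUPPORTED constant, and the assumption that the client announces itself with a line like "cmd=init pmi_version=1 pmi_subversion=1" are all illustrative.

```python
# Hypothetical: the (pmi_version, pmi_subversion) pair this launcher understands.
SUPPORTED = (1, 1)

def check_pmi_version(init_line, supported=SUPPORTED):
    """Return (matches, offered) for an assumed 'cmd=init ...' line."""
    # Split the space-separated key=value tokens into a dict.
    fields = dict(tok.split("=", 1) for tok in init_line.split() if "=" in tok)
    offered = (int(fields["pmi_version"]), int(fields["pmi_subversion"]))
    return offered == supported, offered
```

Because the 3.0beta library still reports 1 and 1, a check like this passes, and the launcher only notices the difference later when an unfamiliar command arrives.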
> When I try to start an Intel MPI 3.0beta executable with mpiexec
> (either 0.80 or 0.81) I get the following message:
>
> % mpiexec -comm pmi -n 4 ./a.out
> mpiexec: Error: handle_pmi: unknown cmd get_ranks2hosts.
> mpiexec: Warning: tasks 0-3 exited with status 174.
>
> "cmd=get_ranks2hosts" comes from
> /opt/intel/ict/3.0b/mpi/3.0b/lib64/libmpi.so.3.1 and I did not see
> any match in pmi.c.
Nor in the latest mpich2-1.0.4p1 from ANL, on which the Intel MPI is
based, but maybe only in ancient history.
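For readers wondering where the "unknown cmd" error comes from, the following is a rough sketch of a command dispatcher, assuming PMI messages are single lines of space-separated key=value tokens. It is not the code in pmi.c; the names parse_pmi_line, KNOWN_COMMANDS, and this handle_pmi are hypothetical, and the real launcher handles many more commands.

```python
def parse_pmi_line(line):
    """Split a line such as 'cmd=get_maxes' into a dict of key/value pairs."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that have no '='
            fields[key] = value
    return fields

# Hypothetical, deliberately tiny table of commands this sketch accepts.
KNOWN_COMMANDS = {"init", "get_maxes"}

def handle_pmi(line):
    """Report whether the command on this line is known, echoing any arguments."""
    fields = parse_pmi_line(line)
    cmd = fields.get("cmd")
    if cmd not in KNOWN_COMMANDS:
        # Keep the arguments so they can be logged for debugging.
        args = {k: v for k, v in fields.items() if k != "cmd"}
        return "unknown cmd %s (args: %r)" % (cmd, args)
    return "ok %s" % cmd
```

Any command missing from the table, such as get_ranks2hosts, falls through to the "unknown cmd" branch.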
We can see if it matters, though, if you're willing to debug a
bit. Here are some tasks and questions.
1. Complain to Intel that their MPI beta breaks external tools
that use PMI. They should bump the version numbers. Ask for documentation
on their PMI additions beyond Argonne mpich2.
2. Is there source for this Intel MPI somewhere? We can just read
it and see what ranks2hosts does. (Also where can one get binaries
if I want to do step (5) below myself?)
3. Can you run mpiexec with "-v -v" to see when in the startup
sequence it appears? Hopefully it is after "cmd=get_maxes".
4. Apply this patch and run with "-v -v" and we'll see what
the arguments are to get_ranks2hosts. We may also find out that
it is optional and the application continues even with a bogus
response.
5. Use Intel's MPD to start an application, while using tcpdump
to catch the conversation. Open another shell on a machine that
will run one of the tasks of your app (but not the machine that
runs your PBS job script) and capture all the text with:
    tcpdump -nlv -X -s 1500 -i eth0 > packets.txt
and send me the relevant bits (or whole thing). You may have
to tweak the "-i eth0" or add qualifiers like "not port nfs"
to keep the size sane.
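Finding the relevant bits in packets.txt by eye is awkward because "tcpdump -X" splits each packet into 16-byte rows, so a string like cmd=get_ranks2hosts can straddle two rows. The sketch below is one possible way to reassemble the hex column and list every "cmd=..." line. It assumes the usual "0x0000:  hex groups  ascii" row layout; the function names are made up for illustration.

```python
import re

# One row of `tcpdump -X` output: offset, hex groups, then an ASCII column.
HEX_ROW = re.compile(r"^\s*0x([0-9a-f]{4}):\s+((?:[0-9a-f]{2,4}\s)+)")

def packets_from_dump(lines):
    """Yield the raw bytes of each packet found in tcpdump -X text output."""
    current = bytearray()
    for line in lines:
        match = HEX_ROW.match(line)
        if not match:
            continue  # timestamp/header lines carry no payload bytes
        if int(match.group(1), 16) == 0 and current:
            # Offset 0 marks the first row of the next packet.
            yield bytes(current)
            current = bytearray()
        current += bytes.fromhex("".join(match.group(2).split()))
    if current:
        yield bytes(current)

def pmi_commands(lines):
    """Return every 'cmd=...' string (up to the newline) seen in any packet."""
    found = []
    for packet in packets_from_dump(lines):
        for m in re.finditer(rb"cmd=[^\n]*", packet):
            found.append(m.group(0).decode("ascii", errors="replace"))
    return found
```

Something like pmi_commands(open("packets.txt")) would then list the commands in capture order. A command that TCP split across two packets would still be missed by this simple approach.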
Not sure if you were looking for this level of response to your
query. :)
-- Pete