Mpiexec release 0.80: concurrency, fastdist, mpich/rai, ...

Pete Wyckoff pw at osc.edu
Fri Jul 15 14:10:21 EDT 2005


A fair amount has changed since 0.78 was released four months ago.
There was a stealth release, 0.79, three months ago that added a few
features, and now 0.80 brings some major changes.  But if you continue
to use mpiexec as you always have, everything should still work as
before.  The list of changes is somewhat long, though.

Concurrency feature

    If you want to run two independent parallel processes at the
    same time, you would be tempted to do something like:

	mpiexec -n 4 code1 &
	mpiexec -n 3 code2

    Only that has not worked until this release.  The problem is a
    limitation in the TM interface used by PBS: only one TM client
    (such as mpiexec) can use it at a time.  Your alternative was to
    fall back to good old rsh-based mpirun, or to run two
    independent batch jobs and figure out how to get them to
    synchronize.

    Now the first mpiexec to run in a PBS job will listen on a named
    pipe in /tmp, and later mpiexec processes will connect to it
    for all TM activity.  All the subsequent mpiexec processes can
    start and stop in any order with no hardcoded limit on the
    number or sizes of tasks.  You might also do:

	mpiexec -server &

    to start one "master" mpiexec for all the others if having one
    "empty" mpiexec process best fits your computation model, as
    perhaps in a branch-and-bound style optimization.

    The first (or master) mpiexec still enforces non-overlapping
    processor allocation, so the sum of all the running "-n <numproc>"
    may not be larger than the overall PBS allocation.  All other
    command line processing is handled by the individual client
    mpiexec processes, as is all stdio for their respective parallel
    programs.
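
    As a rough sketch of the allocation rule above, a job script could
    check the sum of its "-n" values before starting anything.  The
    helper name fits_allocation is hypothetical and not part of
    mpiexec, and counting lines of $PBS_NODEFILE assumes the usual PBS
    convention of one line per allocated processor.

```shell
#!/bin/sh
# Hypothetical helper (not part of mpiexec): succeed only if the
# requested task counts fit within the allocation.
# usage: fits_allocation <allocated_procs> <n1> [<n2> ...]
fits_allocation() {
    allocated=$1
    shift
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    [ "$total" -le "$allocated" ]
}

# Possible use inside a PBS job (assumes one line per allocated
# processor in $PBS_NODEFILE):
#   allocated=$(wc -l < "$PBS_NODEFILE")
#   if fits_allocation "$allocated" 4 3; then
#       mpiexec -n 4 code1 &
#       mpiexec -n 3 code2
#       wait
#   fi
```

    The check only mirrors what the first (or master) mpiexec
    enforces anyway; it just lets the script fail early with its own
    message.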

    The addition of this feature was made possible through
    contributions by the DAKOTA Code Group of Sandia National
    Laboratories.

Fast executable distribution

    New experimental executable distribution code was added.  If you
    have an Infiniband network, you can choose to download and use
    code written by Dennis Dalessandro, found at:
	http://www.osc.edu/~dennis/fastdist
    to push your executable to all the compute nodes very rapidly
    instead of relying on NFS.  Mpiexec can be configured to call
    this code before execution.

Mpich/rai support

    This adds support for the Rapid Array Interconnect version of
    MPICH used by Cray on their XD1 machines.  These are Opteron
    clusters with custom message passing code on an Infiniband
    physical-layer transport.  The MPICH device comes from the
    MVIA heritage and thus looks a lot like the old-style MPICH/IB
    startup code.

Config file task ordering

    Entries in a --config file are now spawned in order, that is,
    the first line becomes task 0, the second becomes task 1, etc.
    Previously the MPI task order was always fixed by what was
    handed out by PBS.  Now you can order them at will by using a
    --config file.

Another message passing hostname transformation option

    New option -transform-hostname-program generalizes the existing
    option -transform-hostname by allowing specification of any
    external program to change the canonical hostnames to the
    hostnames used for the message passing fabric.  Contributed by
    Dries Kimpe.
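
    A minimal sketch of such an external program follows.  The
    calling convention is an assumption: it supposes mpiexec passes
    the canonical hostname as the first argument and reads the
    replacement from standard output, which this announcement does
    not spell out, so check the man page.  The "-ib" suffix and the
    function name transform_hostname are likewise only illustrative.

```shell
#!/bin/sh
# Hypothetical transform: append "-ib" to the short hostname and
# keep any domain part unchanged.
transform_hostname() {
    host=$1
    case "$host" in
        *.*) echo "${host%%.*}-ib.${host#*.}" ;;
        *)   echo "${host}-ib" ;;
    esac
}

# Assumed interface: hostname in as $1, transformed name out on stdout.
if [ "$#" -gt 0 ]; then
    transform_hostname "$1"
fi
```

    Under these assumptions, node03 would become node03-ib and
    node03.cluster.example would become node03-ib.cluster.example.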

Signal handling

    More attention is paid to handling signals.  In particular, if
    you hit ctrl-C, mpiexec will try to kill off all tasks and the
    stdio handler.  If you are impatient and hit it again, it will
    just exit immediately.  Previously there were situations in
    which tasks could linger or (worse?) in which mpiexec would
    refuse to exit no matter how hard one hammered the keyboard.
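
    The two-stage behavior can be illustrated with a small shell
    sketch.  This is not mpiexec's actual code (mpiexec is not a
    shell script); start_cleanup is a hypothetical stand-in for
    killing the tasks and the stdio handler, and the exit status 130
    is simply the conventional shell value for death by SIGINT.

```shell
#!/bin/sh
interrupts=0
cleanup_started=no

# Hypothetical stand-in for killing all tasks and the stdio handler.
start_cleanup() {
    cleanup_started=yes
}

on_interrupt() {
    interrupts=$((interrupts + 1))
    if [ "$interrupts" -eq 1 ]; then
        start_cleanup    # first ctrl-C: try to shut down cleanly
    else
        exit 130         # second ctrl-C: give up and exit at once
    fi
}

trap on_interrupt INT
```

    Counting interrupts in the handler keeps the patient path and
    the impatient path in one place.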

Internal data structures

    Pretty much all of the code that handles nodes, tasks, and
    events was rewritten to support the new concurrency feature.  It
    is quite a bit prettier (says the author) and should make future
    maintenance easier.

Little bug fixes

    Don't die when stdio exits early in mpich2/pmi.

    Pass the entire environment even when it is large (as in lots of
    environment variables, or big ones), contributed by Belmont
    Cheung.

    Use va_list correctly, contributed by Kai Germaschewski.

    Support Topspin-specific changes to MPICH/IB startup code.

    Support original mpich2 version naming string used by Intel
    mpich2, contributed by Anton Starikov.


Full changelog and downloads at:  http://www.osc.edu/~pw/mpiexec/
Send bug reports, comments and suggestions to the mailing list.

		-- Pete


