Odd behavior with mpiexec for voltaire infiniband package

Pete Wyckoff pw at osc.edu
Wed Feb 4 13:25:01 EST 2004


cdmaest at sandia.gov said on Wed, 04 Feb 2004 10:45 -0700:
> The infiniband environment distributed with Voltaire's software
> integration package (ibhost-hpc-2.0.0_10-1rh90.k) errors out when run
> here:
> ---
> [cdmaest at ca894 mpi]$ /projects/mpiexec/bin/mpiexec -np 2 -pernode
> -comm=ib cpi_infiniband
> mpiexec: Warning: read_ib_startup_ports: protocol version 0 not known,
> but might still work.
> mpiexec: Error: read_ib_startup_ports: rank 48 out of bounds [0..2).
> read: Connection reset by peer
> ---
> 
> If you comment out the first read_full in start_tasks.c, then things
> work for the voltaire stuff.

Yes, the version number is something I'm pushing into the OSU MVAPICH
code, but it has not yet made it into Voltaire's release.  Your fix of
skipping the version check in mpiexec is just fine (but you may want to
set the variable to 1 or not check it).
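
As a rough sketch of that workaround: the reader below can be told that
the sender omits the version word, in which case it assumes version 1
and reads the rank first.  The record layout, the names
(startup_record, parse_startup_record) and the sample values are all
hypothetical; this is not mpiexec's actual code or wire format.  It also
shows one plausible reading of the error above: if a version word is
expected but never sent, every later field is read one slot too early
and the rank check fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout, for illustration only. */
struct startup_record {
    int version;
    int rank;
    int port;
};

/*
 * Parse one startup record from an array of ints.
 * If sender_has_version is 0, the sender is assumed to omit the version
 * word, so version 1 is assumed and the rank is read first.
 * Returns the number of ints consumed, or -1 on short input or when the
 * rank falls outside [0..ntasks).
 */
static int parse_startup_record(const int *words, size_t nwords,
                                int sender_has_version, int ntasks,
                                struct startup_record *out)
{
    size_t need = sender_has_version ? 3 : 2;
    size_t i = 0;

    assert(words != NULL && out != NULL);
    if (nwords < need)
        return -1;

    out->version = sender_has_version ? words[i++] : 1;
    out->rank = words[i++];
    out->port = words[i++];

    if (out->rank < 0 || out->rank >= ntasks)
        return -1;
    return (int)need;
}
```

Called with sender_has_version set to 0, a two-word record parses
cleanly; called with it set to 1 on data that carries no version word,
the first word is consumed as the version and the next one is rejected
as an out-of-range rank.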

The problem with _not_ having a version number is that in the future it
will become difficult for mpiexec to figure out how to talk to the
application if code changes require startup modifications.  We suffered
through this with MPICH/GM and I'd rather not have to deal with that
again.  I'm hoping instead that some short-term pain (sorry) will be
bearable before IB becomes wildly popular.  Perhaps the next MVAPICH and
Voltaire releases will have incorporated versioning.
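
To make the argument concrete, here is a minimal sketch of what a
leading version word buys the reader: it can pick the message layout
from the version and refuse anything it does not recognize, instead of
guessing.  The version numbers, the extra field in "version 2", and the
function names are assumptions made up for this example, not the
MVAPICH or mpiexec protocol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol versions. */
enum { PROTO_V1 = 1, PROTO_V2 = 2 };

/*
 * Number of int fields that follow the version word for a given
 * version, or -1 if the version is unknown.
 */
static int fields_after_version(int version)
{
    switch (version) {
    case PROTO_V1: return 2;   /* rank, port */
    case PROTO_V2: return 3;   /* rank, port, one hypothetical extra */
    default:       return -1;
    }
}

/*
 * Returns the total number of ints in the message, including the
 * version word, or -1 if the message cannot be decoded.
 */
static int message_length(const int *words, size_t nwords)
{
    int n;

    assert(words != NULL);
    if (nwords < 1)
        return -1;
    n = fields_after_version(words[0]);
    if (n < 0 || nwords < (size_t)n + 1)
        return -1;
    return n + 1;
}
```

With this shape, a later change to the startup exchange only needs a new
case in the switch, and an old reader fails cleanly on a message from a
newer sender.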

Please grab the mpiexec from CVS if you plan to use it with IB.  There
are a couple of fixes there: one is a performance improvement for
startup, and the other allows --with-default-comm=ib to work.  The other
enhancements there aren't too important, but won't hurt.

		-- Pete


