[mvapich-discuss] mvapich & mpiexec

Jan Ploski Jan.Ploski at offis.de
Sun May 27 16:25:21 EDT 2007


 > > Does mvapich work with mpiexec? I use MVAPICH 0.9.9-beta and
 > > mpiexec 0.82.
 > > When I try to run tasks I see this message:
 > >      mpiexec: Warning: read_ib_one: protocol version 5 not
 > >      known, but might still work.
 > > But nothing happens. Is the latest mvapich version not supported by
 > > mpiexec? Or are there other methods to run MPI tasks with a
 > > batch system? I use the latest torque batch system. Batch scripts
 > > with mpirun work, but that requires installing the mvapich
 > > distribution on all nodes.
 >
 > Currently there is no way to enable the old startup protocol in 0.9.9.
 >
 > mpiexec will need to be updated to accommodate the new
 > startup protocol that is used in 0.9.9.

I'd like to add that the same problem exists with mpiexec 0.82 combined 
with mvapich-0.9.7-mlx2.2.0 from the OFED-1.1 distribution. It also uses 
protocol version 5.

I made a quick attempt to fix it today (by comparing the mpiexec code with
that of mpirun_rsh.c), but I failed. It seems, after all, that the
protocol has changed in a non-trivial way; for example, sockets are closed
and reopened between sending hostids and sending other information to the
IB processes.

Do you know whether there is any specification of the connect protocol, or
is mpirun_rsh's source code all that is available? Are there any plans by
the original developer(s) to fix this incompatibility any time soon?

Best regards,
Jan Ploski

