[mvapich-discuss] mvapich & mpiexec
Jan Ploski
Jan.Ploski at offis.de
Sun May 27 16:25:21 EDT 2007
> > Does mvapich work with mpiexec? I use MVAPICH 0.9.9-beta and
> > mpiexec 0.82.
> > When I try to run tasks I see the message:
> > mpiexec: Warning: read_ib_one: protocol version 5 not
> > known, but might still work.
> > But nothing happens. Is the latest mvapich version not supported by
> > mpiexec? Or are there other methods to run MPI tasks with a
> > batch system? I use the latest torque batch system. Batch scripts
> > with mpirun work, but that requires installing the mvapich
> > distribution on all nodes.
>
> Currently there is no way to enable the old startup protocol in 0.9.9.
>
> mpiexec will need to be updated to accommodate the new
> startup protocol that is used in 0.9.9.
I'd like to add that the same problem exists with mpiexec 0.82 combined
with mvapich-0.9.7-mlx2.2.0 from the OFED-1.1 distribution. It also uses
protocol version 5.
I made a quick attempt to fix it today (by comparing mpiexec code with
that of mpirun_rsh.c), but I failed... It seems after all that the
protocol has changed in a non-trivial way, e.g. sockets being closed and
reopened between sending hostids and other information to the IB processes.
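To illustrate the kind of change described above, here is a minimal Python sketch of a two-phase startup in which each process sends its host ID on one connection, closes it, and then reconnects to receive the remaining information. It is only an illustration of the close-and-reopen pattern: the message layout, the function names (launcher, send_hostid, fetch_table) and the "host0"-style IDs are all hypothetical and are not taken from mvapich, mpirun_rsh.c or mpiexec.

```python
import socket
import threading


def launcher(server_sock, n_procs, results):
    """Hypothetical launcher side; NOT the real mvapich wire format."""
    hostids = {}
    # Phase 1: every process connects once, sends "rank hostid", then the
    # connection is closed again.
    for _ in range(n_procs):
        conn, _ = server_sock.accept()
        with conn, conn.makefile("r") as f:
            rank, hostid = f.readline().split()
            hostids[int(rank)] = hostid
    # Phase 2: every process opens a NEW connection, announces its rank,
    # and receives the complete host ID table.
    table = " ".join(hostids[r] for r in sorted(hostids)) + "\n"
    for _ in range(n_procs):
        conn, _ = server_sock.accept()
        with conn, conn.makefile("r") as f:
            f.readline()  # rank of the reconnecting process (unused here)
            conn.sendall(table.encode())
    results["hostids"] = hostids


def send_hostid(addr, rank, hostid):
    """Phase 1 on the process side: connect, send the host ID, close."""
    with socket.create_connection(addr) as s:
        s.sendall(f"{rank} {hostid}\n".encode())


def fetch_table(addr, rank):
    """Phase 2 on the process side: reconnect and read the full table."""
    with socket.create_connection(addr) as s:
        s.sendall(f"{rank}\n".encode())
        with s.makefile("r") as f:
            return f.readline().split()


def demo(n_procs=3):
    """Drive all ranks sequentially from one thread to keep the demo deterministic."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # any free local port
    server.listen(2 * n_procs)
    addr = server.getsockname()
    results = {}
    t = threading.Thread(target=launcher, args=(server, n_procs, results))
    t.start()
    for rank in range(n_procs):
        send_hostid(addr, rank, f"host{rank}")
    tables = [fetch_table(addr, rank) for rank in range(n_procs)]
    t.join()
    server.close()
    return results["hostids"], tables


if __name__ == "__main__":
    print(demo())
```

A launcher written for a single long-lived connection per process would stall in the second phase of such a scheme, which is one plausible reason why a small patch is not enough.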
Do you know whether there is any specification of the connect protocol,
or is mpirun_rsh's source code all that is available? Are there any plans
by the original developer(s) to fix this incompatibility any time soon?
Best regards,
Jan Ploski