mpiexec in 2 nodes
Pete Wyckoff
pw at osc.edu
Tue Oct 9 10:23:08 EDT 2007
aurelia.marchand at obspm.fr wrote on Tue, 09 Oct 2007 15:02 +0200:
> I have a problem using mpiexec in more than one node.
>
> when I have :
> #PBS -l nodes=1:ppn=2
> it works well
>
> and when I have :
>
> #PBS -l nodes=quadri3:ppn=1+quadri1:ppn=1
> mpiexec --comm=mpich2 /home/marchand/PBS/test/nomProc2.mpich
>
> I have the error :
> mpiexec: resolve_exe: using absolute path "/home/marchand/PBS/test/nomProc2.mpich".
> mpiexec: accept_pmi_conn: cmd=initack pmiid=0.
> mpiexec: accept_pmi_conn: rank 0 (spawn 0) checks in.
> mpiexec: accept_pmi_conn: cmd=init pmi_version=1 pmi_subversion=1.
> [unset]: connect failed with connection refused
> [unset]: Unable to connect to quadri3 on 39045
> [unset]: aborting job:

These [unset] messages come from the mpich2 library in the task on
quadri1. During MPI_Init() it tries to connect to mpiexec on host
quadri3, port 39045, but gets a "connection refused" error. Your
nodes=1:ppn=2 test works, and the task on quadri3, which connects to
mpiexec locally, reports no error, so we know mpiexec is listening
correctly.
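The failing step is a plain TCP connect from the task on quadri1 to the port mpiexec listens on at quadri3. The sketch below, assuming Python 3 is available on the compute nodes, repeats that connect by hand while a job is running and classifies the result. The helper name `probe` is hypothetical; the host quadri3 and port 39045 come from the log above, and since mpiexec picks its port at run time, the port will likely differ for each job.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection to host:port and classify the outcome.

    Returns "ok", "unresolved", "refused", "timeout" or "error:<message>".
    """
    try:
        # create_connection resolves the name, then tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        # Name lookup failed: no DNS or /etc/hosts entry for host.
        return "unresolved"
    except ConnectionRefusedError:
        # The peer answered with a reset or ICMP reject: nothing listening
        # on that port, or a firewall rule rejecting the connection.
        return "refused"
    except socket.timeout:
        # No answer at all, often a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        return f"error:{exc}"
```

Run from quadri1 as, for example, `probe("quadri3", 39045)` while the job is still waiting in MPI_Init(). "refused" while the same call on quadri3 itself returns "ok" points at a firewall; "timeout" points at packets being dropped rather than rejected.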

You might check whether a firewall is running on the nodes and, if so,
disable it. The other thing to look at is name resolution: perhaps
quadri1 has the wrong IP address for quadri3 in its /etc/hosts, though
that is less likely.

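To check the name-resolution theory, one approach is to run a small script on both quadri1 and quadri3 and confirm that each lists the same address for quadri3. This is a sketch assuming Python 3 on the nodes; both function names are hypothetical. The hosts-file parser only understands the plain `address name [aliases...]` layout, and the system resolver may also consult DNS or NIS depending on /etc/nsswitch.conf, so the two answers can legitimately differ.

```python
import socket


def hosts_file_addresses(name: str, path: str = "/etc/hosts") -> list[str]:
    """Return every address that a hosts(5)-style file maps to name."""
    addrs = []
    with open(path) as fh:
        for line in fh:
            # Strip comments, then split into: address, hostname, aliases...
            fields = line.split("#", 1)[0].split()
            if len(fields) >= 2 and name in fields[1:]:
                addrs.append(fields[0])
    return addrs


def resolver_addresses(name: str) -> list[str]:
    """Return the IPv4 addresses the system resolver returns for name."""
    try:
        # gethostbyname_ex returns (hostname, aliases, address_list).
        return socket.gethostbyname_ex(name)[2]
    except OSError:
        # Lookup failed (socket.gaierror or socket.herror).
        return []
```

For example, `hosts_file_addresses("quadri3")` and `resolver_addresses("quadri3")` should print the same address on quadri1 as on quadri3; a mismatch would explain mpich2 connecting to the wrong machine.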
-- Pete