mpiexec in 2 nodes

Pete Wyckoff pw at osc.edu
Tue Oct 9 10:23:08 EDT 2007


aurelia.marchand at obspm.fr wrote on Tue, 09 Oct 2007 15:02 +0200:
> I have a problem using mpiexec in more than one node.
> 
> when I have :
> #PBS -l nodes=1:ppn=2
> it work well
> 
> and when I have :
> 
> #PBS -l nodes=quadri3:ppn=1+quadri1:ppn=1
> mpiexec --comm=mpich2 /home/marchand/PBS/test/nomProc2.mpich
> 
> I have the error :
> mpiexec: resolve_exe: using absolute path "/home/marchand/PBS/test/nomProc2.mpich".
> mpiexec: accept_pmi_conn: cmd=initack pmiid=0.
> mpiexec: accept_pmi_conn: rank 0 (spawn 0) checks in.
> mpiexec: accept_pmi_conn: cmd=init pmi_version=1 pmi_subversion=1.
> [unset]: connect failed with connection refused
> [unset]: Unable to connect to quadri3 on 39045
> [unset]: aborting job:

These [unset] messages come from the mpich2 library in the task on
quadri1.  During MPI_Init() it tries to connect to mpiexec on host
quadri3, port 39045, but gets a "connection refused" error.  We know
mpiexec is listening okay, though:  your nodes=1:ppn=2 test works,
and there is no error from the task on quadri3, which connects to
mpiexec on the same host.

You might check whether a firewall is running on the nodes and, if
so, disable it.  The other thing to look at is name resolution:
maybe quadri1 has the wrong IP address for quadri3 in its /etc/hosts,
although that is less likely.
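
To tell the two cases apart, something like the following could be
run on quadri1 while the job is still active.  It is only a rough
sketch, not part of mpiexec:  the function name probe() is made up
for illustration, and the port has to be taken from the error message
of the current run, since it appears to be chosen anew each time.  It
prints nothing by itself; it just returns the address that the name
resolved to and what happened on connect.

```python
import socket


def probe(host, port, timeout=3.0):
    """Resolve host, then try a TCP connect; return (ip, status).

    Hypothetical helper for illustration only.
    """
    try:
        # Uses the system resolver, which typically consults /etc/hosts.
        ip = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return None, "name resolution failed: %s" % exc

    try:
        # The connection is closed again as soon as the block exits.
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, "connected"
    except ConnectionRefusedError:
        # Nothing is listening at that address and port, or a
        # firewall is actively rejecting the connection.
        return ip, "connection refused"
    except socket.timeout:
        # Often a sign of a firewall silently dropping packets.
        return ip, "timed out"
    except OSError as exc:
        return ip, "error: %s" % exc
```

For example, print(probe("quadri3", 39045)) on quadri1 shows which
address quadri1 actually uses for quadri3.  If that address is not
the real one, the problem is in name resolution; if it is right and
the connect is still refused or times out, a firewall is the more
likely cause.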

		-- Pete

