mpiexec on 2 nodes

Marchand Aurélia aurelia.marchand at obspm.fr
Wed Oct 10 09:53:49 EDT 2007


Thank you for your reply.
The problem comes from the private network:
more /etc/hosts.local
# hosts.local
# Definitions specific to this machine
145.238.2.10            quadri1.obspm.fr quadri1
# NFS sharing over the private network
192.168.0.1             siolino-s siolino
192.168.0.2             quadri1-s
192.168.0.3             quadri2-s quadri2
192.168.0.4             quadri3-s quadri3
192.168.0.5             quadri4-s quadri4
192.168.0.6             quadri5-s quadri5
192.168.0.7             quadri6-s quadri6

I think it uses quadri3-s and not quadri3.
When I use MPI, I have to add .obspm.fr to the node names in the machinefile.
For the other machines, quadri[7-9], I don't have this problem.
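
The name lookup can be checked mechanically. Below is a minimal Python sketch, assuming hosts.local is read like /etc/hosts (address first, then names and aliases, '#' starts a comment, the first matching line wins); the function parse_hosts is hypothetical. On quadri1's copy of the file, the short name quadri3 lands on the private address 192.168.0.4 (the same line as quadri3-s), while quadri1 itself resolves to its public address 145.238.2.10. The real resolver may also consult DNS according to /etc/nsswitch.conf, so `getent hosts quadri3` on each node is the authoritative answer.

```python
import ipaddress

# Entries of /etc/hosts.local on quadri1, as listed above (comments dropped).
HOSTS_LOCAL = """\
# hosts.local
145.238.2.10            quadri1.obspm.fr quadri1
192.168.0.1             siolino-s siolino
192.168.0.2             quadri1-s
192.168.0.3             quadri2-s quadri2
192.168.0.4             quadri3-s quadri3
192.168.0.5             quadri4-s quadri4
192.168.0.6             quadri5-s quadri5
192.168.0.7             quadri6-s quadri6
"""


def parse_hosts(text):
    """Map every host name and alias to its address (hypothetical helper)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, address)  # keep the first match only
    return table


if __name__ == "__main__":
    table = parse_hosts(HOSTS_LOCAL)
    for name in ("quadri1", "quadri1.obspm.fr", "quadri3", "quadri3-s"):
        addr = table[name]
        kind = "private" if ipaddress.ip_address(addr).is_private else "public"
        print(f"{name:18} -> {addr:15} ({kind})")
```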

Aurelia

Pete Wyckoff wrote:

>aurelia.marchand at obspm.fr wrote on Tue, 09 Oct 2007 15:02 +0200:
>  
>
>>I have a problem using mpiexec on more than one node.
>>
>>When I have:
>>#PBS -l nodes=1:ppn=2
>>it works well.
>>
>>But when I have:
>>
>>#PBS -l nodes=quadri3:ppn=1+quadri1:ppn=1
>>mpiexec --comm=mpich2 /home/marchand/PBS/test/nomProc2.mpich
>>
>>I get the error:
>>mpiexec: resolve_exe: using absolute path "/home/marchand/PBS/test/nomProc2.mpich".
>>mpiexec: accept_pmi_conn: cmd=initack pmiid=0.
>>mpiexec: accept_pmi_conn: rank 0 (spawn 0) checks in.
>>mpiexec: accept_pmi_conn: cmd=init pmi_version=1 pmi_subversion=1.
>>[unset]: connect failed with connection refused
>>[unset]: Unable to connect to quadri3 on 39045
>>[unset]: aborting job:
>>    
>>
>
>These [unset] messages come from the mpich2 library in the task on
>quadri1.  During MPI_Init() it tries to connect to mpiexec on host
>quadri3 port 39045, but gets a "connection refused" error.  However,
>from your nodes=1:ppn=2 test, and from the fact that the task on
>quadri3 (which connects to itself locally) reports no error, we know
>mpiexec is listening okay.
>
>You might check whether a firewall is running and, if so, disable it.
>The other aspect to look at is name resolution:  maybe quadri1 has the
>wrong IP address for quadri3 in its /etc/hosts, though that is less likely.
>
>		-- Pete
>  
>
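
Pete's two suspects (a firewall, or quadri1 reaching quadri3 at an unexpected address) can be told apart from quadri1 with a small TCP probe. The following is a minimal Python sketch, not MPICH2's own code: the function probe is hypothetical and only imitates the step where the task opens a TCP connection to mpiexec during MPI_Init(). It reports which address the name resolved to, then whether the connection was accepted, refused (the host answered but nothing accepted on that address and port, or a firewall REJECT rule), or timed out (typically a firewall silently dropping packets).

```python
import socket


def probe(host, port, timeout=3.0):
    """Resolve `host`, then try a TCP connection to that address and `port`.

    Hypothetical helper; returns a one-line diagnosis string.
    """
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"{host}: name resolution failed ({exc})"
    address = infos[0][4][0]  # first IPv4 address the resolver returned
    target = f"{host} ({address}) port {port}"
    try:
        # The with-block closes the client socket as soon as it connects.
        with socket.create_connection((address, port), timeout=timeout):
            return f"{target}: connection accepted"
    except ConnectionRefusedError:
        # Host reached, but nothing accepted there (or a REJECT rule).
        return f"{target}: connection refused"
    except socket.timeout:
        # No answer at all: usually packets dropped by a firewall.
        return f"{target}: timed out"
    except OSError as exc:
        return f"{target}: failed ({exc})"
```

The port in the error message is only meaningful while that job's mpiexec is running, so the probe has to be run from quadri1 during a job, with the port from the current error. Calling it for "quadri3", "quadri3-s" and "quadri3.obspm.fr" shows whether the different names lead to different addresses, which would match the private-network explanation.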

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aurélia Marchand
Service Informatique de l'Observatoire
5 place Jules Janssen                      Tel : 01 45 07 76 24
92195 Meudon                               Fax : 01 45 07 76 13
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



