mpiexec GMPI_SLAVE env.t problem

Bryan Hellyer brh at unimelb.edu.au
Tue Aug 5 22:02:21 EDT 2003


Hi,

I'm a newcomer to mpiexec, and have been trying to install it on an IBM e-1350
Intel cluster running RH7.3.

I ran configure with
./configure \
   --with-pbs=/usr/pbs \
   --with-pbssrc=/usr/local/src/OpenPBS_2_3_16 \
   --prefix=/usr/local/src/mpiexec_0.74/mpiexec-0.74 \
   --with-default-comm=mpich-gm

and we have mpich-gm 1.2.5..10 in /usr/local/src/mpich-gm/mpich-1.2.5..10

but the tests fail with:
<MPICH-GM> Error: Need to obtain the slave's hostname in GMPI_SLAVE !
[0] Error: write to socket failed !

I've tracked this down to the gmpi_getenv routine in the mpich-gm source file
gmpi_conf.c, and have added printfs to see what's happening...
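
A minimal sketch of the kind of tracing involved, for illustration only. The
name traced_getenv is a hypothetical stand-in; this is not the actual
gmpi_getenv code from gmpi_conf.c, whose signature is not shown here. It only
demonstrates logging each lookup in the same "BH:" format as the output below.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a traced environment lookup. This is NOT the
 * real gmpi_getenv from mpich-gm's gmpi_conf.c, only an illustration. */
static const char *traced_getenv(const char *name)
{
    const char *value = getenv(name);

    /* Substitute "(null)" explicitly: passing NULL to %s is undefined
     * behaviour in standard C, even though glibc prints "(null)". */
    fprintf(stderr, "BH: gmpi_getenv var : %s , result %s\n",
            name, value ? value : "(null)");
    return value;
}
```

Calling traced_getenv("GMPI_SLAVE") returns NULL when the variable is unset,
and logs the lookup to stderr either way.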

Under mpirun, the GMPI_* environment variables are returned OK, e.g.:
cat hello_mpigm_ppn.e209
BH: gmpi_conf.c : gethostbyname returned node004
BH: gmpi_getenv var : GMPI_MAGIC , result 7374385
BH: gmpi_getenv var : GMPI_MASTER , result node040
BH: gmpi_getenv var : GMPI_PORT , result 8000
BH: gmpi_getenv var : GMPI_SLAVE , result 172.20.3.4
BH: gmpi_getenv var : GMPI_ID , result 7
BH: gmpi_getenv var : GMPI_NP , result 8
BH: gmpi_getenv var : GMPI_BOARD , result -1
BH: gmpi_getenv var : GMPI_NUMA_NODE , result (null)
BH: gmpi_getenv var : GMPI_EAGER , result (null)
BH: gmpi_getenv var : GMPI_SHMEM , result 1
BH: gmpi_getenv var : GMPI_RECV , result (null)
BH: gmpi_conf.c : gethostbyname returned node004
BH: gmpi_getenv var : GMPI_MAGIC , result 7374385
BH: gmpi_getenv var : GMPI_MASTER , result node040
BH: gmpi_getenv var : GMPI_PORT , result 8000
BH: gmpi_getenv var : GMPI_SLAVE , result 172.20.3.4
BH: gmpi_getenv var : GMPI_ID , result 6
BH: gmpi_getenv var : GMPI_NP , result 8
BH: gmpi_getenv var : GMPI_BOARD , result -1
BH: gmpi_getenv var : GMPI_NUMA_NODE , result (null)
BH: gmpi_getenv var : GMPI_ID , result 7
BH: gmpi_getenv var : GMPI_NP , result 8
BH: gmpi_getenv var : GMPI_BOARD , result -1
BH: gmpi_getenv var : GMPI_NUMA_NODE , result (null)
BH: gmpi_getenv var : GMPI_EAGER , result (null)
BH: gmpi_getenv var : GMPI_SHMEM , result 1
BH: gmpi_getenv var : GMPI_RECV , result (null)
BH: gmpi_conf.c : gethostbyname returned node004
BH: gmpi_getenv var : GMPI_MAGIC , result 7374385
BH: gmpi_getenv var : GMPI_MASTER , result node040
BH: gmpi_getenv var : GMPI_PORT , result 8000
BH: gmpi_getenv var : GMPI_SLAVE , result 172.20.3.4
BH: gmpi_getenv var : GMPI_ID , result 6
BH: gmpi_getenv var : GMPI_NP , result 8
BH: gmpi_getenv var : GMPI_BOARD , result -1
BH: gmpi_getenv var : GMPI_NUMA_NODE , result (null)
BH: gmpi_getenv var : GMPI_EAGER , result (null)
BH: gmpi_getenv var : GMPI_SHMEM , result 1
BH: gmpi_getenv var : GMPI_RECV , result (null)
BH: gmpi_conf.c : gethostbyname returned node001
etc.

But under mpiexec, it fails on GMPI_SLAVE:
BH: gmpi_conf.c : gethostbyname returned node040
BH: gmpi_getenv var : GMPI_MAGIC , result 210
BH: gmpi_getenv var : GMPI_MASTER , result node040
BH: gmpi_getenv var : GMPI_PORT , result 36678
BH: gmpi_getenv var : GMPI_SLAVE , result (null)
<MPICH-GM> Error: Need to obtain the slave's hostname in GMPI_SLAVE !
[0] Error: write to socket failed !
BH: gmpi_conf.c : gethostbyname returned node038
BH: gmpi_getenv var : GMPI_MAGIC , result 210
BH: gmpi_getenv var : GMPI_MASTER , result node040
BH: gmpi_getenv var : GMPI_PORT , result 36678
BH: gmpi_getenv var : GMPI_SLAVE , result (null)
<MPICH-GM> Error: Need to obtain the slave's hostname in GMPI_SLAVE !
[0] Error: write to socket failed !
BH: gmpi_conf.c : gethostbyname returned node040
BH: gmpi_conf.c : gethostbyname returned node039
BH: gmpi_getenv var : GMPI_MAGIC , result 210
BH: gmpi_getenv var : GMPI_MASTER , result node040
BH: gmpi_getenv var : GMPI_PORT , result 36678
BH: gmpi_getenv var : GMPI_SLAVE , result (null)
<MPICH-GM> Error: Need to obtain the slave's hostname in GMPI_SLAVE !
[0] Error: write to socket failed !
.
.
.

I also note that under mpirun GMPI_PORT is 8000, whereas, as seen above,
under mpiexec it is 36678.
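
One way to compare what the two launchers hand to each process would be a
small checker along the lines of the sketch below. The helper names
(env_lookup, gmpi_report_missing) are hypothetical, and the list of expected
variables is simply those that had a value in the mpirun trace above; whether
mpich-gm treats every one of them as mandatory is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Variables that carried a value in the mpirun trace. Treating all of
 * them as required is an assumption drawn from that trace only. */
static const char *const gmpi_expected[] = {
    "GMPI_MAGIC", "GMPI_MASTER", "GMPI_PORT", "GMPI_SLAVE",
    "GMPI_ID", "GMPI_NP", "GMPI_BOARD",
};

/* Return the value of `name` in a NULL-terminated array of "NAME=value"
 * strings (the same layout as `environ`), or NULL if it is absent. */
static const char *env_lookup(const char *const *env, const char *name)
{
    size_t len = strlen(name);

    for (; env && *env; ++env) {
        /* Require '=' right after the name so GMPI_PORT does not
         * match a longer name such as GMPI_PORTX. */
        if (strncmp(*env, name, len) == 0 && (*env)[len] == '=')
            return *env + len + 1;
    }
    return NULL;
}

/* Print each expected variable missing from `env`; return how many. */
static int gmpi_report_missing(const char *const *env, FILE *out)
{
    int missing = 0;
    size_t n = sizeof gmpi_expected / sizeof gmpi_expected[0];

    for (size_t i = 0; i < n; ++i) {
        if (env_lookup(env, gmpi_expected[i]) == NULL) {
            fprintf(out, "missing: %s\n", gmpi_expected[i]);
            ++missing;
        }
    }
    return missing;
}
```

A test program could call gmpi_report_missing((const char *const *)environ,
stderr) from main (with the POSIX declaration extern char **environ;) and be
launched once under mpirun and once under mpiexec to see which variables
differ.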

Any ideas what's happening here?

Thanx

Bryan
---------------------------------------

Bryan Hellyer
HPC Systems Programmer
ITS Systems & Infrastructure
University of Melbourne
