MPIEXEC and Intel MPI library 1.0.1

Anton Starikov A.Starikov at UTWENTE.NL
Mon May 2 15:08:02 EDT 2005


It seems that Intel MPI relies on the environment variable HOSTNAME.

Pete, when processes are started via MPIEXEC, the variable HOSTNAME is
exported from the root node and therefore has the same value on all
nodes. I think this is wrong behavior in general.
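
A minimal sketch of the difference, in C. It assumes only what is
guessed above, namely that the library takes the node name from the
inherited HOSTNAME variable; the helper names env_hostname,
local_hostname and hostname_mismatch are made up for illustration and
belong neither to mpiexec nor to Intel MPI.

```c
#define _POSIX_C_SOURCE 200112L  /* for gethostname() and getenv() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Node name as seen through the inherited environment. If the launcher
 * copies HOSTNAME from the root node, every task gets the same answer. */
const char *env_hostname(void)
{
    const char *h = getenv("HOSTNAME");
    return h ? h : "(unset)";
}

/* Node name as reported by the local system, independent of whatever
 * environment the launcher passed along. Returns 0 on success. */
int local_hostname(char *buf, size_t len)
{
    assert(buf != NULL);
    if (len == 0 || gethostname(buf, len) != 0)
        return -1;
    buf[len - 1] = '\0';  /* not guaranteed to be terminated on truncation */
    return 0;
}

/* Returns 1 if HOSTNAME is set and differs from the local name, which is
 * what a task on a non-root node would see with an exported HOSTNAME.
 * Note: a plain string compare also flags short name vs. FQDN. */
int hostname_mismatch(void)
{
    char local[256];
    const char *env = getenv("HOSTNAME");

    if (env == NULL || local_hostname(local, sizeof local) != 0)
        return 0;
    return strcmp(env, local) != 0;
}
```

Called from each task, hostname_mismatch() should return 1 on every
node except the root one if HOSTNAME is indeed copied from the root
node.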

I'll fix it.

Anton.

Thomas Zeiser wrote:
> Dear All!
> 
> I just tested the patch to get Intel MPI running.  
> 
> With version 1.0 of Intel MPI everything is fine. 
> 
> However, when I try the recent update (Intel MPI 1.0.1) I get very
> strange results:
> 
> - I start the MPI program with
>   mpiexec -comm pmi [-verbose] ./test-f-g77-intelmpi101
> 
> - the processes are correctly started on all nodes (twice each on
>   snode163 and snode164; verified with "ps")
> 
>   mpiexec: resolve_exe: using absolute exe "./test-f-g77-intelmpi101".
>   mpiexec: accept_pmi_conn: got request: cmd=initack pmiid=0.
>   mpiexec: accept_pmi_conn: rank 0 checks in.
>   mpiexec: accept_pmi_conn: got request: cmd=init pmi_version=1.1.
>   mpiexec: accept_pmi_conn: got request: cmd=initack pmiid=1.
>   mpiexec: accept_pmi_conn: rank 1 checks in.
>   mpiexec: accept_pmi_conn: got request: cmd=init pmi_version=1.1.
>   mpiexec: accept_pmi_conn: got request: cmd=initack pmiid=2.
>   mpiexec: accept_pmi_conn: rank 2 checks in.
>   mpiexec: accept_pmi_conn: got request: cmd=init pmi_version=1.1.
>   mpiexec: accept_pmi_conn: got request: cmd=initack pmiid=3.
>   mpiexec: accept_pmi_conn: rank 3 checks in.
>   mpiexec: accept_pmi_conn: got request: cmd=init pmi_version=1.1.
>   mpiexec: All 4 tasks started.
> 
> - in the test program all MPI processes get their hostname using
>   MPI_GET_PROCESSOR_NAME and send it with MPI_SEND to the master.
>   The master receives the messages with MPI_RECV and outputs them
>   (it's the simple test.f* program from the Intel MPI test
>   directory). The output is the following:
> 
>   Hello world: rank  0 of  4 running on snode164
>   Hello world: rank  1 of  4 running on snode164
>   Hello world: rank  2 of  4 running on snode164
>   Hello world: rank  3 of  4 running on snode164
> 
>   All processes seem to be running on the same node!
> 
> - next the program calls MPI_FINALIZE. However, the processes
>   hang. When I now kill all processes one by one I get
> 
>   accept_pmi_conn: waiting for info
>   accept_pmi_conn: waiting for info
>   accept_pmi_conn: waiting for info
>   accept_pmi_conn: waiting for info
>   wait_one_task_start: evt = 2, task 0 host snode164
>   wait_one_task_start: evt = 3, task 1 host snode164
>   wait_one_task_start: evt = 4, task 2 host snode163
>   wait_one_task_start: evt = 5, task 3 host snode163
>   wait_tasks: waiting for snode164 snode164 snode163 snode163
>   wait_tasks: waiting for snode164 snode163 snode163
>   wait_tasks: waiting for snode164 snode163
>   wait_tasks: waiting for snode163
> 
>   Killed
>   ABORT - process 3: failure: Other MPI error
>   mpiexec: wait_tasks: numspawned = 4, got evt 6 for tid 2 host snode164 status 1.
>   mpiexec: wait_tasks: numspawned = 3, got evt 9 for tid 5 host snode163 status 13.
>   Killed
>   mpiexec: wait_tasks: numspawned = 2, got evt 7 for tid 3 host snode164 status 1.
>   Killed
>   mpiexec: wait_tasks: numspawned = 1, got evt 8 for tid 4 host snode163 status 1.
>   mpiexec: Warning: tasks 0-2 exited with status 1.
>   mpiexec: Warning: task 3 exited with status 13.
> 
> 
> 
> 
> Any ideas (except using totalview to get a better insight)?
> 
> 
> Kind regards,
> 
> Thomas Zeiser
