mpiexec and myrinet

Todd Merritt tmerritt at email.arizona.edu
Tue Nov 12 19:21:25 EST 2002


I have mpich-gm 1.2.4..8a and mpiexec 0.70 running with PBS Pro 5.2.0 on
a 32-node, 2-way cluster.  It runs fine in most cases, but I have a
strange problem that I was hoping someone could shed some light on.
When I run mm5 on 4 nodes, 2ppn, via mpiexec, it runs fine.  When I run
it on 16 nodes, 1ppn, it runs fine.  When I run it on 8 nodes, 2ppn, one
of the processes terminates inexplicably.  If I run the same case with
mpirun, it runs fine.  I'm quite baffled: the environment for the two
runs looks similar as far as the GMPI variables go.  Running with
mpiexec -verbose is of little help, but here is the output:

read_gm_startup_ports: waiting for info
read_gm_startup_ports: id 2 port 2 board 0 gm_node_id 34 pid 14555
read_gm_startup_ports: id 4 port 2 board 0 gm_node_id 32 pid  6808
read_gm_startup_ports: id 10 port 4 board 0 gm_node_id 34 pid 14560
read_gm_startup_ports: id 12 port 4 board 0 gm_node_id 32 pid  6813
read_gm_startup_ports: id 6 port 2 board 0 gm_node_id 28 pid  5622
read_gm_startup_ports: id 3 port 2 board 0 gm_node_id 31 pid  2477
read_gm_startup_ports: id 9 port 2 board 0 gm_node_id 30 pid  3174
read_gm_startup_ports: id 14 port 4 board 0 gm_node_id 28 pid  5625
read_gm_startup_ports: id 0 port 2 board 0 gm_node_id 29 pid 21571
read_gm_startup_ports: id 11 port 4 board 0 gm_node_id 31 pid  2480
read_gm_startup_ports: id 1 port 4 board 0 gm_node_id 30 pid  3171
read_gm_startup_ports: id 5 port 2 board 0 gm_node_id 33 pid   979
read_gm_startup_ports: id 7 port 2 board 0 gm_node_id 27 pid  2044
read_gm_startup_ports: id 8 port 4 board 0 gm_node_id 29 pid 21572
read_gm_startup_ports: id 13 port 4 board 0 gm_node_id 33 pid   984
read_gm_startup_ports: id 15 port 4 board 0 gm_node_id 27 pid  2047
wait_tasks: numspawned = 16, got evt 30 for tid 14 host node028 status 267
wait_tasks: numspawned = 15, got evt 31 for tid 15 host node027 status 267
wait_tasks: numspawned = 14, got evt 26 for tid 10 host node032 status 267
wait_tasks: 8 stray obit 0 while waiting for kill 26


Any ideas or suggestions for further debugging?  I've tried it with
-no-shmem, but it doesn't make a difference.
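
To make the verbose output above easier to read, here is a rough sketch
that tabulates it: which ids share a gm_node_id, and which tids were
reported as finished with which status.  The regular expressions are
written against the lines exactly as pasted here, not against any
documented mpiexec log format.  The helper signal_if_offset is
hypothetical and rests on an unconfirmed assumption, namely that a
status above 256 encodes 256 plus a signal number.

```python
import re
from collections import defaultdict

# Verbose output copied from the report above (wrapped lines rejoined).
LOG = """\
read_gm_startup_ports: waiting for info
read_gm_startup_ports: id 2 port 2 board 0 gm_node_id 34 pid 14555
read_gm_startup_ports: id 4 port 2 board 0 gm_node_id 32 pid  6808
read_gm_startup_ports: id 10 port 4 board 0 gm_node_id 34 pid 14560
read_gm_startup_ports: id 12 port 4 board 0 gm_node_id 32 pid  6813
read_gm_startup_ports: id 6 port 2 board 0 gm_node_id 28 pid  5622
read_gm_startup_ports: id 3 port 2 board 0 gm_node_id 31 pid  2477
read_gm_startup_ports: id 9 port 2 board 0 gm_node_id 30 pid  3174
read_gm_startup_ports: id 14 port 4 board 0 gm_node_id 28 pid  5625
read_gm_startup_ports: id 0 port 2 board 0 gm_node_id 29 pid 21571
read_gm_startup_ports: id 11 port 4 board 0 gm_node_id 31 pid  2480
read_gm_startup_ports: id 1 port 4 board 0 gm_node_id 30 pid  3171
read_gm_startup_ports: id 5 port 2 board 0 gm_node_id 33 pid   979
read_gm_startup_ports: id 7 port 2 board 0 gm_node_id 27 pid  2044
read_gm_startup_ports: id 8 port 4 board 0 gm_node_id 29 pid 21572
read_gm_startup_ports: id 13 port 4 board 0 gm_node_id 33 pid   984
read_gm_startup_ports: id 15 port 4 board 0 gm_node_id 27 pid  2047
wait_tasks: numspawned = 16, got evt 30 for tid 14 host node028 status 267
wait_tasks: numspawned = 15, got evt 31 for tid 15 host node027 status 267
wait_tasks: numspawned = 14, got evt 26 for tid 10 host node032 status 267
wait_tasks: 8 stray obit 0 while waiting for kill 26
"""

# Patterns inferred from the pasted lines only; \s+ also tolerates a
# mail client having wrapped "status" and the number onto two lines.
PORT_RE = re.compile(
    r"read_gm_startup_ports: id (\d+) port (\d+) board (\d+) "
    r"gm_node_id (\d+) pid\s+(\d+)"
)
EXIT_RE = re.compile(
    r"wait_tasks: numspawned = (\d+), got evt (\d+) "
    r"for tid (\d+) host (\S+) status\s+(\d+)"
)


def parse(log):
    """Return ({gm_node_id: [(id, port), ...]}, [(tid, host, status), ...])."""
    ranks_by_gm_node = defaultdict(list)
    for m in PORT_RE.finditer(log):
        rank, port, _board, gm_node, _pid = map(int, m.groups())
        ranks_by_gm_node[gm_node].append((rank, port))
    exits = [
        (int(m.group(3)), m.group(4), int(m.group(5)))
        for m in EXIT_RE.finditer(log)
    ]
    return dict(ranks_by_gm_node), exits


def signal_if_offset(status, offset=256):
    """Hypothetical helper: ASSUMES status = offset + signal number.

    Returns the implied signal number, or None if status <= offset.
    Nothing in the log itself confirms this encoding.
    """
    return status - offset if status > offset else None


if __name__ == "__main__":
    ranks_by_gm_node, exits = parse(LOG)
    for gm_node in sorted(ranks_by_gm_node):
        print("gm_node_id", gm_node, "(id, port):", sorted(ranks_by_gm_node[gm_node]))
    for tid, host, status in exits:
        print("tid", tid, "on", host, "status", status,
              "implied signal (if assumption holds):", signal_if_offset(status))
```

On the output above this shows eight distinct gm_node_id values, each
with two ids (one on port 2, one on port 4), and three tids (14, 15 and
10) reported with status 267.  If the 256-offset assumption holds, 267
would correspond to signal 11, which is SIGSEGV on Linux, but that
reading is a guess and would need checking against the mpiexec source
or the PBS documentation.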

Thanks,

Todd



