mpiexec mvapich rank error

Alex Ninaber Alex.Ninaber at clustervision.com
Wed Jan 25 12:21:27 EST 2006


Dear all,


I get the following error when starting an InfiniBand (IB) job with mpiexec:

Error: read_ib_startup_ports: barrier expecting rank 0, got 10.


- mpiexec svn, checkout 1/25/2006
- mvapich gen2
- torque 2

Please see the output from Torque below. Any ideas why it gets the wrong
rank (rank 10 in a 4-processor job)?

Regards,

Alex



mpiexec: resolve_exe: using absolute exe 
"/home/cvsupport/PBS_training/PMB-MPI1.ib".
mpiexec: concurrent_init: i am concurrent master.
mpiexec: stdio_msg_parent_read: got hello from listener.
mpiexec: start_tasks: command to 0/4 node161.ic.cluster: if test -d 
"/home/cvsupport"; then cd "/home/cvsupport"; fi; exec /bin/bash -c 
'exec /home/cvsupport/
PBS_training/PMB-MPI1.ib -multi 1'.
mpiexec: service_ib_startup: new task, now accept wait 1.
mpiexec: start_tasks: command to 1/4 node161.ic.cluster: if test -d 
"/home/cvsupport"; then cd "/home/cvsupport"; fi; exec /bin/bash -c 
'exec /home/cvsupport/
PBS_training/PMB-MPI1.ib -multi 1'.
mpiexec: service_ib_startup: new task, now accept wait 2.
mpiexec: start_tasks: command to 2/4 node160.ic.cluster: if test -d 
"/home/cvsupport"; then cd "/home/cvsupport"; fi; exec /bin/bash -c 
'exec /home/cvsupport/
PBS_training/PMB-MPI1.ib -multi 1'.
mpiexec: service_ib_startup: new task, now accept wait 3.
mpiexec: start_tasks: command to 3/4 node160.ic.cluster: if test -d 
"/home/cvsupport"; then cd "/home/cvsupport"; fi; exec /bin/bash -c 
'exec /home/cvsupport/
PBS_training/PMB-MPI1.ib -multi 1'.
mpiexec: service_ib_startup: new task, now accept wait 4.
mpiexec: service_ib_startup: no new task, now accept wait 4.
mpiexec: service_ib_startup: no new task, now accept wait 4.
mpiexec: process_start_event: evt 2 task 0 on node161.ic.cluster.
mpiexec: process_start_event: evt 3 task 1 on node161.ic.cluster.
mpiexec: process_start_event: evt 4 task 2 on node160.ic.cluster.
mpiexec: process_start_event: evt 5 task 3 on node160.ic.cluster.
mpiexec: service_ib_startup: accepted fd 5, accept wait 3.
mpiexec: service_ib_startup: reading fd 5, read wait 0.
mpiexec: read_ib_one: version 2 startup.
mpiexec: service_ib_startup: rank 1 in, 3 + 0 left.
mpiexec: service_ib_startup: no new task, now accept wait 3.
mpiexec: service_ib_startup: accepted fd 6, accept wait 2.
mpiexec: service_ib_startup: reading fd 6, read wait 0.
mpiexec: service_ib_startup: rank 0 in, 2 + 0 left.
mpiexec: service_ib_startup: no new task, now accept wait 2.
mpiexec: service_ib_startup: accepted fd 7, accept wait 1.
mpiexec: service_ib_startup: reading fd 7, read wait 0.
mpiexec: service_ib_startup: rank 2 in, 1 + 0 left.
mpiexec: service_ib_startup: no new task, now accept wait 1.
mpiexec: service_ib_startup: accepted fd 9, accept wait 0.
mpiexec: service_ib_startup: reading fd 9, read wait 0.
mpiexec: service_ib_startup: rank 3 in, 0 + 0 left.
mpiexec: service_ib_startup: no new task, now accept wait 0.
mpiexec: All 4 tasks (spawn 0) started.
mpiexec: read_ib_startup_ports: waiting for checkin: 0 to accept, 0 to read.
mpiexec: read_ib_startup_ports: barrier startmpiexec: listen_abort_fd: 
parent says via index 0 to listen to abort fd 4.
.
mpiexec: Error: read_ib_startup_ports: barrier expecting rank 0, got 10.
mpiexec: stdio_msg_listener_read: pipe closed, exiting too.
read: Connection reset by peer
read: Connection reset by peer
read: Connection reset by peer
read: Connection reset by peer






