ch_p4 and enable-p4-shmem

Bisbal, Prentice PBisbal at LexPharma.com
Wed Feb 22 14:38:16 EST 2006


I know this topic has been discussed before, but I haven't been able to find the answer to my problem in the archives. 

I have several multiprocessor Linux systems. From what I can tell from the list archives and the documentation, the best configuration options for this arrangement are:

mpich: --with-device=ch_p4 --with-comm=shared
mpiexec: --with-default-comm=mpich-p4 --enable-p4-shmem 

I'm using mpich 1.2.7p1, which I configured as follows:

./configure --prefix=/usr/local/mpich-1.2.7p1 --enable-sharedlib --with-device=ch_p4 --with-comm=shared

I compiled mpiexec 0.80 with these options:

./configure --with-prefix=/usr/local/mpiexec-0.80 --with-pbs=/usr/local --with-default-comm=mpich-p4 --enable-p4-shmem 

When I run runtests.pl with 

$available_nodes = 4;
$smpsize = 1;

the test script doesn't encounter any errors.

When I change $smpsize to 2, I get errors like this:

 ./runtests.pl 
Testing 4 nodes with SMP size 2.
2533 to testqo.5261.01 mpiexec -n 1 hello ...
2534 to testqo.5261.02 mpiexec -n 2 hello ..
2535 to testqo.5261.03 mpiexec -n 3 hello ...
2536 to testqo.5261.04 mpiexec -n 8 hello ...........................
File testho.5261.04: unexpected line: hello: hw-underdog.lexpharma.com  MPI_Init did not finish
File testho.5261.04: unexpected line: p0_5499:  p4_error: interrupt SIGSEGV: 11
File testho.5261.04: unexpected line: hello: hw-optimus.lexpharma.com  MPI_Init did not finish
File testho.5261.04: unexpected line: rm_29795:  p4_error: interrupt SIGSEGV: 11
File testho.5261.04: unexpected line: hello: hw-appsrv05.lexpharma.com  MPI_Init did not finish
File testho.5261.04: unexpected line: rm_17122:  p4_error: interrupt SIGSEGV: 11
File testho.5261.04: unexpected line: /bin/bash: /scratch.d/hw-underdog/pbisbal/mpiexec-0.80/hello: cannot execute binary file
File testho.5261.04: unexpected line: /bin/bash: /scratch.d/hw-underdog/pbisbal/mpiexec-0.80/hello: Exec format error
File testho.5261.04: unexpected line: p0_5499: (24.068400) net_send: could not write to fd=4, errno = 32
File testho.5261.04: unexpected line: mpiexec: Warning: tasks 0,3 exited with status 1.
File testho.5261.04: unexpected line: mpiexec: Warning: tasks 1-2 exited with status 139.
2537 to testqo.5261.05 mpiexec -pernode hello .....................
File testho.5261.05: unexpected line: /bin/bash: /scratch.d/hw-underdog/pbisbal/mpiexec-0.80/hello: cannot execute binary file
File testho.5261.05: unexpected line: /bin/bash: /scratch.d/hw-underdog/pbisbal/mpiexec-0.80/hello: Exec format error
File testho.5261.05: unexpected line: hello: hw-underdog.lexpharma.com  MPI_Init did not finish
File testho.5261.05: unexpected line: p0_5563:  p4_error: interrupt SIGSEGV: 11
File testho.5261.05: unexpected line: hello: hw-optimus.lexpharma.com  MPI_Init did not finish
File testho.5261.05: unexpected line: rm_29800:  p4_error: interrupt SIGSEGV: 11
File testho.5261.05: unexpected line: hello: hw-appsrv05.lexpharma.com  MPI_Init did not finish
File testho.5261.05: unexpected line: rm_17131:  p4_error: interrupt SIGSEGV: 11
File testho.5261.05: unexpected line: p0_5563: (18.032479) net_send: could not write to fd=4, errno = 32
File testho.5261.05: unexpected line: mpiexec: Warning: tasks 0,3 exited with status 1.
File testho.5261.05: unexpected line: mpiexec: Warning: tasks 1-2 exited with status 139.
2538 to testqo.5261.06 mpiexec -nolocal hello ..
File testho.5261.06: unexpected line: mpiexec: Error: constrain_nodes: -nolocal will not work with mpich/p4.
File testho.5261.06: got 1 lines, expected 6.
2539 to testqo.5261.07 mpiexec hello ..........................
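
Since the output above is noisy, here is a minimal sketch of a way to tally the "unexpected line" messages per test output file. It is not part of mpiexec or runtests.pl; the function name summarize and the category labels are my own, and the script only assumes the "File <name>: unexpected line: <message>" layout seen in the log above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   File testho.5261.04: unexpected line: p0_5499:  p4_error: interrupt SIGSEGV: 11
UNEXPECTED = re.compile(r"^File (\S+): unexpected line: (.*)$")


def summarize(log_text):
    """Return {test_file: {category: count}} for 'unexpected line' entries.

    The categories are hypothetical labels chosen for this sketch.
    """
    summary = defaultdict(lambda: defaultdict(int))
    for line in log_text.splitlines():
        match = UNEXPECTED.match(line.strip())
        if not match:
            continue  # progress lines and anything else are ignored
        test_file, message = match.groups()
        if "SIGSEGV" in message:
            category = "SIGSEGV"
        elif ("Exec format error" in message
              or "cannot execute binary file" in message):
            category = "exec format"
        elif "MPI_Init did not finish" in message:
            category = "MPI_Init did not finish"
        else:
            category = "other"
        summary[test_file][category] += 1
    # Convert nested defaultdicts to plain dicts for easy printing.
    return {name: dict(counts) for name, counts in summary.items()}


if __name__ == "__main__":
    import sys
    for name, counts in sorted(summarize(sys.stdin.read()).items()):
        print(name, counts)
```

Saved under a name of your choice, it can be fed the test output on standard input, for example by piping ./runtests.pl into it.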


Any idea what is wrong? I've been searching for days but haven't turned up any meaningful answers. Is this a problem with mpiexec or with mpich?


Prentice 
