ch_p4 and enable-p4-shmem
Bisbal, Prentice
PBisbal at LexPharma.com
Wed Feb 22 14:38:16 EST 2006
I know this topic has been discussed before, but I haven't been able to find the answer to my problem in the archives.
I have several multi-processor Linux systems. From what I can tell from reading the list archives and the documentation, the best configuration options for this arrangement are:
mpich: --with-device=ch_p4 --with-comm=shared
mpiexec: --with-default-comm=mpich-p4 --enable-p4-shmem
I'm using mpich 1.2.7p1, which I configured thusly:
./configure --prefix=/usr/local/mpich-1.2.7p1 --enable-sharedlib --with-device=ch_p4 --with-comm=shared
I compiled mpiexec 0.80 with these options:
./configure --with-prefix=/usr/local/mpiexec-0.80 --with-pbs=/usr/local --with-default-comm=mpich-p4 --enable-p4-shmem
When I run runtests.pl with
$available_nodes = 4;
$smpsize = 1;
the test script doesn't encounter any errors.
When I set $smpsize = 2, I get errors like this:
./runtests.pl
Testing 4 nodes with SMP size 2.
2533 to testqo.5261.01 mpiexec -n 1 hello ...
2534 to testqo.5261.02 mpiexec -n 2 hello ..
2535 to testqo.5261.03 mpiexec -n 3 hello ...
2536 to testqo.5261.04 mpiexec -n 8 hello ...........................
File testho.5261.04: unexpected line: hello: hw-underdog.lexpharma.com MPI_Init did not finish
File testho.5261.04: unexpected line: p0_5499: p4_error: interrupt SIGSEGV: 11
File testho.5261.04: unexpected line: hello: hw-optimus.lexpharma.com MPI_Init did not finish
File testho.5261.04: unexpected line: rm_29795: p4_error: interrupt SIGSEGV: 11
File testho.5261.04: unexpected line: hello: hw-appsrv05.lexpharma.com MPI_Init did not finish
File testho.5261.04: unexpected line: rm_17122: p4_error: interrupt SIGSEGV: 11
File testho.5261.04: unexpected line: /bin/bash: /scratch.d/hw-underdog/pbisbal/mpiexec-0.80/hello: cannot execute binary file
File testho.5261.04: unexpected line: /bin/bash: /scratch.d/hw-underdog/pbisbal/mpiexec-0.80/hello: Exec format error
File testho.5261.04: unexpected line: p0_5499: (24.068400) net_send: could not write to fd=4, errno = 32
File testho.5261.04: unexpected line: mpiexec: Warning: tasks 0,3 exited with status 1.
File testho.5261.04: unexpected line: mpiexec: Warning: tasks 1-2 exited with status 139.
2537 to testqo.5261.05 mpiexec -pernode hello .....................
File testho.5261.05: unexpected line: /bin/bash: /scratch.d/hw-underdog/pbisbal/mpiexec-0.80/hello: cannot execute binary file
File testho.5261.05: unexpected line: /bin/bash: /scratch.d/hw-underdog/pbisbal/mpiexec-0.80/hello: Exec format error
File testho.5261.05: unexpected line: hello: hw-underdog.lexpharma.com MPI_Init did not finish
File testho.5261.05: unexpected line: p0_5563: p4_error: interrupt SIGSEGV: 11
File testho.5261.05: unexpected line: hello: hw-optimus.lexpharma.com MPI_Init did not finish
File testho.5261.05: unexpected line: rm_29800: p4_error: interrupt SIGSEGV: 11
File testho.5261.05: unexpected line: hello: hw-appsrv05.lexpharma.com MPI_Init did not finish
File testho.5261.05: unexpected line: rm_17131: p4_error: interrupt SIGSEGV: 11
File testho.5261.05: unexpected line: p0_5563: (18.032479) net_send: could not write to fd=4, errno = 32
File testho.5261.05: unexpected line: mpiexec: Warning: tasks 0,3 exited with status 1.
File testho.5261.05: unexpected line: mpiexec: Warning: tasks 1-2 exited with status 139.
2538 to testqo.5261.06 mpiexec -nolocal hello ..
File testho.5261.06: unexpected line: mpiexec: Error: constrain_nodes: -nolocal will not work with mpich/p4.
File testho.5261.06: got 1 lines, expected 6.
2539 to testqo.5261.07 mpiexec hello ..........................
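For reading the output above: the "status 139" reported for tasks 1-2 looks consistent with the SIGSEGV lines, assuming mpiexec reports exit statuses in the usual shell convention of 128 plus the signal number for a process killed by a signal. A minimal sketch of that arithmetic follows; decode_status is a name made up here for illustration and is not part of mpiexec or mpich.

```shell
# decode_status: hypothetical helper, not part of mpiexec or mpich.
# Assumes the shell convention that a status above 128 means
# "killed by signal (status - 128)".
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "signal $((status - 128))"
    else
        echo "exit $status"
    fi
}

decode_status 139   # tasks 1-2 in the log
decode_status 1     # tasks 0,3 in the log
```

Under that assumption, 139 decodes to signal 11 (SIGSEGV), while status 1 is an ordinary non-zero exit.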
Any ideas what is wrong? I've been googling for days, but haven't turned up any meaningful answers. Is this a problem with mpiexec or mpich?
Prentice