ch_p4 / shmem on SMP linux cluster
Frank Eisenmenger
eisenmenger at fmp-berlin.de
Fri Feb 11 10:11:24 EST 2005
Hi Pete,
Thanks for your reply.
> The question to ponder is, should you enable the p4 shared memory
> implementation to try to speed up communications inside a single
> multiprocessor node? We use it here; I haven't discovered a consensus
> on the matter.
To work out how to 'enable the p4 shared memory', I had better describe
in detail what I've done.
I'd like MPICH and my application (pmemd from the Amber8 package) to be
clean 32-bit builds (just like AMD's Opterons, our Intel Noconas run both
64- and 32-bit applications) and to use Intel's 32-bit Fortran compiler as
the backend for 'mpif77' and 'mpif90' (this works best for compiling 'pmemd'):
Mpich-1.26
-----------
export CC="gcc -m32"
export CFLAGS="-m32"
export CCFLAGS="-m32"
export CLINKER="gcc -m32"
export CCLINKER="g++ -m32"
export FC=ifort
export F90=ifort
source /usr/local/intel/bin/ifortvars.sh
./configure --with-comm=ch_p4 --with-device=ch_p4 \
    -prefix=/usr/local/mpich-32 --without-mpe -opt="-O3" -optf77="-static -tpp7"
(I excluded 'mpe' because it would automatically pull in the 64-bit X11
libraries; '-tpp7' optimizes for Xeons.)
Question: should I add '--with-device=ch_shmem' here?
But I cannot have both '--with-comm=ch_p4' and '--with-comm=shared';
I suppose only whichever of these options comes first in the configure
command would be used. I'm not very familiar with 'communication' and
'devices' and got confused here.
make && make install (as root, via 'su')
+ put a new 'machines.LINUX' listing all nodes into
/usr/local/mpich-32/share/ (needed by 'mpirun' only):
node1:2
..
node24:2
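Typing the 24 entries by hand is error-prone; a small loop produces the same 'nodeN:2' lines. This is only a sketch: it assumes the nodes really are named node1 to node24 with 2 CPUs each, and it writes the file to the current directory, from where it would be copied to /usr/local/mpich-32/share/.

```shell
#!/bin/sh
# Emit one "host:cpus" line per node (node1 .. node24, 2 CPUs each),
# i.e. the machines.LINUX format shown above.
for i in $(seq 1 24); do
    echo "node$i:2"
done > machines.LINUX
```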
Mpiexec-0.77
------------
./configure --disable-p4-shmem --with-default-comm=mpich-p4 \
    --with-pbs=/usr/local/encap/torque-1.1.0p0 --with-smp-size=2
NB: it seems to work just as well without '--with-smp-size=2'!
make
tests:
/usr/local/mpich-32/bin/mpicc -m32 -o hello hello.c
./runtests.pl
--> works fine
source /usr/local/intel/bin/ifortvars.sh
/usr/local/mpich-32/bin/mpif77 -static-libcxa -o hello hellof.f
(NB: '-static-libcxa' helps because our nodes do not know the location
of some of Intel's shared libraries)
-> errors because of too many command-line parameters
make install (as 'su')
With respect to the above, how should I 'enable the p4 shared memory'?
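For reference, enabling it would presumably mean rebuilding both packages roughly as below. This is a sketch under assumptions, not a tested recipe: it assumes that MPICH-1's ch_p4 device gets intra-node shared memory from '--with-comm=shared' (so ch_shmem, a device meant for a single SMP machine, would not be the right choice for a cluster), and that mpiexec's '--disable-p4-shmem' then has to be dropped. The environment variables (CC, FC, ...) are the ones set for the MPICH build above.

```shell
# --- in the MPICH-1 source tree (assumption: comm=shared enables shmem) ---
./configure --with-device=ch_p4 --with-comm=shared \
    -prefix=/usr/local/mpich-32 --without-mpe -opt="-O3" -optf77="-static -tpp7"
make && make install

# --- in the mpiexec source tree ---
# Assumption: --disable-p4-shmem is meant for a p4 build WITHOUT
# shared memory, so it is left out here.
./configure --with-default-comm=mpich-p4 \
    --with-pbs=/usr/local/encap/torque-1.1.0p0 --with-smp-size=2
make && make install
```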
> Speaking from a user support angle, I would say no. There is enough
> debate on whether the mpich1/p4/shmem implementation is any faster than
> just using mpich/p4/TCP on a node.
There is another aspect, and an interesting observation, from running the
'JAC' benchmark with pmemd (its source modified according to
http://structbio.vanderbilt.edu/archives/amber-archive/2004/1080.phtmlon):
Jobs were submitted via:
  qsub .. -l nodes=<nodes>:ppn=2 <command>
where <nodes> = number of processors <np> / 2, and <command> is either:
  <path>/mpirun -np <np> -nolocal \
      <my-path>/pmemd <pmemd-input>
or:
  <path>/mpiexec -kill -nostdin -nostdout \
      <my-path>/pmemd <pmemd-input>
mpirun (as I could see from the PI... files created while the job is
running) does not seem to 'care' about the $PBS_NODEFILE provided by
PBS. In our environment PBS, whether the job asks for '-l nodes=..:ppn=2'
or '..:ppn=1' (!), "offers" a list of nodes with 2 processors per node,
but mpirun "chooses", where possible, ONE processor per NODE to reach the
requested total number of processors;
e.g. if submitted with '-l nodes=8:ppn=2', it uses 16 nodes with 1
processor per node.
mpiexec "follows the suggestion" of PBS (TORQUE) and uses 2 processors
per node, strictly according to the contents of $PBS_NODEFILE;
e.g. if submitted with '-l nodes=8:ppn=2', it uses 8 nodes with 2
processors each.
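One way to see what PBS actually hands out is to count the entries per host in $PBS_NODEFILE, which contains one line per process slot. A minimal sketch; when run outside a PBS job it falls back to a made-up two-node sample (the file name sample_nodefile is hypothetical).

```shell
#!/bin/sh
# Count process slots per host in the PBS node file (one line per slot).
# Outside a PBS job, fall back to a small hypothetical sample file.
if [ -n "$PBS_NODEFILE" ]; then
    nodefile=$PBS_NODEFILE
else
    nodefile=./sample_nodefile
    printf 'node1\nnode1\nnode2\nnode2\n' > "$nodefile"
fi

np=$(wc -l < "$nodefile" | tr -d ' ')   # total number of slots
echo "np=$np"
# One line per host, e.g. "node1: 2"
sort "$nodefile" | uniq -c | awk '{ print $2 ": " $1 }'
```

Inside a job script, MPICH-1's mpirun can presumably be pointed at the same list with '-machinefile', e.g. 'mpirun -np $np -machinefile $PBS_NODEFILE <my-path>/pmemd <pmemd-input>', so that it places processes the way mpiexec does instead of using machines.LINUX; this is an untested assumption about our setup.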
The 'JAC' benchmark simulates 1000 ps of molecular dynamics for a model
system (http://amber.scripps.edu/amber8.bench1.html).
The following table gives the 'clean' computing time in seconds (i.e.
excluding setup time) needed for this benchmark:

  No. processors   mpirun (s)   mpiexec (s)
        2              282          298
        4              147          156
        8               77           82
       16               41           46
       32               28           27
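To put numbers on the difference, the parallel efficiency relative to the 2-processor run, E(n) = 2*T(2) / (n*T(n)), and the ratio T(mpiexec)/T(mpirun) can be computed directly from the timings. A small awk sketch with the values copied from the table; note the efficiency is relative to the 2-processor run, not to a serial run.

```shell
#!/bin/sh
# Input columns:  processors, mpirun time (s), mpiexec time (s)
# Output columns: processors, E(mpirun), E(mpiexec), ratio mpiexec/mpirun
# E(n) = 2*T(2) / (n*T(n)); the first input row is the 2-processor baseline.
awk 'NR == 1 { b1 = $1 * $2; b2 = $1 * $3 }
     { printf "%d %.2f %.2f %.3f\n", $1, b1 / ($1 * $2), b2 / ($1 * $3), $3 / $2 }' \
    > efficiency.txt <<'EOF'
2 282 298
4 147 156
8 77 82
16 41 46
32 28 27
EOF
cat efficiency.txt
```

Read this way, the mpiexec runs (2 processes per node) take about 6% longer up to 8 processors and about 12% longer at 16, but are about 4% faster at 32.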
From this one might conclude (?) that computing time goes up when the 2
processors on each node "compete" for shared memory, as is the case with
mpiexec (the 32-processor run being the exception).
I am not very familiar with how PBS/Torque was set up by our cluster's
vendor, or how it all works together with mpich/mpiexec. Any suggestions
would be appreciated!
Frank Eisenmenger.
--
Dr. Frank Eisenmenger
Forschungsinstitut für Molekulare Pharmakologie
Abt. NMR-unterstützte Strukturforschung
Tel. +49/0-30-94793-278
FAX +49/0-30-94793-169
Web www.fmp-berlin.de/NMR
E-Mail eisenmenger at fmp-berlin.de
More information about the mpiexec mailing list