about sh-mem mechanism
Jose Luis Gordillo Ruiz
jlgr at super.unam.mx
Wed Oct 30 09:00:27 EST 2002
> > by using shmem, but this implies redesigning the load balancing of the
> > parallel algorithm. Because clusters of SMPs are not (yet) the standard,
> > most parallel programs do not take shmem into account. There is also
> > another approach to this, using a hybrid paradigm (OpenMP inside the SMP
> > and MPI-sockets outside) but, again, there is no proof that this is
> > better than pure MPI or MPI-shmem with MPI-sockets.
>
> First off, you're assuming MPICH/ch_p4, but that's not the only option
> out there. MPICH/ch_gm for Myrinet-connected clusters also supports
> a hybrid shmem/network configuration, though not using TCP/IP or sockets
> for off-node communication.
>
> Second, I think you'll find that most large clusters are clusters of
> SMPs. The only really big (>>100 nodes) cluster of uniproc nodes I
> can think of is CPlant.
>
I agree with you. By the way, is the bandwidth of Myrinet comparable
to that of shmem? If it is, then using shmem is mandatory.
> I've done some tests using MPI vs. hybrid MPI/OpenMP, and everything
> I've seen says that the hybrid MPI/OpenMP approach only pays off at
> very large processor counts doing something comm-intensive like a
> parallel FFT (eg. NAS Parallel Benchmark FT). For nearest-neighbor
> communication patterns, doing shmem transfers can improve performance
> because you only pay the big latency hit for going off-node for about
> half your total communications. Shmem helps a *lot* to lower your
> average communications latency, in my experience. (However, I'm also
> dealing with MPICH/ch_gm, where going off-node is 2-3x increase in
> latency; MPICH/ch_p4 is more like a 10x latency increase.)
>
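As a rough illustration of the averaging argument quoted above, here is
a minimal sketch in Python. The model is an assumption, not a
measurement: on-node (shmem) latency is normalized to 1.0, the off-node
latency is that value times a ratio (2.5 standing in for the "2-3x"
quoted for ch_gm, 10 for ch_p4), and half of the messages go off-node.
The function names are hypothetical.

```python
def average_latency(on_node_latency, off_node_ratio, off_node_fraction):
    """Mean per-message latency when a fraction of messages leave the node.

    on_node_latency   -- latency of a shmem transfer (any unit)
    off_node_ratio    -- off-node latency divided by on-node latency
    off_node_fraction -- share of messages that go off-node (0.0 to 1.0)
    """
    off_node_latency = on_node_latency * off_node_ratio
    return ((1.0 - off_node_fraction) * on_node_latency
            + off_node_fraction * off_node_latency)


def saving_vs_network_only(off_node_ratio, off_node_fraction):
    """Relative latency saving compared with sending everything off-node."""
    with_shmem = average_latency(1.0, off_node_ratio, off_node_fraction)
    network_only = 1.0 * off_node_ratio
    return 1.0 - with_shmem / network_only


if __name__ == "__main__":
    # Ratios taken from the figures quoted above; 0.5 is "about half".
    for label, ratio in (("ch_gm (assumed 2.5x)", 2.5), ("ch_p4 (assumed 10x)", 10.0)):
        print(label, saving_vs_network_only(ratio, 0.5))
```

Under these assumptions the relative saving grows with the off-node
ratio, which is consistent with shmem mattering more for ch_p4 than for
ch_gm. The model ignores bandwidth and contention entirely.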
I have to review some results I saw about latencies. In those,
latencies for p4 and shmem were almost the same, and the only gain was
the higher bandwidth of shmem. Anyway, my point is that in order to
get a performance increase, you must have a communication pattern that
is aware of your hybrid communication mechanism. This implies that you
design the parallel algorithm with a specific topology in mind.
In the case of nearest-neighbor, you will get a performance improvement
only if the number of neighbors equals the number of procs in the
SMP node (and nearest-neighbor is just one particular comm pattern).
Well, these are just words. I'll try to get some results and then
we can discuss them.
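
To make the dependence on topology concrete, here is a small sketch in
Python that counts how many nearest-neighbor messages leave the node.
Everything in it is an assumption for illustration: a 1-D ring where
each rank talks to its left and right neighbor, and a block mapping of
ranks to nodes (rank // procs_per_node). The function name is
hypothetical, and this says nothing about real timings.

```python
def off_node_fraction(num_ranks, procs_per_node):
    """Fraction of ring-neighbor messages that cross a node boundary.

    Assumes a 1-D periodic ring and a block mapping in which ranks
    0..procs_per_node-1 share node 0, the next block shares node 1, etc.
    """
    off_node = 0
    total = 0
    for rank in range(num_ranks):
        left = (rank - 1) % num_ranks
        right = (rank + 1) % num_ranks
        for neighbor in (left, right):
            total += 1
            # Two ranks share memory only if they map to the same node.
            if rank // procs_per_node != neighbor // procs_per_node:
                off_node += 1
    return off_node / total


if __name__ == "__main__":
    for ppn in (1, 2, 4):
        print(ppn, off_node_fraction(8, ppn))
```

With 8 ranks, this mapping sends every message off-node on uniprocessor
nodes, half of them on dual-processor nodes, and a quarter on
four-processor nodes. A different rank placement or a 2-D/3-D
decomposition would change these fractions, which is the point about
the algorithm needing to know the topology.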
Regards,
jlgr
> --Troy
> --
> Troy Baer email: troy at osc.edu
> Science & Technology Support phone: 614-292-9701
> Ohio Supercomputer Center web: http://oscinfo.osc.edu
>
>