about sh-mem mechanism

Jose Luis Gordillo Ruiz jlgr at super.unam.mx
Wed Oct 30 09:00:27 EST 2002


> >   by using shmem, but this implies redesigning the load balancing of the
> >   parallel algorithm. Because clusters of SMPs are not (yet) the standard,
> >   most parallel programs do not take shmem into account. There is also
> >   another approach: a hybrid paradigm (OpenMP inside the SMP
> >   and MPI-sockets outside), but, again, there is no proof that this is
> >   better than pure MPI or MPI-shmem with MPI-sockets.
>
> First off, you're assuming MPICH/ch_p4, but that's not the only option
> out there.  MPICH/ch_gm for Myrinet-connected clusters also supports
> a hybrid shmem/network configuration, though not using TCP/IP or sockets
> for off-node communication.
>
> Second, I think you'll find that most large clusters are clusters of
> SMPs.  The only really big (>>100 nodes) cluster of uniproc nodes I
> can think of is CPlant.
>

  I agree with you. By the way, is the bandwidth of Myrinet comparable
  to that of shmem? If it is, then using shmem is mandatory.

> I've done some tests using MPI vs. hybrid MPI/OpenMP, and everything
> I've seen says that the hybrid MPI/OpenMP approach only pays off at
> very large processor counts doing something comm-intensive like a
> parallel FFT (eg. NAS Parallel Benchmark FT).  For nearest-neighbor
> communication patterns, doing shmem transfers can improve performance
> because you only pay the big latency hit for going off-node for about
> half your total communications.  Shmem helps a *lot* to lower your
> average communications latency, in my experience.  (However, I'm also
> dealing with MPICH/ch_gm, where going off-node is 2-3x increase in
> latency; MPICH/ch_p4 is more like a 10x latency increase.)
>
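   The latency argument quoted above can be put into a small model. This
   is only a sketch: the on-node latency of 1.0 is an arbitrary unit, the
   off-node ratios (2.5x for ch_gm, 10x for ch_p4) are taken from the
   rough figures in the quote, and the function name is made up for
   illustration. It measures nothing.

```python
def average_latency(on_node_latency, off_node_ratio, off_node_fraction):
    """Mean per-message latency when a fraction of messages go off-node.

    on_node_latency   -- latency of a shmem (on-node) message, arbitrary units
    off_node_ratio    -- how many times slower an off-node message is (assumed)
    off_node_fraction -- share of messages that leave the node, from 0.0 to 1.0
    """
    off_node_latency = on_node_latency * off_node_ratio
    return ((1.0 - off_node_fraction) * on_node_latency
            + off_node_fraction * off_node_latency)


if __name__ == "__main__":
    # Assumed ratios: about 2.5x for ch_gm and about 10x for ch_p4.
    for name, ratio in (("ch_gm", 2.5), ("ch_p4", 10.0)):
        with_shmem = average_latency(1.0, ratio, 0.5)  # half the messages off-node
        all_network = average_latency(1.0, ratio, 1.0)  # every message pays the off-node cost
        print(name, with_shmem, all_network)
```

   Under these assumptions, the larger the off-node penalty, the more the
   average is dominated by the off-node half of the traffic.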
   I have to review some results I saw about latencies. In those,
   latencies for p4 and shmem were almost the same, and the gain was just
   the higher bandwidth of shmem. Anyway, my point is that in order to
   get performance gains, you must have a communication pattern that
   is aware of your hybrid communication mechanism. This implies that you
   design the parallel algorithm with a specific topology in mind.
   In the case of nearest-neighbor, you will get a performance improvement
   only if the number of neighbors equals the number of procs in the
   SMP node (and nearest-neighbor is just one particular comm pattern).
   Well, these are just words. I'll try to get some results and then
   we can argue about them.
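
   Before measuring anything, one way to explore how a communication
   pattern interacts with node topology is to count how many neighbor
   messages stay on-node. The sketch below assumes a 1D periodic ring
   where each rank talks to rank-1 and rank+1, and a block placement
   where rank r runs on node r // procs_per_node. Both are assumptions
   made for illustration, and the function name is hypothetical.

```python
def off_node_fraction(num_ranks, procs_per_node):
    """Fraction of nearest-neighbor messages that cross a node boundary.

    Assumes a 1D periodic ring (each rank sends to rank-1 and rank+1)
    and block placement: rank r runs on node r // procs_per_node.
    """
    off_node = 0
    total = 0
    for rank in range(num_ranks):
        left = (rank - 1) % num_ranks
        right = (rank + 1) % num_ranks
        for neighbor in (left, right):
            total += 1
            # A message is off-node when sender and receiver map to different nodes.
            if rank // procs_per_node != neighbor // procs_per_node:
                off_node += 1
    return off_node / total


if __name__ == "__main__":
    for procs_per_node in (1, 2, 4, 8):
        print(procs_per_node, off_node_fraction(8, procs_per_node))
```

   Other patterns (2D or 3D stencils, all-to-all) and other rank
   placements would give different fractions, so the mapping of ranks
   to nodes matters as much as the pattern itself.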

   saludos,

  jlgr


> 	--Troy
> --
> Troy Baer                       email:  troy at osc.edu
> Science & Technology Support    phone:  614-292-9701
> Ohio Supercomputer Center       web:  http://oscinfo.osc.edu
>
>



