Creating a new comm
Pete Wyckoff
pw at osc.edu
Thu Oct 11 16:30:53 EDT 2007
jbernstein at penguincomputing.com wrote on Thu, 11 Oct 2007 12:12 -0700:
> Pete Wyckoff wrote:
> >I'm not familiar with how mpich/bproc works. You should take a
> >look at the mpirun that comes with it, and at the MPID_Init function
> >in mpid_bproc (or whatever). If you have web pointers to these
> >things, others can double check that you're headed in the right
> >direction.
>
> This is a helpful direction. Though how do I know what startup method my
> MPICH distribution is using? I know when MPICH is built it's using
> --comm=bproc. Is this the startup method?
Read the source. Or compile with debugging and step down from
MPI_Init until you figure out where it ends up. My local mpich1
source doesn't have anything in it that looks like bproc. You have
something special, apparently.
> Otherwise, if I'm starting up just over Ethernet on Linux, am I just
> using ch_p4?
For mpich/p4, yup. Not sure if bproc relies on that or rolls its
own. There are other ways to start up on Ethernet.
> When I try starting up an MPI job with mpiexec using --comm=p4, it
> seems to start the processes, but they just sit there. Likely waiting
> for a signal to tell them to start.
>
> How can I figure out what MPICH is using for the startup method?
>
> Another hint is that --comm=bproc changes RSHCOMMAND and RCP commands to
> Scyld specifics (bpsh and bpcp). Is mpiexec using these commands at all?
>
> In the end the problem I'm having is that when using mpiexec, I'm
> starting more processes than I need. For example consider:
>
> qsub -l nodes=2:ppn=2
> mpiexec ./myjob
> ^D
>
> mpiexec actually starts up four 4-process tasks, rather than just one
> 4-process task. What's interesting is that if I do:
>
> mpiexec -npernode 1 ./cpi
> or
> mpiexec -pernode ./cpi
>
> I only get two 4-process jobs.
Sounds like, under the hood, each of these tasks that mpiexec starts
thinks it should go start up N copies of itself. Hopefully you can
find some sort of magic environment variable that tells it that it
doesn't need to spawn any more.
-- Pete
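
One way to hunt for such an environment variable is to dump the
startup-related environment of each task and compare. The sketch below
is only a starting point: the substrings it searches for (MPI, P4,
BPROC, PBS, RSH) are guesses, not names that mpich/bproc is known to
consult, and the helper names startup_related and report are made up
for illustration.

```python
import os
import socket

# Hypothetical substrings to look for. None of these is confirmed to be
# what mpich/bproc actually reads; adjust after looking at the source.
GUESSED_SUBSTRINGS = ("MPI", "P4", "BPROC", "PBS", "RSH")


def startup_related(environ, substrings=GUESSED_SUBSTRINGS):
    """Return the variables whose names contain any of the substrings."""
    return {
        name: value
        for name, value in environ.items()
        if any(s in name.upper() for s in substrings)
    }


def report(environ=None):
    """Format one line per matching variable, tagged with host and pid."""
    if environ is None:
        environ = os.environ
    tag = f"{socket.gethostname()}:{os.getpid()}"
    matches = startup_related(environ)
    return [f"{tag} {name}={matches[name]}" for name in sorted(matches)]


if __name__ == "__main__":
    for line in report():
        print(line)
```

Running this once from a task started by mpiexec and once from a
process started by the mpirun that ships with the MPICH build, then
diffing the two outputs, may show which variable the stock launcher
sets that mpiexec does not.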