Creating a new comm
Joshua Bernstein
jbernstein at penguincomputing.com
Fri Oct 19 14:50:27 EDT 2007
Hi Peter,
After I get back from SuperComputing I'm going to give this a real
shot. Seems I don't actually have to do too much to get it to work
cleanly with BProc.
If you will be at the show (or anybody else on the list, for that matter),
please come by the Penguin Computing booth to say hello.
-Joshua Bernstein
Software Engineer
Penguin Computing
Pete Wyckoff wrote:
> jbernstein at penguincomputing.com wrote on Thu, 11 Oct 2007 12:12 -0700:
>> Pete Wyckoff wrote:
>>> I'm not familiar with how mpich/bproc works. You should take a
>>> look at the mpirun that comes with it, and at the MPID_Init function
>>> in mpid_bproc (or whatever). If you have web pointers to these
>>> things, others can double check that you're headed in the right
>>> direction.
>> This is a helpful direction. Though how do I know what startup method my
>> MPICH distribution is using? I know that when MPICH is built, it's using
>> --comm=bproc. Is this the startup method?
>
> Read the source. Or compile with debugging and step down from
> MPI_Init until you figure out where it ends up. My local mpich1
> source doesn't have anything in it that looks like bproc. You have
> something special, apparently.
>
>> Otherwise, if I'm starting up just over Ethernet on Linux, am I just
>> using ch_p4?
>
> For mpich/p4, yup. Not sure if bproc relies on that or rolls its
> own. There are other ways to start up on Ethernet.
>
>> When I try starting up an MPI job with mpiexec using --comm=p4, it
>> seems to start the processes, but they just sit there, likely waiting
>> for a signal to tell them to start.
>>
>> How can I figure out what MPICH is using for the startup method?
>>
>> Another hint is that --comm=bproc changes the RSHCOMMAND and RCP commands to
>> Scyld specifics (bpsh and bpcp). Is mpiexec using these commands at all?
>>
>> In the end, the problem I'm having is that when using mpiexec, I'm
>> starting more processes than I need. For example, consider:
>>
>> qsub -l nodes=2:ppn=2
>> mpiexec ./myjob
>> ^D
>>
>> mpiexec actually starts up four 4-process tasks, rather than just one
>> 4-process task. What's interesting is that if I do:
>>
>> mpiexec -npernode 1 ./cpi
>> or
>> mpiexec -pernode ./cpi
>>
>> I only get two 4-process jobs.
>
> Sounds like, under the hood, each of these tasks that mpiexec starts
> thinks it should go start up N copies of itself. Hopefully you can
> find some sort of magic environment variable that tells it that it
> doesn't need to spawn any more.
>
> -- Pete
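
To make the counts in the quoted example concrete, here is a small shell sketch of the arithmetic. It assumes, as Pete suggests, that every task mpiexec starts goes on to spawn a full 4-process job of its own. The function name total_procs is hypothetical and is not part of mpiexec, MPICH, or BProc.

```shell
#!/bin/sh
# Hypothetical helper: total processes when each task started by
# mpiexec spawns its own full job (the assumption under test).
total_procs() {
    tasks=$1            # tasks started directly by mpiexec
    procs_per_task=$2   # processes each task then spawns
    echo $((tasks * procs_per_task))
}

nodes=2
ppn=2
wanted=$((nodes * ppn))   # qsub -l nodes=2:ppn=2 asks for 4 processes

echo "wanted:           $wanted"
echo "mpiexec:          $(total_procs "$wanted" "$wanted")"
echo "mpiexec -pernode: $(total_procs "$nodes" "$wanted")"
```

Under that assumption, plain mpiexec yields 16 processes and -pernode yields 8, which matches the four and two 4-process jobs reported above, against the 4 processes actually requested.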