Query about concurrency
Pete Wyckoff
pw at osc.edu
Tue Mar 21 10:59:52 EST 2006
eoin.mchugh at ichec.ie wrote on Tue, 21 Mar 2006 15:24 +0000:
> On Tue, Mar 21, 2006 at 02:48:44PM +0000, David Golden wrote:
> > Only the mpiexec processes themselves need to be on the same node
> > due to this, the mpi processes they spawn/manage don't need to be
> > on the same node at all AFAIK. I find it all less confusing
> > if I use the first mpiexec only in dedicated server mode, the other
> > mpiexecs are clients of that server, the server then managing the mpi
> > processes on the nodes (via the PBS TM API) on behalf of the
> > other mpiexecs.
>
> Initially I had thought that this would be the behaviour but any test I
> attempt does not yield a positive result. The first mpiexec will run
> fine on however many processors but subsequent mpiexec's will yield the
> following error:
>
> mpiexec: Error: tasks_shmem_reduce: When using mpich/p4, the first task
> must be on the same machine as mpiexec itself. You ended up trying to
> run task 0 on nodeX, not nodeY.
>
> I can run the following fine:
>
> mpiexec -pernode -n 4 test-job &
> sleep 5
> mpiexec -pernode -n 4 test-job &
> wait
>
> But the following results in the error listed above:
>
> mpiexec -n 4 test-job &
> sleep 5
> mpiexec -n 4 test-job &
> wait
>
> This appears to be because the first task in both jobs is on node 0 in
> the former but not in the latter. I thought initially that there was an
> issue with my build of mpiexec but I haven't noticed one. I am getting
> similar errors when I attempt to run the contests perl script
> distributed with mpiexec so perhaps I am missing something.
>
> I was trying to avoid having to use mpiexec in dedicated server mode.

David is right on with his explanation, but the limitation you're
seeing exists only because you're using mpich-p4. The way mpich-p4
implements communication between task#0 and mpiexec requires both
processes to be on the same node.

Your first case works because "-pernode" uses only one of the two
CPUs on each node, so the first job leaves the second CPU of node#0
free, and that CPU is assigned to task#0 of the second job. Task#0
of each job thus ends up on node#0. In your second case, the first
job takes both CPUs of node#0, so task#0 of the second job ends up
on a different node. You can run with "-v" to see the allocations
chosen.

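
To make the two cases concrete, here is a rough sketch of the
placement logic in Python. It is only a toy model, not mpiexec's
actual code: it assumes 4 nodes with 2 CPUs each, that free CPUs are
handed out in node order, and that both mpiexec processes run on
node 0. The names allocate and run_two_jobs are made up for the
illustration.

```python
def allocate(free, ntasks, pernode=False):
    """Place ntasks tasks on free CPUs, walking the nodes in order.

    free    -- dict mapping node index to free CPU count (updated in place)
    pernode -- if True, use at most one CPU per node
    Returns a list of node indices; list position is the task rank.
    """
    placement = []
    for node in sorted(free):
        limit = 1 if pernode else free[node]
        take = min(free[node], limit, ntasks - len(placement))
        placement.extend([node] * take)
        free[node] -= take
    if len(placement) < ntasks:
        raise RuntimeError("not enough free CPUs")
    return placement


def run_two_jobs(pernode, nodes=4, cpus_per_node=2, ntasks=4):
    """Start two jobs back to back on the same (assumed) allocation."""
    free = {n: cpus_per_node for n in range(nodes)}
    first = allocate(free, ntasks, pernode)
    second = allocate(free, ntasks, pernode)
    return first, second


if __name__ == "__main__":
    for pernode in (True, False):
        first, second = run_two_jobs(pernode)
        label = "-pernode -n 4" if pernode else "-n 4"
        # With mpich/p4, task 0 has to share a node with mpiexec (node 0 here).
        ok = second[0] == 0
        print(f"{label}: job1={first} job2={second} "
              f"second job's task 0 on node 0: {ok}")
```

Under these assumptions the "-pernode" case puts task 0 of both jobs
on node 0, while the plain "-n 4" case fills nodes 0 and 1 with the
first job and pushes task 0 of the second job to node 2, which is
the situation the error message complains about.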
Other MPI implementations do not have this restriction. It
shouldn't be too difficult to fix mpich to support your usage. If
you want to do this, I can point you to the place in the mpich code
to help you get started.
-- Pete