Query about concurrency
Eoin McHugh
eoin.mchugh at ichec.ie
Tue Mar 21 11:24:11 EST 2006
On Tue, Mar 21, 2006 at 10:59:52AM -0500, Pete Wyckoff wrote:
> David is right on with his explanation, but the limitation you're
> seeing is only because you're using mpich-p4. Communication between
> task#0 and mpiexec as implemented in mpich-p4 restricts both
> processes to be on the same node. So your first case works fine
> because task#0 of each of the two parallel jobs ends up on node#0
> because "-pernode" only uses one of the two CPUs on node#0, and the
> second CPU of node#0 is assigned to task#0 of the second job. In
> your second case, task#0 of the second test job ends up on a
> different node because both CPUs of node#0 are taken by the first
> job. You can run with "-v" to see the allocations chosen.
Pete,
Thanks for the reply. That is what I thought was happening; I just
didn't realise that this was a hard limitation of mpich-p4. I suppose
this confusion could be avoided if the mpiexec man page mentioned
this limitation in the concurrency section.
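
For anyone else who runs into this, the placement behaviour described in the
quoted explanation can be sketched as a toy model. This is not mpiexec's
actual allocation code. The function names (allocate, p4_can_start), the
first-fit policy, the two-nodes-with-two-CPUs layout and the assumption that
mpiexec itself runs on node#0 are all illustrative choices made here.

```python
def allocate(free, ntasks, pernode=False):
    """Toy first-fit placement (an assumption, not mpiexec's real logic).

    free    -- list of free CPU counts per node, updated in place
    ntasks  -- number of tasks the job wants to start
    pernode -- if True, the job uses at most one CPU on each node
    Returns a list of node indices, one entry per task (task#0 first).
    """
    placement = []
    for node in range(len(free)):
        limit = 1 if pernode else free[node]
        take = min(free[node], limit, ntasks - len(placement))
        placement.extend([node] * take)
        free[node] -= take
    if len(placement) < ntasks:
        raise RuntimeError("not enough free CPUs for this job")
    return placement


def p4_can_start(placement, mpiexec_node=0):
    """mpich-p4 restriction: task#0 must share a node with mpiexec."""
    return placement[0] == mpiexec_node


# Case 1: two concurrent jobs, each placing one task per node.
free = [2, 2]
job1 = allocate(free, 2, pernode=True)
job2 = allocate(free, 2, pernode=True)
# Task#0 of both jobs lands on node#0, so both satisfy the restriction.
print("pernode:", job1, job2, p4_can_start(job1), p4_can_start(job2))

# Case 2: the first job fills both CPUs of node#0.
free = [2, 2]
job1 = allocate(free, 2)
job2 = allocate(free, 2)
# Task#0 of the second job is pushed to node#1 and violates the restriction.
print("packed: ", job1, job2, p4_can_start(job1), p4_can_start(job2))
```

The model only captures the one constraint discussed in this thread; running
mpiexec with "-v" shows the allocations it actually chose.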
> Other MPI implementations do not have this restriction. It
> shouldn't be too difficult to fix mpich to support your usage. If
> you want to do this, I can point you to the place in the mpich code
> to help you get started.
I don't think I'll be making those changes at this point, as the
potential risks outweigh the gain for a few users.
Regards,
--
Eoin McHugh
ICHEC Systems