Query about concurrency

Eoin McHugh eoin.mchugh at ichec.ie
Tue Mar 21 10:24:08 EST 2006


On Tue, Mar 21, 2006 at 02:48:44PM +0000, David Golden wrote:
> Only the mpiexec processes themselves need to be on the same node
> due to this, the mpi processes they spawn/manage don't need to be
> on the same node at all AFAIK.   I find it all less confusing
> if I use the first mpiexec only in dedicated server mode, the other
> mpiexecs are clients of that server, the server then managing the mpi
> processes on the nodes (via the PBS TM API) on behalf of the 
> other mpiexecs.

Initially I had thought that this would be the behaviour, but none of my
tests bear it out. The first mpiexec runs fine on however many
processors, but subsequent mpiexecs fail with the following error:

  mpiexec: Error: tasks_shmem_reduce: When using mpich/p4, the first task
  must be on the same machine as mpiexec itself.  You ended up trying to
  run task 0 on nodeX, not nodeY.

I can run the following fine:

  mpiexec -pernode -n 4 test-job &
  sleep 5
  mpiexec -pernode -n 4 test-job &
  wait

But the following results in the error listed above:

  mpiexec -n 4 test-job &
  sleep 5
  mpiexec -n 4 test-job &
  wait

This appears to be because the first task of both jobs lands on node 0 in
the former case but not in the latter. I thought initially that there was
an issue with my build of mpiexec, but I haven't found one. I get similar
errors when I run the contests Perl script distributed with mpiexec, so
perhaps I am missing something.
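
To illustrate the placement I suspect is happening, here is a minimal
sketch. It assumes that tasks are placed on processor slots in node-file
order, that the first job's slots are still occupied when the second job
starts, and that -pernode places one task per distinct node starting from
the first listed node. The helper names (task0_node, task0_node_pernode)
and the node names are made up for illustration; this models the symptom
only and is not how mpiexec allocates tasks internally.

```shell
#!/bin/sh
# Hypothetical helper: print the node that would host task 0 of a job,
# ASSUMING tasks fill slots in node-file order and that "$2" slots are
# already occupied by earlier mpiexec runs.
task0_node() {
    nodefile=$1
    used=${2:-0}
    # Task 0 goes to the first free slot, i.e. line (used + 1).
    sed -n "$((used + 1))p" "$nodefile"
}

# Hypothetical helper: same question for -pernode placement, ASSUMING one
# task per distinct node, so every job starts on the first listed node.
task0_node_pernode() {
    head -n 1 "$1"
}

# Example slot list: four dual-processor nodes (made-up names),
# one line per processor slot.
slots=$(mktemp)
printf '%s\n' node0 node0 node1 node1 node2 node2 node3 node3 > "$slots"

echo "first job,  plain:    $(task0_node "$slots" 0)"
echo "second job, plain:    $(task0_node "$slots" 4)"
echo "second job, -pernode: $(task0_node_pernode "$slots")"
```

Under those assumptions the second plain job would start on node2 rather
than node0, which would match the "task 0 on nodeX, not nodeY" complaint,
while both -pernode jobs would start on node0.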

I was trying to avoid having to use mpiexec in dedicated server mode.

Regards,

-- 
Eoin McHugh