Mpiexec runs 1 process only

Pete Wyckoff pw at osc.edu
Mon Aug 30 09:40:30 EDT 2004


dsternkopf at hpce.nec.com wrote on Mon, 30 Aug 2004 13:15 +0200:
> > You're pretty sure you compiled mpich using --with-device=ch_shmem and
> > not, say, ch_p4 or something else?  Other devices expect that mpiexec
> > will spawn all the tasks, one at a time.
> 
> MPICH compiled with ch_p4. 

That's a pretty important fact.  You must make sure that mpiexec and
mpich agree on the underlying device.

> If I specify the ch_p4 device for mpiexec, I get the following error:
> 
> asama.ess.nec.de:~/mpiexec-0.76 2915> /home/danny/mpiexec-0.76/mpiexec -v -v -v -nostdout -nostdin -comm mpich-p4 /home/danny/mpiexec-0.76/hello
[..]
> p0_12062: (-0.000131) send_message: to=1; invalid conn type=5
> p0_12062:  p4_error: subtree_broadcast_p4 failed, type=: 1010101010
> wait_one_task_start: evt = 2, task 0 host asama.ess.nec.de
> All 1 task started.
> wait_tasks: waiting for asama.ess.nec.de/0
> wait_tasks: numspawned = 1, got evt 3 for tid 4 host asama.ess.nec.de status 1
> mpiexec: Warning: task 0 exited with status 1.

Try "mpiexec --comm=p4 -mpich-p4-no-shmem ..." and read the section in
the README that explains why you may need to do that.  (Most likely
your mpich was built with the wrong options.)

> How does mpiexec know where a certain MPICH version is installed?
> 
> I have built mpiexec as follows:
> ./configure --with-pbs=/usr/local --disable-mpich-gm --disable-lam --disable-emp

Mpiexec doesn't care where mpich is installed; it just needs to know the
device type.  Those disables don't get you much, but they don't hurt
either.  What you do want is --with-default-comm=mpich-p4, then maybe
something about p4-shmem once you read the README and decide whether to
rebuild your mpich or not.
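Combining the configure line above with that option would look roughly like the following. This is only a sketch: it adds --with-default-comm=mpich-p4 to the options already used and leaves out any p4-shmem setting, because the right choice there depends on how mpich itself was built.

```shell
# Rebuild mpiexec so that mpich-p4 is the default communication type.
# Only --with-default-comm=mpich-p4 is new; the other options are the
# ones from the original build (the disables are harmless but optional).
./configure --with-pbs=/usr/local --with-default-comm=mpich-p4 \
    --disable-mpich-gm --disable-lam --disable-emp
```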

> > If you send me the output of "qstat -f $PBS_JOBID" within the batch job,
> > and run "mpiexec -v -v -v ..." to show lots of its debugging messages
> > too, maybe we'll be able to see a problem.
[..]
>     exec_host = asama.ess.nec.de/0*2
>     Resource_List.ncpus = 2

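Yup, you've got a big SMP.  Mpiexec knows you want two processes, but
since you didn't tell it otherwise, it assumes you compiled mpich-p4
with -shmem.  So it starts only one task and expects mpich to create
the second one itself, which is why you see "All 1 task started."  You
might consider using the mpich ch_shmem device, but it's not always
better.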
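A rough way to picture the difference is the shell function below. It is a hypothetical, simplified model, not mpiexec's actual code: the name tasks_to_spawn is made up, and it assumes exec_host entries of the form host/index or host/index*ncpus joined by "+", reading the "*2" in the qstat output above as two CPUs. With p4 shmem assumed, mpiexec starts one task per host and leaves the rest to mpich; without it, mpiexec starts every task itself.

```shell
#!/bin/sh
# Hypothetical, simplified model of how many tasks mpiexec starts itself.
# NOT mpiexec's source; the function name and input format are assumptions.
#   $1: PBS exec_host value, e.g. "asama.ess.nec.de/0*2", with entries of the
#       form host/index or host/index*ncpus joined by "+" (assumed format)
#   $2: "yes" if mpich-p4 was built with shared memory, anything else if not
# Prints one line per task that mpiexec would start.
tasks_to_spawn() (
    rest=$1
    seen=' '
    while [ -n "$rest" ]; do
        # Take the next "+"-separated entry from the front of the list.
        entry=${rest%%+*}
        case $rest in
            *+*) rest=${rest#*+} ;;
            *) rest= ;;
        esac
        host=${entry%%/*}
        # A trailing "*N" is read as N CPUs on that host; otherwise one.
        case $entry in
            *'*'*) count=${entry##*\*} ;;
            *) count=1 ;;
        esac
        if [ "$2" = yes ]; then
            # shmem build: one task per host; mpich creates the others locally
            case $seen in
                *" $host "*) ;;
                *) seen="$seen$host "
                   echo "$host" ;;
            esac
        else
            # no shmem: mpiexec starts every task itself, one at a time
            i=0
            while [ "$i" -lt "$count" ]; do
                echo "$host"
                i=$((i + 1))
            done
        fi
    done
)
```

For asama.ess.nec.de/0*2, the shmem case lists the host once, matching the single task in the log, while the no-shmem case lists it twice.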

		-- Pete


