Mpiexec runs 1 process only
Pete Wyckoff
pw at osc.edu
Mon Aug 30 09:40:30 EDT 2004
dsternkopf at hpce.nec.com wrote on Mon, 30 Aug 2004 13:15 +0200:
> > You're pretty sure you compiled mpich using --with-device=ch_shmem and
> > not, say, ch_p4 or something else? Other devices expect that mpiexec
> > will spawn all the tasks, one at a time.
>
> MPICH compiled with ch_p4.
That's a pretty important fact. You must make sure that mpiexec and
mpich agree on the underlying device.
> If I specify the ch_p4 device for mpiexec I get the following error:
>
> asama.ess.nec.de:~/mpiexec-0.76 2915> /home/danny/mpiexec-0.76/mpiexec -v -v -v -nostdout -nostdin -comm mpich-p4 /home/danny/mpiexec-0.76/hello
[..]
> p0_12062: (-0.000131) send_message: to=1; invalid conn type=5
> p0_12062: p4_error: subtree_broadcast_p4 failed, type=: 1010101010
> wait_one_task_start: evt = 2, task 0 host asama.ess.nec.de
> All 1 task started.
> wait_tasks: waiting for asama.ess.nec.de/0
> wait_tasks: numspawned = 1, got evt 3 for tid 4 host asama.ess.nec.de status 1
> mpiexec: Warning: task 0 exited with status 1.
Try "mpiexec -comm mpich-p4 -mpich-p4-no-shmem ..." and read the section
in the README that explains why you may need to do that. (Most likely
the options used to build mpich don't match what mpiexec assumes.)
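
As a rough sketch, the suggested invocation could be wrapped as below. It
uses the "-comm mpich-p4" spelling from the command quoted above. The
program path ./hello is a placeholder, and the guard that only prints the
command outside a PBS job is an addition for illustration, not something
mpiexec requires.

```shell
#!/bin/sh
# Sketch only. ./hello is a hypothetical path to the MPI test program.
prog=./hello

# Build the command line: verbose output, p4 device, no shared memory.
set -- mpiexec -v -comm mpich-p4 -mpich-p4-no-shmem "$prog"

if [ -n "${PBS_JOBID:-}" ] && command -v mpiexec >/dev/null 2>&1; then
    "$@"                      # inside a batch job with mpiexec on PATH
else
    echo "would run: $*"      # elsewhere, just show the command
fi
```

Without shared memory, mpiexec is expected to start every task itself,
which should match an mpich ch_p4 build that has no shared-memory support.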
> How does mpiexec know where a certain MPICH version is installed?
>
> I have built mpiexec as follows:
> ./configure --with-pbs=/usr/local --disable-mpich-gm --disable-lam --disable-emp
Mpiexec doesn't care where mpich is installed. It only needs to know the
device type. Those disables don't get you much, but they don't hurt
either. What you do want is --with-default-comm=mpich-p4, then maybe
something about p4-shmem once you read the README and decide whether to
rebuild your mpich or not.
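
For illustration, a configure run with that default could look like the
sketch below. It only combines the --with-pbs value from the quoted
command with the flag named above. The p4-shmem setting is deliberately
left out, since that depends on how mpich was built.

```shell
#!/bin/sh
# Sketch: run from the top of the mpiexec source tree.
if [ -x ./configure ]; then
    ./configure --with-pbs=/usr/local --with-default-comm=mpich-p4
else
    echo "run this from the mpiexec source directory" >&2
fi
```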
> > If you send me the output of "qstat -f $PBS_JOBID" within the batch job,
> > and run "mpiexec -v -v -v ..." to show lots of its debugging messages
> > too, maybe we'll be able to see a problem.
[..]
> exec_host = asama.ess.nec.de/0*2
> Resource_List.ncpus = 2
Yup, you've got a big SMP. Mpiexec knows you want two processes but
assumes you compiled mpich-p4 with -shmem since you didn't tell it
otherwise, so it starts only one task on the node and leaves the second
to mpich. You might consider using the mpich ch_shmem device instead,
but it's not always better.
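
To read an exec_host line like the one above by hand, a small helper can
add up the slots. This is only an illustration of interpreting the qstat
output, not a description of mpiexec internals. The function name
count_slots is made up, and it assumes entries have the form host/index
or host/index*count, joined by "+", where the number after "*" is the CPU
count (consistent with Resource_List.ncpus = 2 here).

```shell
#!/bin/sh
# Hypothetical helper: sum the CPU slots in a PBS exec_host string.
count_slots() {
    rest=$1
    total=0
    while [ -n "$rest" ]; do
        entry=${rest%%+*}                 # first entry, up to the next '+'
        if [ "$entry" = "$rest" ]; then
            rest=""                       # that was the last entry
        else
            rest=${rest#*+}               # drop the entry just handled
        fi
        case $entry in
            *"*"*) n=${entry##*"*"} ;;    # host/0*2 -> 2
            *)     n=1 ;;                 # host/0   -> 1
        esac
        total=$((total + n))
    done
    echo "$total"
}

count_slots "asama.ess.nec.de/0*2"    # prints 2
```

Two slots on a single host is the case where the shared-memory question
matters, because both processes land on the same node.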
-- Pete