mpiexec and interactive read

Bryan Putnam bfp at purdue.edu
Thu May 15 12:11:00 EDT 2003


On Thu, 15 May 2003, Pete Wyckoff wrote:

> bfp at purdue.edu said on Thu, 15 May 2003 10:16 -0500:
> > I have a part of the famous "pi calculating" program below where node 0
> > reads some input from the terminal. If I run this program using for
> > example,
> >
> > mpirun -np 4 -machinefile $PBS_NODEFILE prog
> >
> > it runs normally, but when using
> >
> > mpiexec -n 4 prog
> >
> > the program doesn't appear to stop and wait at the read, and it complains
> > about reading the end of file.
>
> Mpiexec tries to connect the stdin of your interactive shell, or batch
> process, to MPI id #0, by default.  As Troy pointed out, if you want it
> to talk to all processes you need to do something special.  But for your
> code I think just #0 will be sufficient.
>
> If you have not applied a patch to PBS and restarted the pbs_mom on the
> node where your code is run, this stdin redirection will not work.  See
> patch/pbs-2.3.12-mpiexec.diff and README for instructions on how to
> apply it.
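
A small shell sketch of the failure mode follows. It assumes, and the thread does not confirm this, that without the patched pbs_mom rank 0's standard input behaves like an empty stream such as /dev/null. The function read_intervals is a hypothetical stand-in for the Fortran READ on unit 5. It is not part of mpiexec or the pi program. The sketch shows why the READ reports "end of file" at once instead of waiting for input.

```shell
#!/bin/sh
# Hypothetical stand-in for the pi program's READ on unit 5:
# try to read one value from standard input.
read_intervals() {
    if read -r n; then
        echo "got $n"
    else
        echo "end of file"
    fi
}

# Stdin not forwarded (approximated here by /dev/null): EOF at once.
read_intervals < /dev/null
# Stdin forwarded, which is what mpiexec arranges for rank 0 via the patch.
echo 50 | read_intervals
```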

Yes, perhaps that's the problem. We haven't applied the PBS patch. We're
running the latest version (5?) of PBSPro. I'll look into applying that
patch. Is the patch needed even when only node 0 reads input, or only if
I want to do this on all nodes (which I don't)?

Anyway, here's a script session with some verbose output (and thanks for
your help).

Bryan

++++++++++++++++++++++++++
Script started on Thu May 15 11:06:19 2003
radon 1000% qsub -I -l nodes=4,walltime=2:00:00
qsub: waiting for job 70731.krypton.rcs.purdue.edu to start
qsub: job 70731.krypton.rcs.purdue.edu ready

zn-037 1000% od
zn-037 1001% pwd
/home/clerk/u76/bfp/demos/pi
zn-037 1002% /home/clerk/u76/bfp/apps/mpiexec-0.74/gcc/bin/mpiexec -verbose pi
resolve_exe: found exe "pi" in path
node  0: name = zn-037, mpname = zn-037, cpu = 0
node  1: name = zn-036, mpname = zn-036, cpu = 0
node  2: name = zn-027, mpname = zn-027, cpu = 0
node  3: name = zn-026, mpname = zn-026, cpu = 0
wait_one_task_start: evt = 2, task 0 host zn-037
read_p4_master_port: waiting for port from master
read_p4_master_port: got port 33029
wait_one_task_start: evt = 6, task 3 host zn-026
wait_one_task_start: evt = 4, task 1 host zn-036
wait_one_task_start: evt = 5, task 2 host zn-027
All 4 tasks started.
 Process  0 of  4 is alive
 Process  0 of  4 is alive
Enter the number of intervals: (0 quits)
fmt: end of file
apparent state: unit 5 (unnamed)
last format: (I10)
lately reading sequential formatted external IO
 Process  3 of  4 is alive
 Process  3 of  4 is alive
p3_14696:  p4_error: net_recv read:  probable EOF on socket: 1
 Process  2 of  4 is alive
 Process  2 of  4 is alive
p2_13044:  p4_error: net_recv read:  probable EOF on socket: 1
 Process  1 of  4 is alive
 Process  1 of  4 is alive
p1_11624:  p4_error: net_recv read:  probable EOF on socket: 1
bm_list_11820: (0.263469) net_send: could not write to fd=5, errno = 32
bm_list_11820:  p4_error: net_send write: -1
    p4_error: latest msg from perror: Broken pipe
wait_tasks: numspawned = 4, got evt 3 for tid 2 host zn-037 status 262
p2_13044: (4.019673) net_send: could not write to fd=5, errno = 32
p3_14696: (4.017341) net_send: could not write to fd=5, errno = 32
p1_11624: (4.022792) net_send: could not write to fd=5, errno = 32
wait_tasks: numspawned = 3, got evt 9 for tid 5 host zn-026 status 1
wait_tasks: numspawned = 2, got evt 8 for tid 4 host zn-027 status 1
wait_tasks: numspawned = 1, got evt 7 for tid 3 host zn-036 status 1
mpiexec: Warning: main: task 0 died with signal 6.
mpiexec: Warning: main: task 1 exited with status 1.
mpiexec: Warning: main: task 2 exited with status 1.
mpiexec: Warning: main: task 3 exited with status 1.
zn-037 1003% exit
logout

qsub: job 70731.krypton.rcs.purdue.edu completed
radon 1001% exit

Script done on Thu May 15 11:07:09 2003
>
> If that's not the problem, I'm not so sure.  Run mpiexec with a few "-v"
> flags (3 is lots) to watch exactly what the stdio handler says about
> the input it gets from you and about sending it to the FORTRAN job.
>
> Hopefully you're running your test inside an interactive batch job
> ("qsub -I"), where your shell serves as the input to the job.  Inside a
> non-interactive batch job, you would have to redirect some sort of input
> to the process to get it to read anything, like "echo 27 | mpiexec
> mycode".  This is normal.
>
> 		-- Pete
>
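
For the non-interactive case, note that the pi program's prompt says "(0 quits)". This suggests it loops, reading interval counts until it gets 0, as the classic example does. That behavior is assumed here and not confirmed in the thread. If it holds, "echo 27 | mpiexec pi" would compute once and then hit the same end-of-file error on the next READ. The sketch below uses a hypothetical helper, pi_input, that appends the terminating 0.

```shell
#!/bin/sh
# Hypothetical helper: print each interval count on its own line,
# then the 0 that the pi program is assumed to treat as "quit".
pi_input() {
    for n in "$@"; do
        printf '%s\n' "$n"
    done
    printf '0\n'
}

# Inside a non-interactive PBS job this would be used as:
#   pi_input 100 1000 | mpiexec pi
# (left commented out because it needs mpiexec and a PBS allocation)
pi_input 100 1000
```

A here-document ending in a 0 line serves equally well as the job's input. The point is that the stream should end with the quit value rather than simply running out.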


