Torque 2.1.8 and stdin
Pete Wyckoff
pw at osc.edu
Fri Aug 17 11:46:09 EDT 2007
Matthew.Grismer at wpafb.af.mil wrote on Thu, 16 Aug 2007 16:50 -0400:
> We're using the latest version of Torque on an Xserve cluster running
> Mac OS X Server 10.4.10, and using mpiexec to start jobs. I built
> torque, and then built the latest version of mpiexec from the website
> without any issue (comm.=pmi was specified in the configuration). For
> MPI we are using MPICH2 1.0.5p4. As far as I can tell, stdin is not
> getting passed to the executable. I've attached three files: a sample
> MPI code that demonstrates the issue, the job file I submitted to
> Torque, and the output from Torque. The redirected input file in the job
> file just contained one integer number.
Thanks for working up a test case; that is usually quite helpful.
Unfortunately, it works here, both with PGI and Intel.
> One thing to note is that the following errors in the output:
>
> Error from ioctl = 6
> Error is: : Device not configured
> only occur when I run with the OSC mpiexec. If I use the mpiexec from
> the MPICH2 distribution, I do not get those errors (and the redirected
> input does work as well). Any help would be greatly appreciated!
There are two of those errors in your output, so they are probably not
related to the single failed read. They appear to come from your
mpich2, which is issuing SIOCGIFCONF on all the network interfaces.
The startup code in MPI_Init() eventually calls
MPIDU_CH3U_GetSockInterfaceAddr() to try to look up the IP address of
the local machine. It should be harmless, but you might want to
tell the mpich2 developers about the interesting behavior on
OS X.
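The numbers in those messages look like ordinary errno values. Assuming the "= 6" is an errno, it is ENXIO, and "Device not configured" is the Mac OS X (BSD) strerror() text for that code; Linux reports the same errno as "No such device or address". A small sketch for decoding such a value on any node:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Map a numeric errno (e.g. the 6 in "Error from ioctl = 6") to its name and message."""
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror() returns the platform's own wording, so the message text
    # differs between Mac OS X ("Device not configured") and Linux.
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    print(describe_errno(6))
```

On a Mac OS X node this should print "ENXIO: Device not configured".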
The mpd startup method does not have this issue because its Python
startup scripts always set MPICH_INTERFACE_HOSTNAME. OSC mpiexec
cannot do this (it would have to modify TM), and it doesn't make much
sense to do so if you don't need it.
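Roughly, the variable acts as a shortcut around the interface scan. The sketch below models that choice under the assumption, implied above, that a set MPICH_INTERFACE_HOSTNAME means no interfaces are probed. It is not MPICH2 source; choose_interface_host and scan_interfaces are hypothetical names.

```python
import os
from typing import Callable, Mapping


def choose_interface_host(
    env: Mapping[str, str], scan_interfaces: Callable[[], str]
) -> str:
    """Hypothetical model of how the local address is chosen during MPI_Init."""
    host = env.get("MPICH_INTERFACE_HOSTNAME")
    if host:
        # mpd's startup scripts always set this, so in that case no
        # interface probing (and no ioctl error message) should happen.
        return host
    # Otherwise fall back to probing the interfaces; this callable stands in
    # for the SIOCGIFCONF-based lookup, which is where the errors appear.
    return scan_interfaces()


if __name__ == "__main__":
    print(choose_interface_host(os.environ, lambda: "address from interface scan"))
```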
You might try a couple of things to address this error:
> forrtl: severe (24): end-of-file during read, unit -4, file stdin
First, make sure it fails with:
/usr/local/bin/mpiexec -n 1 -verbose a.out < input
too. It will simplify debugging to just run 1 task.
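If the one-task run does fail, one more way to narrow things down is to substitute a trivial non-Fortran reader for a.out, which separates mpiexec's stdin forwarding from the Fortran runtime's handling of it. A sketch, assuming a hypothetical module stdin_probe.py and that mpiexec will start a non-MPI program in your configuration (not verified here):

```python
from typing import TextIO


def read_one_int(stream: TextIO) -> int:
    """Read the first whitespace-separated token from stream as an integer.

    Raises EOFError if the stream ends before any token is seen, which is the
    analogue of forrtl's "end-of-file during read" on stdin.
    """
    for line in stream:
        tokens = line.split()
        if tokens:
            return int(tokens[0])
    raise EOFError("end-of-file on stdin before an integer was read")
```

It could be run the same way as above, for example:
mpiexec -n 1 python -c 'import sys, stdin_probe; print(stdin_probe.read_one_int(sys.stdin))' < input
assuming stdin_probe.py is in the task's working directory. If this prints the number while a.out still hits end-of-file, the Fortran side is the place to look.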
Then, have mpiexec generate all its debugging output so we can verify
that it is writing the bytes from the input file:
/usr/local/bin/mpiexec -n 1 -v -v -v a.out < input >& mp.out
Next, the big hammer. Hopefully your Macs have strace; then you can
see whether the process is actually reading the data. Do:
mpiexec -n 1 strace -vFf -s 200 ./a.out < input >& st.out
Send any/all of this output and we'll try to figure it out.
-- Pete
P.S. If you disable HTML in your mailer, the messages will stay
out of people's junk boxes and the archiver will be happier.