torque+mpiexec+mvapich = strange behavior
Alex
korobka at nankai.edu.cn
Tue Dec 7 21:08:50 EST 2004
I saw the same problem about a month ago. I commented out that close(0)
and it worked well afterwards. I think the strace log showed that, with
non-interactive jobs, fd 0 was being used by read_ib_startup_ports after
that close(0) call.
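
A minimal sketch of why an unconditional close(0) can be risky, assuming
the usual POSIX rule that a newly opened descriptor gets the lowest free
number: once fd 0 is closed, the next descriptor opened becomes fd 0 (on
Linux the same holds for socket()), so later code that still treats fd 0 as
stdin ends up touching it. The function names below are made up for
illustration and are not taken from mpiexec. The second function shows the
common alternative of pointing stdin at /dev/null so that fd 0 stays occupied.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: close stdin, then open something else.
 * Returns the number of the new descriptor, which is 0 because
 * fd 0 is now the lowest free slot. Note that this destroys the
 * caller's stdin; it is only meant to show the effect. */
int fd_after_closing_stdin(void)
{
    close(0);
    return open("/dev/null", O_RDONLY);
}

/* Hypothetical alternative: stop listening to stdin without freeing
 * fd 0, by redirecting it to /dev/null. Returns 0 on success, -1 on
 * error. */
int detach_stdin(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;
    if (fd != 0) {
        /* dup2 atomically replaces whatever fd 0 referred to */
        if (dup2(fd, 0) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    /* if fd == 0, stdin was already closed and now points at /dev/null */
    return 0;
}
```

With detach_stdin() in place of a bare close(0), descriptors opened later
(sockets included) can no longer land on fd 0.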
Cheers,
Alex
In your message you wrote:
>From: Pete Wyckoff <pw at osc.edu>
>A.Starikov at utwente.nl wrote on Mon, 06 Dec 2004 02:57 +0100:
> > I'm using torque-1.1.0p4 + mvapich-0.9.4-103 + mpiexec-cvs
> > and observe something strange.
> > When I submit an interactive job, I can start an MPI job without any
> > problem in the interactive session.
> > But when I submit a non-interactive MPI job, I see:
> > "mpiexec: Error: read_ib_startup_ports: accept iter 0: Invalid argument"
..
> Because of your observation that changing your shell changes the
> behavior and your feelings about the fork :), I'm worried about a
> certain close in stdio.c. The parent does close(0) unconditionally, but
> perhaps this is not correct. Can you add, around stdio.c:331, the two
> debugging printfs below (untested):
>
>     if (pid > 0) {
>         /* parent: do not listen to stdin but
>          * leave 1,2 open for debugging/error output (to pbs batch output files
>          * or to tty for interactive)
>          */
>         printf("%s: pre-close-0 aggregate = %d %d %d\n", __func__,
>           aggregate[0], aggregate[1], aggregate[2]);
>         printf("%s: abort_fd_in = %d %d\n", __func__,
>           abort_fd_in[0], abort_fd_in[1]);
>         close(0);
>
> If we see that abort_fd_in[0] == 0, and aggregate[0] == -1, maybe we
> should pay attention to those instead of running straight to close(0).
> Or it could be a completely different problem.
>
> What is your default shell, by the way, if not /bin/bash? Do you
> specify any "-S" lines in your PBS script or on the command-line to
> qsub?
>
> -- Pete
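
One way to "pay attention to those" descriptors, sketched under the
assumption that the parent can list the descriptors it still needs (for
example aggregate[0] and abort_fd_in[0]): skip the close when any of them
is fd 0. The helper name and its interface are hypothetical and do not
appear in stdio.c.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: close fd 0 only if none of the descriptors in
 * in_use[] is fd 0. Entries of -1 (unused slots) are ignored naturally.
 * Returns 0 if the close was skipped, 1 if fd 0 was closed,
 * -1 if close() itself failed. */
int close_stdin_unless_in_use(const int *in_use, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (in_use[i] == 0)
            return 0;   /* fd 0 is doing real work, leave it alone */
    }
    return close(0) == 0 ? 1 : -1;
}
```

A caller might pass something like { aggregate[0], abort_fd_in[0] } and
treat a return value of 0 as "stdin slot is in use, nothing closed".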