working directory incorrect with tcsh
Pete Wyckoff
pw at osc.edu
Wed Jul 21 20:00:34 EDT 2004
djhale at sandia.gov wrote on Wed, 21 Jul 2004 16:26 -0700:
> OK, the verbose output looks right. /bin/sh is actually /bin/bash. The
> strace is a bit odd, there are places where it chdirs
> to /home/djhale/bin/test, and other places where it chdirs to /home/djhale.
> Attached is the strace for you to examine further.
>
> Here are two more example runs:
>
> [djhale@cn12 test]$ pwd
> /home/djhale/bin/test
> [djhale@cn12 test]$ mpiexec -pernode pwd
> /home/djhale
> /home/djhale
> /home/djhale
>
> # A real run that doesn't work with tcsh
> [djhale@cn12 test]$ mpiexec -n 6 vasp_chain
> ./vasp_chain: Command not found.
> ./vasp_chain: Command not found.
> ./vasp_chain: Command not found.
> ./vasp_chain: Command not found.
> ./vasp_chain: Command not found.
> ./vasp_chain: Command not found.
> mpiexec: Warning: main: task 0 exited with status 1.
> mpiexec: Warning: main: task 1 exited with status 1.
> mpiexec: Warning: main: task 2 exited with status 1.
> mpiexec: Warning: main: task 3 exited with status 1.
> mpiexec: Warning: main: task 4 exited with status 1.
> mpiexec: Warning: main: task 5 exited with status 1.
The strace was too large to go to the list, but I'll pick out the
relevant bits:
execve("/bin/sh", ["/bin/sh", "-c", "if test -d \"/home/djhale/bin/test\"; then cd \"/home/djhale/bin/test\"; fi; exec /bin/tcsh -c \'exec pwd\'"] ...);
..
execve("/bin/tcsh", ["/bin/tcsh", "-c", "exec pwd"], ...)
..
open("/etc/profile.d/pbs.csh", O_RDONLY) = 4
..
read(4, "if ( ! $?PBSHOME ) then\n setenv PBSHOME /apps/openpbs\n if ( ! $?PATH ) then\n setenv PATH ${PBSHOME}/bin:${PBSHOME}/sbin\n else\n setenv PATH ${PBSHOME}/bin:${PBSHOME}/sbin:${PATH}\n endif\n if ( ! $?MANPATH ) then\n setenv MANPATH ${PBSHOME}/man\n else\n setenv MANPATH ${PBSHOME}/man:${MANPATH}\n endif\n if ( ! $?LD_LIBRARY_PATH ) then\n setenv LD_LIBRARY_PATH ${PBSHOME}/lib\n else\n setenv LD_LIBRARY_PATH ${PBSHOME}/lib:${LD_LIBRARY_PATH}\n endif\nendif\n\nif ( $?PBS_ENVIRONMENT ) then\n\tcd $PBS_O_WORKDIR\n\tsetenv WCOLL $PBS_NODEFILE\nendif\n", 4096) = 557
..
Your pbs.csh script puts all processes into $PBS_O_WORKDIR. Since tcsh
is fastidious about running init scripts no matter what the environment,
the effort to chdir() in /bin/sh before invoking tcsh is wasted. Bash
doesn't run those scripts unless it knows it has a tty with an
interested user in front of it. While the attempt to relocate users to
their invoking directories is admirable, most sites don't bother to do
that.
We could avoid this problem by invoking the executable directly from
/bin/sh, but we want users to use their native shells if possible
to avoid forcing them to learn sh syntax just for their batch jobs.
I suggest instead that you add some lines to pbs.csh to skip the "cd"
when mpiexec is running tasks through pbs_mom. Unfortunately there
is no easy test for this. Since you've got Myrinet, you can test
against GMPI_ID, and avoid changing directory to $PBS_O_WORKDIR if that
is set, but that's not exactly generic. If you convince me to add an env
var such as "MPIEXEC_WAS_HERE", you could test against that in your
pbs.csh file.
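A minimal sketch of that change to the tail of pbs.csh, assuming GMPI_ID
is set in the environment of every task that mpiexec starts on this
Myrinet setup. Only the inner "if" is new; the rest is the block shown
in the strace above. Untested, so treat it as a starting point:

```csh
if ( $?PBS_ENVIRONMENT ) then
	# Assumption: GMPI_ID is present only for tasks launched by mpiexec.
	# In that case /bin/sh has already changed to the right directory,
	# so leave the working directory alone.
	if ( ! $?GMPI_ID ) then
		cd $PBS_O_WORKDIR
	endif
	setenv WCOLL $PBS_NODEFILE
endif
```

The same guard would work with a variable like MPIEXEC_WAS_HERE in place
of GMPI_ID, but that name is hypothetical: mpiexec does not set it today.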
Or you could "rm /bin/tcsh" and suffer an entirely different sort
of reaction from users. :)
-- Pete