working directory incorrect with tcsh

djhale djhale at sandia.gov
Thu Jul 22 14:28:17 EDT 2004


See below.

On Wednesday 21 July 2004 05:00 pm, Pete Wyckoff wrote:
> djhale at sandia.gov wrote on Wed, 21 Jul 2004 16:26 -0700:
> > OK, the verbose output looks right.  /bin/sh is actually /bin/bash.  The
> > strace is a bit odd, there are places where it chdirs
> > to /home/djhale/bin/test, and other places where it chdirs to
> > /home/djhale. Attached is the strace for you to examine further.
> >
> > Here are two more example runs:
> >
> > [djhale at cn12 test]$ pwd
> > /home/djhale/bin/test
> > [djhale at cn12 test]$ mpiexec -pernode pwd
> > /home/djhale
> > /home/djhale
> > /home/djhale
> >
> > # A real run that doesn't work with tcsh
> > [djhale at cn12 test]$ mpiexec -n 6 vasp_chain
> > ./vasp_chain: Command not found.
> > ./vasp_chain: Command not found.
> > ./vasp_chain: Command not found.
> > ./vasp_chain: Command not found.
> > ./vasp_chain: Command not found.
> > ./vasp_chain: Command not found.
> > mpiexec: Warning: main: task 0 exited with status 1.
> > mpiexec: Warning: main: task 1 exited with status 1.
> > mpiexec: Warning: main: task 2 exited with status 1.
> > mpiexec: Warning: main: task 3 exited with status 1.
> > mpiexec: Warning: main: task 4 exited with status 1.
> > mpiexec: Warning: main: task 5 exited with status 1.
>
> The strace was too large to go to the list, but I'll pick out the
> relevant bits:

Sorry about that.  I thought about picking out parts, but then I thought I
might miss something that would help you see what's going on.

> execve("/bin/sh", ["/bin/sh", "-c", "if test -d \"/home/djhale/bin/test\";
> then cd \"/home/djhale/bin/test\"; fi; exec /bin/tcsh -c \'exec pwd\'"]
> ...); ..
> execve("/bin/tcsh", ["/bin/tcsh", "-c", "exec pwd"], ...)
> ..
> open("/etc/profile.d/pbs.csh", O_RDONLY) = 4
> ..
> read(4, "if ( ! $?PBSHOME ) then\n  setenv PBSHOME /apps/openpbs\n  if ( !
> $?PATH ) then\n    setenv PATH ${PBSHOME}/bin:${PBSHOME}/sbin\n  else\n   
> setenv PATH ${PBSHOME}/bin:${PBSHOME}/sbin:${PATH}\n  endif\n  if ( !
> $?MANPATH ) then\n    setenv MANPATH ${PBSHOME}/man\n  else\n    setenv
> MANPATH ${PBSHOME}/man:${MANPATH}\n  endif\n  if ( ! $?LD_LIBRARY_PATH )
> then\n    setenv LD_LIBRARY_PATH ${PBSHOME}/lib\n  else\n    setenv
> LD_LIBRARY_PATH ${PBSHOME}/lib:${LD_LIBRARY_PATH}\n  endif\nendif\n\nif (
> $?PBS_ENVIRONMENT ) then\n\tcd $PBS_O_WORKDIR\n\tsetenv WCOLL
> $PBS_NODEFILE\nendif\n", 4096) = 557 ..
>
> Your pbs.csh script puts all processes into $PBS_O_WORKDIR.  Since tcsh
> is fastidious about running init scripts no matter what the environment,
> the effort to chdir() in /bin/sh before invoking tcsh is wasted.   Bash
> doesn't run those scripts unless it knows it has a tty with an
> interested user in front of it.  While the attempt to relocate users to
> their invoking directories is admirable, most sites don't bother to do
> that.

Right, so when you do a qsub you get logged into the mother node and all the
profile scripts are executed.  Then when you run mpiexec, it calls tcsh, which
in turn executes all the profile scripts again, putting you back where you
started.

Our users have gotten used to being relocated to where they ran qsub from, so 
we shouldn't take that away.

> We could avoid this problem by invoking the executable directly from
> /bin/sh, but we want users to use their native shells if possible
> to avoid forcing them to learn sh syntax just for their batch jobs.
>
> I suggest instead that you add some lines to pbs.csh to avoid the "cd"
> if mpiexec is trying to run stuff through pbs_mom.  Unfortunately there
> is not an easy test for this.  Since you've got Myrinet, you can test
> against GMPI_ID, and avoid changing directory to $PBS_O_WORKDIR if
> that is set, but that's not exactly generic.  If you convince me to
> add an env var such as "MPIEXEC_WAS_HERE", you could test against
> that in your pbs.csh file.

I'm doing the test against GMPI_ID as you suggested and that fixed the 
problem.  An MPIEXEC_WAS_HERE env var would be nice.  If you ever add that we 
can just replace the test for GMPI_ID with MPIEXEC_WAS_HERE.  Thanks so much 
for your help.
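
For anyone hitting the same problem, the guard might look roughly like the
sketch below.  It is based on the pbs.csh contents shown in the strace above;
the only addition is the GMPI_ID test, and it assumes that variable is set
only for processes launched by mpiexec over Myrinet/GM.  The exact change
made on this cluster was not posted, so treat this as an illustration only.

```csh
# Sketch of the tail of /etc/profile.d/pbs.csh with an mpiexec guard added.
# Assumption: GMPI_ID is present only in processes started by mpiexec
# over Myrinet/GM, so its absence means an ordinary PBS job shell.
if ( $?PBS_ENVIRONMENT ) then
	# Only relocate when mpiexec has not already chdir'd for us.
	if ( ! $?GMPI_ID ) then
		cd $PBS_O_WORKDIR
	endif
	setenv WCOLL $PBS_NODEFILE
endif
```

If an MPIEXEC_WAS_HERE style variable is ever added, the inner test would
simply check that name instead of GMPI_ID.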

> Or you could "rm /bin/tcsh" and suffer an entirely different sort
> of reaction from users.  :)

I don't think I want to go there =)

> 		-- Pete


