mpiexec GMPI_SLAVE env.t problem
Pete Wyckoff
pw at osc.edu
Wed Aug 13 17:37:18 EDT 2003
brh at unimelb.edu.au said on Wed, 13 Aug 2003 14:07 +1000:
> So anyway, again, if I manually run "hello" via mpirun for 4*2 proc.s, as
> can be (partially) seen from this extract, mpirun assigns node040 as the
> master, and nodes 001-004 as the slaves, and gets 192.168.xx.xx addresses
> for GMPI_SLAVE.
Here's what I suspect, although I am still a bit confused. Maybe you
can try another test. The detailed output you sent definitely helps.
Try this and you'll see both the environment variables that mpiexec
thinks it is sending to the remote process as well as those actually
seen by the process:
    mpiexec -v -v -v -np 1 env
You should see lines like:
    environment to 0/1
    env 0 GMPI_MAGIC=393458
    env 1 GMPI_MASTER=amd103
    env 2 GMPI_PORT=40120
    env 3 GMPI_PORT1=40120
    env 4 GMPI_PORT2=40121
    env 5 GMPI_NP=1
    env 6 GMPI_BOARD=-1
    env 7 MPIEXEC_STDOUT_PORT=40123
    env 8 MPIEXEC_STDERR_PORT=40124
    ..
    env 74 P4_GLOBMEMSIZE=268435456
    env 75 PBS_O_WORKDIR=/usr/local/src/mpich/mpich-1.2.5..10
    env 76 LC_COLLATE=C
    env 77 _=/usr/local/bin/mpiexec
    env 78 GMPI_ID=0
    env 79 GMPI_SLAVE=amd103
    env 80 MPIEXEC_STDIN_PORT=40122
These are what mpiexec is setting. Then there will be the output of the
command /usr/bin/env which should have almost exactly the same settings
although perhaps in quite a different order due to hashing by the shell.
Even though env is not an MPI program, mpiexec will try to start it up
as one with all the environment variables needed to do so. (Use
"--comm=gm" if that is not your default.)
My conjecture is that there is a limit on environment size somewhere and
the variables toward the end are getting cut off, perhaps in tcsh or
perhaps in pbs_mom, although I do not currently see any limits in the latter.
Can you compare the two lists and see if this might be true? Ideally,
identify the last env# item that was sent properly, and whether it was
truncated.
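The comparison can also be scripted instead of done by eye. Here is a
minimal sketch in Python, assuming the combined output of the command above
(both the "env N NAME=VALUE" lines from mpiexec and the plain NAME=VALUE
lines from env) has been captured into one string. The function names
compare_env and last_intact are made up for illustration; they are not part
of mpiexec. It also assumes no variable has a value spanning several lines.

```python
import re

# Matches the verbose lines mpiexec prints, e.g. "env 79 GMPI_SLAVE=amd103".
SENT_RE = re.compile(r"^\s*env\s+(\d+)\s+([^=\s]+)=(.*)$")
# Matches plain NAME=VALUE lines as printed by /usr/bin/env.
SEEN_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def compare_env(log_text):
    """Return (index, name, status) for every variable mpiexec reported sending."""
    sent = []   # (index, name, value) in the order mpiexec listed them
    seen = {}   # name -> value as printed by env in the started process
    for line in log_text.splitlines():
        m = SENT_RE.match(line)
        if m:
            sent.append((int(m.group(1)), m.group(2), m.group(3)))
            continue
        m = SEEN_RE.match(line)
        if m:
            seen[m.group(1)] = m.group(2)

    report = []
    for index, name, value in sent:
        if name not in seen:
            status = "missing"
        elif seen[name] == value:
            status = "ok"
        elif value.startswith(seen[name]):
            # The process saw only a leading part of the intended value.
            status = "truncated"
        else:
            status = "different"
        report.append((index, name, status))
    return report


def last_intact(report):
    """Index of the last variable before the first one that is not 'ok', or None."""
    last = None
    for index, _name, status in report:
        if status != "ok":
            break
        last = index
    return last
```

Calling compare_env on the captured text and printing every entry whose
status is not "ok" shows where the two lists diverge; last_intact gives the
env# of the last item that arrived unchanged before the first mismatch.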
One possible fix would be for you to reduce either the number of vars in
your environment, or the length if there are a couple of really long
ones. It would be easy enough for me to rearrange the assignment of env
vars in mpiexec to move GMPI_SLAVE and related ones to the top so they
do not get cut off, but I worry that you could be losing other variables
not related to mpiexec. If you do discover some fundamental limit, we
can at least warn other users while rearranging the variables anyway.
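To see whether the environment is unusually large before trimming it, a
short script can count the variables and list the longest ones. This is only
a sketch: env_summary is a hypothetical helper, and I do not know what limit,
if any, tcsh or pbs_mom actually applies, so it reports sizes without
judging them.

```python
import os


def env_summary(environ=None, top=5):
    """Return (variable count, total bytes, largest entries) for an environment."""
    if environ is None:
        environ = os.environ
    entries = []
    for name, value in environ.items():
        # Each entry is stored as "NAME=VALUE" plus a terminating NUL byte,
        # hence the +2 on top of the encoded name and value lengths.
        size = len(os.fsencode(name)) + len(os.fsencode(value)) + 2
        entries.append((size, name))
    total = sum(size for size, _name in entries)
    largest = sorted(entries, reverse=True)[:top]
    return len(entries), total, largest


if __name__ == "__main__":
    count, total, largest = env_summary()
    print(f"{count} variables, {total} bytes in total")
    for size, name in largest:
        print(f"{size:8d}  {name}")
```

Running it in the same shell used to launch mpiexec shows which variables
would be worth shortening or unsetting first.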
Of course, this is all just guesswork. Let me know what you find.
-- Pete