[HELP Request] Mpiexec/PBSPro stack problem

Brent M. Clements bclem at rice.edu
Mon Oct 4 00:55:08 EDT 2004


Hi Guys,
 We are running mpiexec 0.76 with PBS 5.4.1 on Itanium. We are
seeing a very strange problem with how the stack size limit is set.

On our nodes we set an unlimited stack size in both the system-wide bash
startup scripts and the system-wide tcsh startup scripts. But as the output
below shows, when we run ulimit through mpiexec the stack limit is 8 MB
(8192 KB), whereas a plain ulimit -s on the same node reports unlimited.



master5:~> qsub -I -l nodes=n122
qsub: waiting for job 10252.management to start
qsub: job 10252.management ready

Sun Oct  3 23:16:33 CDT 2004
n122:~> mpiexec -comm none bash -c \"hostname \; ulimit -s\"
n122.rtc
8192
n122:~> ulimit -s
unlimited
n122:~>
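
A minimal diagnostic sketch in bash, the shell used above. The helpers report_stack_limits and run_with_max_stack are hypothetical names, not part of mpiexec or PBS. Plain ulimit -s shows only the soft limit, so the first helper prints both the soft and the hard stack limit. If it were launched through mpiexec -comm none like the command in the transcript, it would show whether 8192 is only a soft limit that the task can raise itself (an unprivileged process may raise its soft limit up to, but not past, the hard limit) or a hard limit inherited from whatever process started the task.

```shell
#!/bin/bash
# Hypothetical diagnostic helper; not part of mpiexec or PBS.
# Prints the node name and the soft and hard stack limits of the current
# process. bash reports ulimit -s values in KB, or "unlimited".
report_stack_limits() {
    echo "host: $(uname -n)"
    echo "soft stack: $(ulimit -Ss)"
    echo "hard stack: $(ulimit -Hs)"
}

# Hypothetical wrapper: raise the soft stack limit as far as the hard limit
# allows, then run the given command with that limit. Note that it changes
# the limit of the shell it runs in, so call it from a script or subshell.
run_with_max_stack() {
    ulimit -Ss "$(ulimit -Hs)" || return 1
    "$@"
}

report_stack_limits
```

If the hard limit under mpiexec also reads 8192, the wrapper cannot help. In that case the limit has to be raised in the process that launches the tasks, not in the task itself.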

We have put ulimit -s unlimited in every place on this node that could
possibly be read by PBS, mpiexec, and the shell interpreters, but that did
not help either.
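
Which startup files a shell reads depends on whether it is a login shell and whether it is interactive. According to the bash manual, a non-interactive, non-login bash, such as the one started by bash -c, reads none of /etc/profile, ~/.bash_profile or ~/.bashrc. The exceptions are the file named by BASH_ENV and a special case for shells started by rshd/sshd. The hypothetical helper below is a small sketch for checking which mode a task's shell runs in. On its own it does not explain where the 8192 value comes from.

```shell
#!/bin/bash
# Hypothetical helper: report whether the current bash is interactive and
# whether it is a login shell. Per the bash manual, a non-interactive,
# non-login shell (e.g. "bash -c ...") reads no profile or bashrc files,
# only the file named by $BASH_ENV, if any (the rshd/sshd special case aside).
describe_shell_mode() {
    local interactive="no" login="no"
    # "$-" contains the letter i only in interactive shells.
    case "$-" in
        *i*) interactive="yes" ;;
    esac
    # login_shell is a read-only shopt option that bash sets for login shells.
    if shopt -q login_shell; then
        login="yes"
    fi
    echo "interactive=${interactive} login=${login}"
}

describe_shell_mode
```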

What is the explanation for this?

By the way, we also tried mpirun and the problem does not occur there, so
our users are starting to blame mpiexec. I want to get your opinion and
expertise before we try to explain the behavior to our users.


Thanks,
Brent
Brent Clements
Linux Technology Specialist
Information Technology
Rice University

Linux at Rice news and information
available only at http://linuxsupport.rice.edu


On Fri, 5 Dec 2003, Pete Wyckoff wrote:

> Changes from the previous version are quite extensive, as it has been
> seven months since the last release.  If any of the following topics
> interest you, please give the new version a try.
>
>
> New communication library:  MPICH on InfiniBand
>
>     InfiniBand is a high-speed interconnect that is becoming popular in
>     the message passing world.  The most popular implementation of MPI
>     is the one from OSU/CIS based on MPICH and supported in this release
>     of mpiexec.
>
> PBS Mom restart
>
>     Add support to reconnect to PBS moms which are restarted during the
>     run of a parallel application.  This requires changes to PBS to work
>     properly which are included in a new patch to OpenPBS found in this
>     mpiexec distribution.
>
>     It is a somewhat complex patch which fixes numerous crashes in the
>     PBS code itself.  This support is still marked experimental, but
>     give it a shot if you are interested in the ability to restart moms
>     under running parallel processes.
>
> MPIEXEC_RANK environment variable
>
>     Many users take advantage of the "none" communication library to
>     automate system tasks in the context of a PBS job.  There is now an
>     environment variable which gives a different number to each task
>     similar to the rank in an MPI implementation.  Thanks to Jose Luis
>     Gordillo Ruiz and Eduardo Murrieta Leon for the idea and patch.
>
> MPICH/P4 debugging fix
>
>     Process arguments were moved around to allow debugging inside an
>     xterm.  This had always worked for the other communication
>     libraries.  Try "mpiexec xterm -e gdb --args mycode" to see it in
>     action.
>
> MPICH/P4 shmem command-line flag
>
>     In the ongoing saga of the "--with-comm=shared" compile-time flag
>     for the MPICH/P4 libmpich.a library itself, this adds a bit more
>     flexibility.  Now mpiexec allows runtime specification of the
>     shared-memory support of your mpich library.  You will be much
>     happier if you compile mpiexec so that it knows if your mpich/p4
>     library uses --comm=shmem or not, but this flag is handy in testing,
>     or at sites which must support both types.
>
> MPICH/GM GM2 bug fix
>
>     Chris Maestas of Sandia found and fixed a problem when using mpiexec
>     with MPICH/GM with the GM2 library.  This GM2 library is a major
>     version change from the GM library most use now to drive their
>     Myricom devices, and thus not yet widely adopted.
>
>
> Full changelog and downloads at:  http://www.osc.edu/~pw/mpiexec/
> Respond to the list with bug reports, comments, suggestions,
> and complaints.
>
> 		-- Pete
> _______________________________________________
> mpiexec mailing list
> mpiexec at osc.edu
> http://email.osc.edu/mailman/listinfo/mpiexec
>
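
The MPIEXEC_RANK note in the quoted announcement lends itself to a minimal sketch of a per-task shell script for the "none" communication library. The helper name task_files and the input.N / task.N.log naming scheme are hypothetical. The only thing taken from the announcement is that each task sees a different MPIEXEC_RANK value.

```shell
#!/bin/bash
# Minimal sketch of a per-task script for "mpiexec -comm none".
# Assumption (from the release note): mpiexec gives each task a different
# MPIEXEC_RANK value. The file naming below is hypothetical.

# Print the input file and log file this task should use.
# Falls back to rank 0 when MPIEXEC_RANK is unset, e.g. when testing
# the script by hand outside of mpiexec.
task_files() {
    local rank="${MPIEXEC_RANK:-0}"
    echo "input.${rank} task.${rank}.log"
}

read -r input_file log_file <<< "$(task_files)"
echo "rank ${MPIEXEC_RANK:-0}: would process ${input_file}, logging to ${log_file}"
```

Saved as, say, per-task.sh (a hypothetical name) and made executable, the script could be started with mpiexec -comm none ./per-task.sh inside a PBS job. Each task would then pick a different input file.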


