[HELP Request] Mpiexec/PBSPro stack problem

Alex korobka at nankai.edu.cn
Mon Oct 4 05:21:29 EDT 2004


Have you tried setting an unlimited stack limit in the PBS init script? My
guess is that with mpiexec the job is actually started by pbs_mom via
fork/exec, so the limits of the parent process (pbs_mom) are in effect.
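
To illustrate the inheritance I am guessing at: a limit set in a parent
process carries over to whatever it starts, regardless of what the login
scripts say. A minimal sketch in bash, where child_stack_limit is a
hypothetical helper name and 4096 is an arbitrary value assumed to be below
the hard limit:

```bash
#!/bin/bash

# Hypothetical helper: set the soft stack limit (in KB) inside a subshell,
# then ask a freshly started child bash which limit it sees.
child_stack_limit() {
    (
        ulimit -S -s "$1" || exit 1
        bash -c 'ulimit -S -s'
    )
}

# The child reports the value set by its parent, not the system default.
child_stack_limit 4096
```

If pbs_mom is where the 8192 comes from, the same mechanism suggests that a
ulimit -s unlimited run in the init script before pbs_mom is started should
reach the jobs as well.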

In any case, you can work around this by using a wrapper script to start
the application:

#!/bin/bash

export APPDIR=/home/admin/bench/hpl1a/bin/VMI
export LD_LIBRARY_PATH=/usr/local/vmi-2.0.1-1/intel/lib
export LD_ASSUME_KERNEL=2.2.5

# Raise the stack limit for this shell and everything it starts
ulimit -s unlimited

"${APPDIR}/xhpl" "$@"
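
One caveat when raising the stack limit from a wrapper: a request for
unlimited fails if the hard limit inherited from the parent is itself
finite, since an unprivileged process cannot go above its hard limit. A
hedged variant, where raise_stack_limit is a hypothetical helper name, falls
back to the hard limit in that case:

```bash
#!/bin/bash

# Hypothetical helper: raise the soft stack limit as far as allowed.
raise_stack_limit() {
    # Try unlimited first; if the hard limit forbids it, settle for
    # the hard limit itself.
    ulimit -S -s unlimited 2>/dev/null || ulimit -S -s "$(ulimit -H -s)"
}

raise_stack_limit
echo "soft stack limit: $(ulimit -S -s), hard: $(ulimit -H -s)"
```

Calling raise_stack_limit in the wrapper before starting xhpl, and looking at
the echoed values when run through mpiexec, would show whether the hard
limit is the real constraint.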

Regards,
Alex Korobka

In your message, you wrote:
>From: "Brent M. Clements" <bclem at rice.edu>
>Reply-To: 
>To: Pete Wyckoff <pw at osc.edu>
>Subject: [HELP Request]  Mpiexec/PBSPro stack problem
>Date:Sun, 3 Oct 2004 23:55:08 -0500 (CDT)
>
>Hi Guys,
>  We are running MPIEXEC 0.76 with PBS 5.4.1 on Itanium. We are
> experiencing a very weird issue with how the stack limit gets set.
> 
> On our nodes we have set an unlimited stack via both the bash systemwide
> scripts and the tcsh systemwide scripts. But as you can see from the output
> below, when we execute ulimit via an mpiexec call, the stack is 8M, but if
> we do a normal ulimit -s on the node, it's unlimited.
> 
> 
> 
> master5:~> qsub -I -l nodes=n122
> qsub: waiting for job 10252.management to start
> qsub: job 10252.management ready
> 
> Sun Oct  3 23:16:33 CDT 2004
> n122:~> mpiexec -comm none bash -c \"hostname \; ulimit -s\"
> n122.rtc
> 8192
> n122:~> ulimit -s
> unlimited
> n122:~>
> 
> We have set ulimit -s unlimited in every single place on this node that
> could possibly be called by pbs, mpiexec, and the shell interpreters, but
> no luck there either.
> 
> What is the explanation for this?
> 
> Btw, we also ran using mpirun and the problem does not occur, so our users
> are starting to blame mpiexec, I just want to get your opinion and
> expertise before we start trying to explain the phenomenon to our users.
> 
> 
> Thanks,
> Brent
> Brent Clements
> Linux Technology Specialist
> Information Technology
> Rice University
> 
> Linux at Rice news and information
> available only at http://linuxsupport.rice.edu
> 
> 
> On Fri, 5 Dec 2003, Pete Wyckoff wrote:
> 
> > Changes from the previous version are quite extensive, as it has been
> > seven months since the last release.  If any of the following topics
> > interest you, please give the new version a try.
> >
> >
> > New communication library:  MPICH on InfiniBand
> >
> >     InfiniBand is a high-speed interconnect that is becoming popular in
> >     the message passing world.  The most popular implementation of MPI
> >     is the one from OSU/CIS based on MPICH and supported in this release
> >     of mpiexec.
> >
> > PBS Mom restart
> >
> >     Add support to reconnect to PBS moms which are restarted during the
> >     run of a parallel application.  This requires changes to PBS to work
> >     properly which are included in a new patch to OpenPBS found in this
> >     mpiexec distribution.
> >
> >     It is a somewhat complex patch which fixes numerous crashes in the
> >     PBS code itself.  This support is still marked experimental, but
> >     give it a shot if you are interested in the ability to restart moms
> >     under running parallel processes.
> >
> > MPIEXEC_RANK environment variable
> >
> >     Many users take advantage of the "none" communication library to
> >     automate system tasks in the context of a PBS job.  There is now an
> >     environment variable which gives a different number to each task
> >     similar to the rank in an MPI implementation.  Thanks to Jose Luis
> >     Gordillo Ruiz and Eduardo Murrieta Leon for the idea and patch.
> >
> > MPICH/P4 debugging fix
> >
> >     Process arguments were moved around to allow debugging inside an
> >     xterm.  This had always worked for the other communication
> >     libraries.  Try "mpiexec xterm -e gdb --args mycode" to see it in
> >     action.
> >
> > MPICH/P4 shmem command-line flag
> >
> >     In the ongoing saga of the "--with-comm=shared" compile-time flag
> >     for the MPICH/P4 libmpich.a library itself, this adds a bit more
> >     flexibility.  Now mpiexec allows runtime specification of the
> >     shared-memory support of your mpich library.  You will be much
> >     happier if you compile mpiexec so that it knows if your mpich/p4
> >     library uses --comm=shmem or not, but this flag is handy in testing,
> >     or at sites which must support both types.
> >
> > MPICH/GM GM2 bug fix
> >
> >     Chris Maestas of Sandia found and fixed a problem when using mpiexec
> >     with MPICH/GM with the GM2 library.  This GM2 library is a major
> >     version change from the GM library most use now to drive their
> >     Myricom devices, and thus not yet widely adopted.
> >
> >
> > Full changelog and downloads at:  http://www.osc.edu/~pw/mpiexec/
> > Respond to the list with bug reports, comments, suggestions,
> > and complaints.
> >
> > 		-- Pete
> > _______________________________________________
> > mpiexec mailing list
> > mpiexec at osc.edu
> > http://email.osc.edu/mailman/listinfo/mpiexec
> >




