mpiexec: Error: stdio_fork: need 1031 sockets
Walid
walid.shaari at gmail.com
Tue Jul 18 03:08:59 EDT 2006
Dear all,
Thanks. After setting the limits in PAM and with ulimit, reloading the
pbs_moms solved the "need 1031 sockets, only 1024 available"
problem.
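
For anyone finding this in the archive, here is a rough sketch of how one might confirm that the running pbs_mom processes actually picked up the new limit. It assumes a Linux kernel recent enough to provide /proc/<pid>/limits (older kernels do not have it), and nofile_of_pid is just a helper name made up for this example. Reading another user's /proc entry may require root.

```shell
#!/bin/bash
# nofile_of_pid is a hypothetical helper, not part of PBS or mpiexec.
# It prints the soft "Max open files" limit of a running process,
# read from /proc/<pid>/limits (Linux only; absent on older kernels).
nofile_of_pid() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits" 2>/dev/null
}

# Limit of the current shell, for comparison.
echo "this shell: $(ulimit -n)"

# Limit of each running pbs_mom, if any are found on this node.
for pid in $(pgrep -x pbs_mom 2>/dev/null); do
    echo "pbs_mom $pid: $(nofile_of_pid "$pid")"
done
```

A plain "ulimit -n" line inside a job script should show what a job itself sees, which is the number that matters to mpiexec.
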
regards
Walid
On 7/16/06, Walid <walid.shaari at gmail.com> wrote:
> On 7/16/06, Maestas, Christopher Daniel <cdmaest at sandia.gov> wrote:
> > Here's what limits we set in /etc/init.d/pbs_mom on our clusters:
> >
> > ---
> > ulimit -n 65536
> > ulimit -u 65536
> > ulimit -i 4096
> > ulimit -l 1024
> > ulimit -s unlimited
> > ---
> >
> > We haven't tested this, but believe you can set the following
> > in /etc/security/limits.conf:
> > ---
> > * soft nproc 32768
> > * hard nproc 32768
> > * hard nofile 32768
> > * soft nofile 32768
>
> Pete, Christopher,
>
> Thanks for the prompt responses. After reading both of your emails
> and browsing the list, I see that we should have tested the limit within
> the PBS environment, which we did not do. I will try your suggestion
> again tomorrow.
>
>
> regards
>
> Walid
> > ---
> >
> > -----Original Message-----
> > From: mpiexec-bounces at osc.edu [mailto:mpiexec-bounces at osc.edu] On Behalf
> > Of Pete Wyckoff
> > Sent: Sunday, July 16, 2006 10:49 AM
> > To: Walid
> > Cc: raed.alshaikh at gmail.com; mpiexec at osc.edu; hsss991 at gmail.com;
> > saudiaramco at gmail.com
> > Subject: Re: mpiexec: Error: stdio_fork: need 1031 sockets
> >
> > walid.shaari at gmail.com wrote on Sun, 16 Jul 2006 14:51 +0300:
> > > we are trying to run a 512 cpu job using mpiexec 0.81 on RHEL4
> > > update3, using dual rail Myrinet using E cards, and mpichgm 1.2.7..15,
> > > the gm driver used was 2.1.26_Linux.
> > >
> > > we thought it was an open files limit but increasing the descriptor
> > > limits to 4096 did not solve the problem, if we submit to 508 cpus we
> > > do not get the error message again
> >
> > > [user01 at node01 ]$ mpiexec -v /red/test.exe > outputl
> > > mpiexec: resolve_exe: using absolute path "/red/test.exe".
> > > mpiexec: Error: stdio_fork: need 1031 sockets, only 1024 available.
> >
> > The code is calling sysconf(_SC_OPEN_MAX) to see what the open file
> > limit is, 1024 on your machine. It also calculates how many sockets it
> > needs to talk to all the processes: 512 * (stdout + stderr) + 6
> > (listeners and aggregates) + 1 (stdin to task #0) = 1031.
> >
> > Your 508-cpu job would need only 1023, just under the limit.
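
The arithmetic above can be sketched in a few lines of shell. This is only an illustration of the formula as Pete describes it, not code taken from mpiexec; needed_sockets and check_tasks are made-up helper names, and the comparison uses the shell's "ulimit -n" rather than the sysconf(_SC_OPEN_MAX) call mpiexec makes.

```shell
#!/bin/bash
# Socket estimate as described above: two per task (stdout + stderr),
# plus 6 listeners/aggregates, plus 1 for stdin to task #0.
# needed_sockets is a hypothetical helper name, not part of mpiexec.
needed_sockets() {
    local ntasks=$1
    echo $(( ntasks * 2 + 6 + 1 ))
}

# Compare the estimate against the soft open-file limit of this shell.
check_tasks() {
    local ntasks=$1 need limit
    need=$(needed_sockets "$ntasks")
    limit=$(ulimit -n)
    if [ "$limit" = "unlimited" ] || [ "$need" -le "$limit" ]; then
        echo "$ntasks tasks: need $need, limit $limit: ok"
    else
        echo "$ntasks tasks: need $need, limit $limit: too few"
    fi
}

check_tasks 508
check_tasks 512
```

With a limit of 1024 this reports 508 tasks as fitting (1023) and 512 tasks as not fitting (1031), matching the error message.
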
> >
> > > using the -nostdout option did not seem to help
> >
> > This is surprising, because then the 512-cpu job should decide it only
> > needs 7 sockets, regardless of the number of CPUs.
> >
> > > [user01 at node01 ]$ limit
> > > cputime unlimited
> > > filesize unlimited
> > > datasize unlimited
> > > stacksize 10240 kbytes
> > > coredumpsize 0 kbytes
> > > memoryuse unlimited
> > > vmemoryuse unlimited
> > > descriptors 4096
> > > memorylocked 32 kbytes
> > > maxproc 69632
> >
> > Unfortunately, csh appears not to report the open file limit. You can
> > run bash, then do "ulimit -a" to see the current limits. I think on Red
> > Hat systems, you can edit /etc/security/limits.conf to change "nofile"
> > to be unlimited. There also may be a need to restart your pbs_mom
> > processes so that they get the new limit; and/or set "ulimit -n
> > unlimited" directly in the pbs_mom startup script. Maybe one of the big
> > cluster users on this list would know more precisely.
> >
> > -- Pete