mpiexec: Error: stdio_fork: need 1031 sockets
Maestas, Christopher Daniel
cdmaest at sandia.gov
Sun Jul 16 12:52:27 EDT 2006
Here are the limits we set in /etc/init.d/pbs_mom on our clusters:
---
ulimit -n 65536
ulimit -u 65536
ulimit -i 4096
ulimit -l 1024
ulimit -s unlimited
---
We haven't tested this, but we believe you can set similar limits
in /etc/security/limits.conf:
---
* soft nproc 32768
* hard nproc 32768
* hard nofile 32768
* soft nofile 32768
---
-----Original Message-----
From: mpiexec-bounces at osc.edu [mailto:mpiexec-bounces at osc.edu] On Behalf
Of Pete Wyckoff
Sent: Sunday, July 16, 2006 10:49 AM
To: Walid
Cc: raed.alshaikh at gmail.com; mpiexec at osc.edu; hsss991 at gmail.com;
saudiaramco at gmail.com
Subject: Re: mpiexec: Error: stdio_fork: need 1031 sockets
walid.shaari at gmail.com wrote on Sun, 16 Jul 2006 14:51 +0300:
> we are trying to run a 512 cpu job using mpiexec 0.81 on RHEL4
> update3, using dual rail Myrinet using E cards, and mpichgm 1.2.7..15,
> the gm driver used was 2.1.26_Linux.
>
> We thought it was an open-files limit, but increasing the descriptor
> limit to 4096 did not solve the problem. If we submit to 508 CPUs, we
> do not get the error message.
> [user01 at node01 ]$ mpiexec -v /red/test.exe > outputl
> mpiexec: resolve_exe: using absolute path "/red/test.exe".
> mpiexec: Error: stdio_fork: need 1031 sockets, only 1024 available.
The code is calling sysconf(_SC_OPEN_MAX) to see what the open file
limit is, 1024 on your machine. It also calculates how many sockets it
needs to talk to all the processes: 512 * (stdout + stderr) + 6
(listeners and aggregates) + 1 (stdin to task #0) = 1031.
Your 508-cpu job would need only 1023, just under the limit.
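As a rough illustration of that arithmetic, here is a small Python sketch. It is not taken from the mpiexec source; the names sockets_needed, max_tasks, STREAMS_PER_TASK and FIXED_SOCKETS are made up for this example, and it assumes a job fits whenever the socket count does not exceed the limit.

```python
# Sketch of the socket arithmetic described above; not taken from mpiexec.
STREAMS_PER_TASK = 2   # stdout + stderr for each process
FIXED_SOCKETS = 6 + 1  # listeners and aggregates, plus stdin to task #0


def sockets_needed(ntasks):
    """Sockets required to handle stdio for ntasks processes."""
    return ntasks * STREAMS_PER_TASK + FIXED_SOCKETS


def max_tasks(open_file_limit):
    """Largest task count whose socket need fits within the limit (assumed <=)."""
    return max((open_file_limit - FIXED_SOCKETS) // STREAMS_PER_TASK, 0)


print(sockets_needed(512))  # 1031
print(sockets_needed(508))  # 1023
print(max_tasks(1024))      # 508
```

By the same formula, a limit of 4096 would leave room for 2044 tasks.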
> using the -nostdout option did not seem to help
This is surprising, because then the 512-cpu job should decide it only
needs 7 sockets, regardless of the number of CPUs.
> [user01 at node01 ]$ limit
> cputime unlimited
> filesize unlimited
> datasize unlimited
> stacksize 10240 kbytes
> coredumpsize 0 kbytes
> memoryuse unlimited
> vmemoryuse unlimited
> descriptors 4096
> memorylocked 32 kbytes
> maxproc 69632
Unfortunately, csh appears not to report the open file limit. You can
run bash, then do "ulimit -a" to see the current limits. I think on Red
Hat systems, you can edit /etc/security/limits.conf to change "nofile"
to be unlimited. There also may be a need to restart your pbs_mom
processes so that they get the new limit; and/or set "ulimit -n
unlimited" directly in the pbs_mom startup script. Maybe one of the big
cluster users on this list would know more precisely.
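One way to see which limit a process actually inherits is to query it from inside the job itself. The Python sketch below is only an illustration under that assumption: the helper names open_file_limits and fits are hypothetical, and it reads the same sysconf value mentioned above alongside the soft and hard RLIMIT_NOFILE values.

```python
import os
import resource


def open_file_limits():
    """Return (soft, hard, sysconf_max) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The same query the message says mpiexec uses: sysconf(_SC_OPEN_MAX).
    sysconf_max = os.sysconf("SC_OPEN_MAX")
    return soft, hard, sysconf_max


def fits(needed, limit):
    """True if `needed` descriptors fit; a negative limit means no limit."""
    return limit < 0 or needed <= limit


soft, hard, sysconf_max = open_file_limits()
print("soft:", soft, "hard:", hard, "sysconf:", sysconf_max)
print("room for 1031 sockets:", fits(1031, sysconf_max))
```

Running this from within a batch job, rather than from a login shell, shows the limit that processes started by pbs_mom receive, which may differ from the interactive one.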
-- Pete