mpiexec: Error: stdio_fork: need 1031 sockets

Maestas, Christopher Daniel cdmaest at sandia.gov
Sun Jul 16 12:52:27 EDT 2006


Here are the limits we set in /etc/init.d/pbs_mom on our clusters:

---
ulimit -n 65536
ulimit -u 65536
ulimit -i 4096
ulimit -l 1024
ulimit -s unlimited
---

We haven't tested it, but we believe you can set the equivalent limits
in /etc/security/limits.conf:
---
*               soft    nproc            32768
*               hard    nproc            32768
*               hard    nofile           32768
*               soft    nofile           32768 
---

-----Original Message-----
From: mpiexec-bounces at osc.edu [mailto:mpiexec-bounces at osc.edu] On Behalf
Of Pete Wyckoff
Sent: Sunday, July 16, 2006 10:49 AM
To: Walid
Cc: raed.alshaikh at gmail.com; mpiexec at osc.edu; hsss991 at gmail.com;
saudiaramco at gmail.com
Subject: Re: mpiexec: Error: stdio_fork: need 1031 sockets

walid.shaari at gmail.com wrote on Sun, 16 Jul 2006 14:51 +0300:
> we are trying to run a 512 cpu job using mpiexec 0.81 on RHEL4
> update3, using dual rail Myrinet using E cards, and mpichgm 1.2.7..15,
> the gm driver used was 2.1.26_Linux.
> 
> we thought it was an open files limit but increasing the descriptor
> limits to 4096 did not solve the problem, if we submit to 508 cpus we
> do not get the error message again

> [user01 at node01 ]$ mpiexec  -v  /red/test.exe > outputl
> mpiexec: resolve_exe: using absolute path "/red/test.exe".
> mpiexec: Error: stdio_fork: need 1031 sockets, only 1024 available.

The code is calling sysconf(_SC_OPEN_MAX) to see what the open file
limit is, 1024 on your machine.  It also calculates how many sockets it
needs to talk to all the processes: 512 * (stdout + stderr) + 6
(listeners and aggregates) + 1 (stdin to task #0) = 1031.

Your 508-cpu job would need only 1023, just under the limit.
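
The arithmetic above can be sketched as a small calculation. This is only
an illustration of the formula quoted in this message, not the mpiexec
source; the function and parameter names are made up for the example, and
max_tasks assumes a job fits when the sockets needed do not exceed the
limit.

```python
# Illustration of the socket count described above; not taken from mpiexec.
# All names here are hypothetical.

def sockets_needed(ntasks, streams_per_task=2, listeners=6, stdin_sockets=1):
    """Sockets for ntasks processes: stdout + stderr per task,
    plus listeners/aggregates, plus stdin to task #0."""
    return ntasks * streams_per_task + listeners + stdin_sockets


def max_tasks(open_max, streams_per_task=2, listeners=6, stdin_sockets=1):
    """Largest task count whose socket need does not exceed open_max
    (assumes the check is 'needed <= available')."""
    return (open_max - listeners - stdin_sockets) // streams_per_task


print(sockets_needed(512))  # 1031, the figure in the error message
print(sockets_needed(508))  # 1023, just under a limit of 1024
print(max_tasks(1024))      # 508
```

With the 32768 or 65536 values for nofile mentioned earlier in this
thread, the same formula leaves plenty of headroom for a 512-cpu job.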

> using the -nostdout option did not seem  to help

This is surprising, because then the 512-cpu job should decide it only
needs 7 sockets, regardless of the number of CPUs.

> [user01 at node01 ]$ limit
> cputime      unlimited
> filesize     unlimited
> datasize     unlimited
> stacksize    10240 kbytes
> coredumpsize 0 kbytes
> memoryuse    unlimited
> vmemoryuse   unlimited
> descriptors  4096
> memorylocked 32 kbytes
> maxproc      69632

Unfortunately, csh appears not to report the open file limit.  You can
run bash, then do "ulimit -a" to see the current limits.  I think on Red
Hat systems, you can edit /etc/security/limits.conf to change "nofile"
to be unlimited.  There also may be a need to restart your pbs_mom
processes so that they get the new limit; and/or set "ulimit -n
unlimited" directly in the pbs_mom startup script.  Maybe one of the big
cluster users on this list would know more precisely.
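
One way to see which limit a process really inherits is to query it from
inside a job rather than from a login shell. The sketch below is a
suggestion only: it uses Python's standard os and resource modules on
Linux, and the helper name open_file_limits is invented for the example.
Running it from a job script would show what processes started by pbs_mom
get, which may differ from what your interactive shell reports.

```python
import os
import resource


def open_file_limits():
    """Return the open-file limits seen by the current process.

    'soft' and 'hard' come from getrlimit(RLIMIT_NOFILE); an unlimited
    value is reported as resource.RLIM_INFINITY. 'sysconf_open_max' is
    the value of sysconf(_SC_OPEN_MAX), the call mentioned above.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "soft": soft,
        "hard": hard,
        "sysconf_open_max": os.sysconf("SC_OPEN_MAX"),
    }


if __name__ == "__main__":
    for name, value in open_file_limits().items():
        print(f"{name}: {value}")
```

If the value printed inside a job is still 1024 after editing
limits.conf, that would suggest pbs_mom was started before the change or
does not pick up those settings.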

		-- Pete
_______________________________________________
mpiexec mailing list
mpiexec at osc.edu
http://email.osc.edu/mailman/listinfo/mpiexec



