Trouble with all jobs launched on same node.
james.becnel at srs.gov
Thu Apr 8 10:58:09 EDT 2004
Problem: on a 4-node run, OpenPBS allocates 4 separate nodes, but
mpiexec launches all 4 processes on the first node given in the PBS
node list.
Any ideas why? I have been going through the mpiexec code to find the
cause, and the problem appears to be at the point where mpiexec gets
the node information from OpenPBS. Should I be looking at the code on
the OpenPBS side instead? Let me know if you have any thoughts, and
whether you can suggest any workarounds. I have no problems coding in
C. Thank you!
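
One way to narrow this down, offered as a sketch rather than a diagnosis:
check what OpenPBS itself handed to the job. The helper below assumes the
job environment provides the usual $PBS_NODEFILE (one host name per line)
and counts the distinct host names in it. count_distinct_hosts is a
hypothetical name for illustration, not part of mpiexec or OpenPBS.

```c
#include <stdio.h>
#include <string.h>

#define MAX_HOSTS 256
#define HOST_LEN  64

/* Count distinct host names in a PBS-style node file (one name per line).
 * Hypothetical helper for illustration; assumes names shorter than
 * HOST_LEN and at most MAX_HOSTS distinct hosts. */
int count_distinct_hosts(FILE *fp)
{
    char seen[MAX_HOSTS][HOST_LEN];
    char line[HOST_LEN];
    int nseen = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0')
            continue;                         /* skip blank lines */

        int found = 0;
        for (int i = 0; i < nseen; i++) {
            if (strcmp(seen[i], line) == 0) {
                found = 1;
                break;
            }
        }
        if (!found && nseen < MAX_HOSTS)
            strcpy(seen[nseen++], line);      /* remember a new host */
    }
    return nseen;
}
```

Inside a job script this could be called on fopen(getenv("PBS_NODEFILE"), "r").
If it reports 4 for the run described here, the allocation itself is fine and
the place to look is the node lookup that mpiexec does through the PBS task
manager (TM) interface, since mpiexec does not read this file.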
OpenPBS qstat output:
---------------------
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
12938.c00       msi3     ms       MS_375NN    12541   4   4     --    -- R    --
   c07/0+c02/0+c16/0+c11/0
mpiexec-0.75 task[] settings:
-----------------------------
node 0: name = c07, mpname = c07, cpu = 0
node 1: name = c07, mpname = c07, cpu = 0
node 2: name = c07, mpname = c07, cpu = 0
node 3: name = c07, mpname = c07, cpu = 0
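
For comparison, the exec_host string in the qstat output above implies a
different table. The following is a minimal sketch, assuming the
host/cpu+host/cpu format seen in that output. parse_exec_host and
struct placement are hypothetical names for illustration, not mpiexec
internals.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct placement {
    char name[64];   /* host name, e.g. "c07" */
    int  cpu;        /* cpu index on that host */
};

/* Split a string such as "c07/0+c02/0" into {name, cpu} entries.
 * Returns the number of entries written (at most max),
 * or -1 if the input is malformed. */
int parse_exec_host(const char *s, struct placement *out, int max)
{
    int n = 0;

    while (*s != '\0' && n < max) {
        size_t len = strcspn(s, "/+");       /* length of the host name */
        if (len == 0 || len >= sizeof out[n].name || s[len] != '/')
            return -1;
        memcpy(out[n].name, s, len);
        out[n].name[len] = '\0';
        s += len + 1;                        /* skip the name and the '/' */

        char *end;
        long cpu = strtol(s, &end, 10);      /* parse the cpu index */
        if (end == s)
            return -1;
        out[n].cpu = (int)cpu;
        n++;

        s = end;
        if (*s == '+')
            s++;                             /* another entry follows */
        else if (*s != '\0')
            return -1;
    }
    return n;
}
```

Applied to c07/0+c02/0+c16/0+c11/0, this yields four entries named c07, c02,
c16 and c11, each with cpu 0, whereas the task[] settings above show c07
four times.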
mpiexec-0.75 compile options: --with-comm=shared --with-default-comm=mpich-p4
Setup:
------
mpiexec-0.75
OpenPBS-2.3.16 (no patches)
MPICH-1.2.5.2
Linux-2.4.18 / RedHat 7.3
--------------------------------
Jim Becnel
Savannah River Technology Center
Bldg. 773-42A, Room 179
Aiken, SC 29808
voice: 803-725-7386
fax: 803-725-8829
email: James.Becnel at srs.gov
--------------------------------