MPIEXEC 0.75 + PBSPro ?
Pete Wyckoff
pw at osc.edu
Thu Mar 25 16:00:19 EST 2004
francois at hpce.nec.com said on Thu, 25 Mar 2004 21:24 +0100:
> I just grabbed the latest mpiexec version (0.75) and was trying to get it
> working with PBS Pro (5.4).
>
> As reported in the notes section of the mpiexec web page, PBS Pro does not
> seem to work correctly with mpiexec.
>
> I get the following messages when trying to run a 2-CPU job
> (-l nodes=2:ppn=1):
>
> mpiexec: Warning: get_hosts: ncpus=2 but nodect=2, pretending nodect=1.
> mpiexec: Warning: get_hosts: ncpus=2 but numtask=2, expecting numtask=1.
>
> The job ran on only 1 CPU.
>
> Has anybody already encountered this issue and fixed it?
Yes, sorry, fixed on February 4, but after the 0.75 release. The CVS
version should be fine. I updated the web page and will roll another
release soon. Here's a diff for get_hosts.c that you can apply by hand.
Let us know if there are any other issues.
-- Pete
-------------- next part --------------
Index: get_hosts.c
===================================================================
RCS file: /cvs/mpiexec/get_hosts.c,v
retrieving revision 1.36
retrieving revision 1.37
diff -u -r1.36 -r1.37
--- get_hosts.c 8 Dec 2003 23:02:34 -0000 1.36
+++ get_hosts.c 4 Feb 2004 14:22:40 -0000 1.37
@@ -1,6 +1,6 @@
/*
* get_hosts.c - read hostnames from pbs, mark which ones we'll use
- * $Id: get_hosts.c,v 1.36 2003/12/08 23:02:34 pw Exp $
+ * $Id: get_hosts.c,v 1.37 2004/02/04 14:22:40 pw Exp $
*
* Copyright (C) 2000-3 Ohio Supercomputer Center.
* Distributed under the GNU Public License Version 2 or later (See LICENSE)
@@ -224,30 +224,40 @@
/* close connection to pbs server */
pbs_disconnect(fd);
+ /*
+ * Various PBSes disagree about what should appear here. Try to do
+ * the best thing.
+ * OpenPBS, non-ts: nodes=20:ppn=2 nodect=20
+ * PBSPro, not-ts: nodes=20:ppn=2 nodect=20 ncpus=40
+ * OpenPBS, ts: ncpus=2
+ * PBSPro, ts: ??
+ */
if (!(have_ncpus || have_nodect))
error("%s: pbs_statjob returned neither \"ncpus\" nor \"nodect\"",
__func__);
if (have_ncpus > 1) {
- task_cntrl_t *oldtasks = tasks;
-
- if (have_nodect > 1)
- warning("%s: ncpus=%d but nodect=%d, pretending nodect=1",
- __func__, have_ncpus, have_nodect);
- if (numtask > 1)
- warning("%s: ncpus=%d but numtask=%d, expecting numtask=1",
- __func__, have_ncpus, numtask);
- /*
- * Explode multi-cpu task entries into 1-cpu ones for better
- * config matching. Later these will be compressed back down
- * for spawning.
- */
- numtask = have_ncpus;
- tasks = Malloc(numtask * sizeof(*tasks));
- for (i=0; i<numtask; i++) {
- memcpy(&tasks[i], &oldtasks[0], sizeof(*tasks));
- tasks[i].name = strsave(oldtasks[0].name);
+ if (cl_args->verbose > 2) {
+ printf("%s: numtask=%d ncpus=%d nodect=%d\n", __func__, numtask,
+ have_ncpus, have_nodect);
+ }
+ if (have_nodect > 1 || numtask > 1) {
+ /* ignore the ncpus setting, trust nodect and exec_host */
+ ;
+ } else {
+ /*
+ * Explode multi-cpu task entries into 1-cpu ones for better
+ * config matching. Later these will be compressed back down
+ * for spawning.
+ */
+ task_cntrl_t *oldtasks = tasks;
+ numtask = have_ncpus;
+ tasks = Malloc(numtask * sizeof(*tasks));
+ for (i=0; i<numtask; i++) {
+ memcpy(&tasks[i], &oldtasks[0], sizeof(*tasks));
+ tasks[i].name = strsave(oldtasks[0].name);
+ }
+ free(oldtasks);
}
- free(oldtasks);
}
/* enforce one process per physical node by strcmp on host name */
@@ -260,9 +270,10 @@
if (cl_args->numproc) {
if (cl_args->numproc > numtask)
error(
- "%s: argument -n specifies %d processors, only %d available%s",
+ "%s: argument -n specifies %d processors, but\n"
+ " only %d available%s",
__func__, cl_args->numproc, numtask,
- cl_args->pernode ? "\n after processing -pernode flag" : "");
+ cl_args->pernode ? " after processing -pernode flag" : "");
/* just take whatever the user specified, which may be fewer,
* discarding the rest of the tasks */
numtask = cl_args->numproc;
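
For readers who want the gist of the first hunk without reading the diff, here is a minimal standalone sketch of the revised decision. The function effective_numtask() is hypothetical and is not part of mpiexec; it only models how many task entries remain after the ncpus check, assuming the three counters carry the same meaning as have_ncpus, have_nodect and numtask in get_hosts.c. It leaves out the actual task-array copying.

```c
#include <assert.h>

/*
 * Hypothetical helper (not in mpiexec): returns the task count that the
 * revised ncpus handling would leave behind.
 *
 *   have_ncpus  - value of the "ncpus" resource, 0 if PBS did not report it
 *   have_nodect - value of the "nodect" resource, 0 if not reported
 *   numtask     - number of task entries already parsed from exec_host
 */
static int effective_numtask(int have_ncpus, int have_nodect, int numtask)
{
    assert(have_ncpus >= 0 && have_nodect >= 0 && numtask >= 1);

    if (have_ncpus > 1) {
        if (have_nodect > 1 || numtask > 1) {
            /* Ignore ncpus; trust nodect and exec_host as they are. */
            return numtask;
        }
        /*
         * One task entry and no multi-node count: treat it as a single
         * multi-cpu host and explode it into ncpus one-cpu entries.
         */
        return have_ncpus;
    }
    return numtask;
}
```

With the values from the report (ncpus=2, nodect=2, numtask=2) this returns 2 and keeps both exec_host entries. The 0.75 code instead rebuilt the task list as two copies of the first entry, which matches the job landing on a single host.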