mpich-shmem problems

Miroslav Ruda ruda at ics.muni.cz
Thu Aug 21 11:23:51 EDT 2003


On Thu, 2003-08-21 at 17:02, Pete Wyckoff wrote:
> Okay, I see.  This patch calls cull_nodes() for the ncpus case as well
> as for the nodect case.  I suspect that "cat $PBS_NODEFILE" will show
> two lines in your batch job, although openpbs with ncpus (and no nodect)
> just has one line.  

Yes, the node is listed twice in the nodefile.

>  	if (have_ncpus)
>  	    tasks[0].num_copies = have_ncpus;  /* trust this one first */
> +	    /* note pbspro will set both ncpus and nodect, thus cull below
> +	     * is necessary to prune out extra nodes */
>  	else {
>  	    if (have_nodect != 1)
>  		error("%s: pbs_statjob says nodect = %d,"
>  		  " but shmem only handles nodect = 1", __func__, have_nodect);
>  	    tasks[0].num_copies = numtask;  /* ppn value for the single node */
> -	    cull_nodes(matching_node);  /* discard other cpu tasks */
>  	}
> +	cull_nodes(matching_node);  /* discard other cpu tasks */

Your patch works, but I would suggest changing the last line to

        if (have_nodect) cull_nodes(matching_node);

If only have_ncpus is set, you probably don't want to call cull_nodes.
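
To illustrate the intent, here is a minimal standalone sketch of the guarded
cull. It is not the mpiexec source: struct node_entry, cull_nodes_sketch() and
select_nodes() are hypothetical names made up for this example, and I am
assuming that culling means collapsing the node list to a single entry for the
matching node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one line of $PBS_NODEFILE. */
struct node_entry {
    const char *name;
};

/*
 * Assumed culling behaviour: keep only the first entry whose name equals
 * `keep` and return the new entry count.  The real cull_nodes() in mpiexec
 * may work differently.
 */
static size_t cull_nodes_sketch(struct node_entry *nodes, size_t n,
                                const char *keep)
{
    size_t out = 0;
    size_t i;

    assert(nodes != NULL && keep != NULL);
    for (i = 0; i < n; i++) {
        /* Copy the first match to the front, skip everything else. */
        if (out == 0 && strcmp(nodes[i].name, keep) == 0)
            nodes[out++] = nodes[i];
    }
    return out;
}

/*
 * The suggested guard: cull only when nodect was reported.  With ncpus
 * alone the list is returned untouched.
 */
static size_t select_nodes(struct node_entry *nodes, size_t n,
                           const char *matching_node, int have_nodect)
{
    if (have_nodect)
        return cull_nodes_sketch(nodes, n, matching_node);
    return n;
}
```

With a two-line nodefile and nodect set, select_nodes() returns 1. With only
ncpus set, it returns the original count and leaves the list alone.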

Best regards.

                  Mirek Ruda


