mpiexec and interactive runs not via PBS

Anton Starikov A.Starikov at utwente.nl
Mon Apr 25 16:15:09 EDT 2005


Hi!

I need some advice. This is probably only partly related to MPIEXEC, but
I guess a lot of guys on this list administer clusters :)

I guess the following situation is very common.

I have a cluster which is driven by Torque, but a small part of the
cluster is used for code writing/debugging. I want to have the option to
run MPI tasks on this part without submitting jobs to the resource
manager.

1) The simplest solution is to use two "mpirun" wrappers: one around
MPIEXEC and one around something different, for example rsh. These
nodes are excluded from the Torque configuration.
But in my case of diskless nodes, this means that I would need two
scripts with different names, which I want to avoid.
I would like to have one and the same command, "mpirun", which users
use in every case, so I come to the next solution.

2) I have to make some check in this wrapper and call the proper
launcher... but...
I would also like to limit the execution time of these MPI tasks to
1 hour, for example... so I come to the 3rd solution.

3) When I run an MPI task interactively on the master node, it should be
automatically registered in Torque/PBS, scheduled for immediate
execution, and started on this separate "interactive" partition, even if
this partition is already occupied by other MPI tasks (they can share
the same nodes; it's for debugging anyway). A special "cput" limit
should be set for this task.
The processes should be launched by MPIEXEC.
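
The check from solution 2 could look roughly like the following Python
sketch of a single "mpirun" wrapper. It assumes that being inside a
Torque job can be detected through the PBS_JOBID environment variable,
that mpiexec takes the process count via "-n", and it uses a
hypothetical rsh-based launcher, RSH_LAUNCHER, with a hypothetical "-np"
option; both would have to match the local installation.

```python
import os
import sys

# Hypothetical path and option of the non-PBS (rsh-based) launcher kept
# for the debugging partition; adjust to the local installation.
RSH_LAUNCHER = "/usr/local/bin/mpirun-rsh"


def build_command(env, nprocs, program):
    """Return the argv to exec, depending on whether we run inside a PBS job."""
    if "PBS_JOBID" in env:
        # Inside a Torque job: let mpiexec start the processes on the
        # nodes that PBS allocated to this job.
        return ["mpiexec", "-n", str(nprocs)] + list(program)
    # Outside PBS: fall back to the (hypothetical) rsh-based launcher.
    return [RSH_LAUNCHER, "-np", str(nprocs)] + list(program)


def main(argv):
    # Assumed usage: mpirun NPROCS PROGRAM [ARGS...]
    if len(argv) < 3:
        print("usage: mpirun NPROCS PROGRAM [ARGS...]", file=sys.stderr)
        return
    cmd = build_command(os.environ, int(argv[1]), argv[2:])
    # Replace the wrapper process with the chosen launcher.
    os.execvp(cmd[0], cmd)


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the decision in a pure function (build_command) makes it easy to
test without actually launching anything.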

Can somebody give me some ideas about organizing this in the
conceptually right way?

Basically, I guess this can also be done differently: every new ssh
session to the master should be an interactive job with some default
"cput" limit and a default "PBS_NODEFILE" that includes all nodes in
the interactive partition.
But I don't have a good idea of how to organize it. (The only idea that
comes to mind is to set some wrapper around "qsub -I" as the users'
default shell.)
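
A login-shell wrapper around "qsub -I" might be sketched as below. The
qsub options used ("-I" for an interactive job, "-q" for the queue and
"-l cput=..." for the CPU-time limit) are standard Torque options; the
queue name "interactive" and the one-hour default are assumptions
standing in for a queue mapped to the debugging partition.

```python
import os

# Assumed site defaults: a queue mapped to the interactive partition
# and a one-hour CPU-time limit (both hypothetical).
INTERACTIVE_QUEUE = "interactive"
DEFAULT_CPUT = "01:00:00"


def qsub_interactive_command(queue=INTERACTIVE_QUEUE, cput=DEFAULT_CPUT, nodes=None):
    """Build the argv for an interactive Torque job with a cput limit."""
    cmd = ["qsub", "-I", "-q", queue, "-l", "cput=" + cput]
    if nodes is not None:
        # Optionally request a fixed number of nodes as well.
        cmd += ["-l", "nodes=%d" % nodes]
    return cmd


def start_session():
    # Intended to run as the user's login shell: the ssh session is
    # replaced by an interactive job, so inside it PBS_NODEFILE lists
    # the allocated nodes and mpiexec can use them.
    cmd = qsub_interactive_command()
    os.execvp(cmd[0], cmd)
```

The login wrapper's entry point would simply call start_session(). Note
that such a shell also intercepts non-interactive ssh commands (scp,
remote commands), which would need to be handled separately.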

And one more thing: realizing this idea requires that MPIEXEC be able
to launch, for example, 6 processes on 4 available nodes for one MPI
task, i.e. more processes than available nodes.
Currently the maximum number of processes MPIEXEC can launch is limited
by the number of nodes allocated to the job in PBS. But I guess in some
situations it could be really helpful if MPIEXEC launched as many
processes as the user wants, while placing all of them on the nodes PBS
gives to this job.
In principle, this can also be helpful when you have MPI code written in
client-server fashion, where the server does almost nothing except tell
the slaves what to do, and the slaves perform all the real calculations.
With the current implementation you have to request one more node for
such jobs, and that node does almost nothing all the time, or one
process has to do both the master job and the slave job, which is more
complicated to write.
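
The placement being asked for can be illustrated with a simple
round-robin assignment of ranks to the hosts listed in PBS_NODEFILE.
This is only a sketch of the requested behavior, not how MPIEXEC
actually places processes; the function names are hypothetical. If the
node file lists a host once per allocated processor, such hosts
naturally receive proportionally more ranks.

```python
def read_nodefile(path):
    """Return the hostnames from a PBS_NODEFILE-style file, one per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def place_round_robin(nodes, nprocs):
    """Assign ranks 0..nprocs-1 to the given nodes cyclically.

    When nprocs exceeds len(nodes), nodes are oversubscribed: rank r
    goes to nodes[r % len(nodes)].
    """
    if not nodes:
        raise ValueError("no nodes allocated to this job")
    return [nodes[rank % len(nodes)] for rank in range(nprocs)]
```

With 4 nodes and 6 processes, the first two nodes each get two ranks.
For the master/slave case, starting one process more than there are
nodes puts the mostly idle master (rank 0) on the same node as the last
slave, so no extra node is needed.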

Anton Starikov.

