Is it possible to suspend/resume MPI jobs with mpiexec? (Through PBS or PBS Pro)

Francois Courteille francois at hpce.nec.com
Thu Mar 18 05:55:32 EST 2004


Hi Pete,

It's probably a naive question, but I didn't find the answer in
Myricom's, Altair's or OSC's FAQs.

For production reasons, we need to suspend MPI jobs on a Myrinet cluster
and resume them later.

Myricom (see below) pointed out that it could be done through the mpiexec
interface.

My questions are as follows (I assume that PBS or PBS Pro sends a SIGSTOP
signal to mpirun/mpiexec):

* SIGSTOP cannot be caught, so how will mpiexec behave?

* Assuming the above issue is resolved, how will mpiexec freeze (suspend)
  all MPI processes while making sure that all MPI traffic has completed
  successfully?
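The premise of the first question can be checked directly. Below is a minimal Python sketch, assuming a POSIX system; the helper `can_catch` is hypothetical and is not part of mpiexec or PBS. It shows that a handler can be installed for SIGTSTP but not for SIGSTOP (sigaction() rejects SIGSTOP and SIGKILL with EINVAL), so a launcher can only run its own code in response to a catchable signal such as SIGTSTP; SIGSTOP simply stops it.

```python
import errno
import signal


def can_catch(sig: signal.Signals) -> bool:
    """Return True if a Python-level handler can be installed for `sig`."""
    previous = signal.getsignal(sig)
    try:
        signal.signal(sig, lambda signum, frame: None)
    except OSError as exc:
        # POSIX sigaction() refuses handlers for SIGKILL and SIGSTOP (EINVAL).
        if exc.errno == errno.EINVAL:
            return False
        raise
    # Restore the previous disposition (fall back to SIG_DFL if it was set outside Python).
    signal.signal(sig, previous if previous is not None else signal.SIG_DFL)
    return True


if __name__ == "__main__":
    for sig in (signal.SIGTSTP, signal.SIGSTOP):
        print(f"{sig.name}: catchable={can_catch(sig)}")
```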

With best regards,

Francois Courteille





----- Original Message ----- 
From: "Myricom Technical Support" <help at myri.com>
To: "Francois Courteille" <francois at hpce.nec.com>
Cc: "Myricom Technical Support" <help at myri.com>
Sent: Thursday, March 18, 2004 2:15 AM
Subject: Re: [Myricom help #23813] MPICH-GM + PBS PRO -
Suspend/resume function (fwd)


> Hi,
>
> Your question was forwarded to Myricom Technical Support, help at myri.com.
>
> > Could it be possible to use the suspend/resume function of PBS PRO with
> > mpich-gm MPI jobs?
> >
> > What are the plans regarding this jobs management topic ?
>
> I have consulted with one of our developers and he replied:
>
> We currently do not support this, but it might be possible to have it
> work with mpiexec.
>
> I am not that familiar with PBS PRO, but I assume that the
> suspend/resume functionality does not do checkpointing, and just relies
> on sending SIGSTOP/SIGCONT to every process. In this case, it might be
> possible to make it work with mpiexec
> (http://www.osc.edu/~pw/mpiexec/).
>
> Hope this helps.
>
> Susan
>
> -- 
>
> ------------------------------------------
> Susan Blackford
> Member of Technical Staff
> Myricom Inc.
> ------------------------------------------
>
>
>
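Susan's reply assumes that suspend/resume amounts to sending SIGSTOP/SIGCONT to every process of the job. On a single node this can be illustrated with a POSIX process group. The sketch below is minimal and assumes the "ranks" are local child processes (real MPI ranks would be spread across several nodes); the helpers `spawn_ranks`, `suspend`, `resume` and `terminate` are hypothetical and are not part of mpiexec, MPICH-GM or PBS. The launcher stays outside the group so it is not stopped itself, and waitpid() with WUNTRACED / WCONTINUED confirms each state change. It says nothing about in-flight Myrinet/GM traffic, which remains the open question.

```python
import os
import signal
import time


def spawn_ranks(n: int) -> tuple[int, list[int]]:
    """Fork n idle children (stand-ins for MPI ranks) into one new process group."""
    pids: list[int] = []
    pgid = 0
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: placeholder for real work; idles until signalled.
            while True:
                time.sleep(1)
        # Parent: pgid == 0 makes the first child the group leader.
        os.setpgid(pid, pgid)
        if pgid == 0:
            pgid = pid
        pids.append(pid)
    return pgid, pids


def suspend(pgid: int, pids: list[int]) -> bool:
    """Send SIGSTOP to the whole group and wait until every child reports stopped."""
    os.killpg(pgid, signal.SIGSTOP)
    statuses = [os.waitpid(p, os.WUNTRACED)[1] for p in pids]
    return all(os.WIFSTOPPED(s) for s in statuses)


def resume(pgid: int, pids: list[int]) -> bool:
    """Send SIGCONT to the whole group and wait until every child reports continued."""
    os.killpg(pgid, signal.SIGCONT)
    statuses = [os.waitpid(p, os.WCONTINUED)[1] for p in pids]
    return all(os.WIFCONTINUED(s) for s in statuses)


def terminate(pgid: int, pids: list[int]) -> None:
    """Kill the group (SIGKILL works even on stopped processes) and reap the children."""
    os.killpg(pgid, signal.SIGKILL)
    for p in pids:
        os.waitpid(p, 0)


if __name__ == "__main__":
    pgid, pids = spawn_ranks(4)
    try:
        print("suspended:", suspend(pgid, pids))
        print("resumed:", resume(pgid, pids))
    finally:
        terminate(pgid, pids)
```

Signalling the group with a single killpg() call, rather than looping over PIDs, means no rank can be missed if the list is stale; a multi-node launcher would have to do the equivalent on every node.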




More information about the mpiexec mailing list