Mpiexec fails

Pete Wyckoff pw at osc.edu
Fri Mar 14 10:48:12 EDT 2008


Jaime.E.Combariza at Dartmouth.EDU wrote on Thu, 13 Mar 2008 19:03 -0400:
> I've tried so many things that I am probably getting confused.
>
> Fact: the code runs fine when I use mpirun_ssh (from Topspin in this case), 
> both for 64x64 and 128x128. That's why I think this is mpiexec related!
>
> It runs fine using mpiexec when the size of the matrix is 64x64.
>
> It aborts using mpiexec when the size of the matrix is 128x128. Using -v -v 
> -v did not provide much more usable information.
>
> The problems with the open read-only are fixed. I am getting similar 
> results with mpiexec 0.83.
>
> mpiexec: process_obit_event: evt 87 task 22 on compute-3-26.local stat 267.
> mpiexec: kill_stdio: sent SIGTERM, waiting on 7284.
> mpiexec: goodbye_from_parent: got signal 15, exiting now.
> mpiexec: Warning: tasks 0-63 died with signal 11 (Segmentation fault).

Please always CC the list, and don't top-post.

One thing to think about is how the code is started.  With mpiexec,
processes are spawned directly by the PBS mom through the TM interface,
while with mpirun, processes are spawned through a remote shell on the
machine.  Thus if you have anything in .bashrc or .tcshrc or
/etc/bashrc or /etc/profile.d/* that would change process parameters,
such as resource limits, it won't apply to the mpiexec case.
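One way to see what each launcher actually hands your processes is a
small wrapper that reports the limits it inherited and then execs the
real program.  This is only a sketch: the name limits-wrapper.sh is
hypothetical, and it assumes bash is available on the compute nodes.
Run it as "mpiexec ./limits-wrapper.sh mycode myargs", launch it the
same way under mpirun_ssh, and compare the lines printed on stderr.

```shell
#!/bin/bash
# limits-wrapper.sh (hypothetical name): print the resource limits this
# process inherited from its launcher, then replace itself with the
# real program so the MPI process tree and environment are unchanged.
# Usage: mpiexec ./limits-wrapper.sh mycode myargs

report_limits() {
    # One line per process on stderr, tagged with the node name.
    # ulimit -s/-d are in kbytes, -l in kbytes; "unlimited" if unset.
    printf '%s: stack=%s data=%s memlock=%s\n' \
        "$(uname -n)" "$(ulimit -s)" "$(ulimit -d)" "$(ulimit -l)" >&2
}

report_limits

# exec keeps the same PID, so the program sees exactly these limits.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

Because the wrapper uses exec rather than running the program as a
child, signals and exit status go straight to mycode, and the launcher
cannot tell the difference.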

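Maybe your pbs_mom is running with a stack, data or other resource
limit (a small stack limit, for instance, can make code segfault once
its local arrays grow), and every process it spawns inherits it.  Try
restarting the PBS moms after putting something like this in the
/etc/init.d/pbs_mom startup file: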

        # discard any user limits before starting
        ulimit -s unlimited
        ulimit -d unlimited
        ulimit -l unlimited
        echo -n "Starting PBS mom: "
        daemon pbs_mom
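After restarting, you can check that the running mom really picked up
the new limits.  A sketch, assuming a Linux kernel new enough to
provide /proc/<pid>/limits and that pgrep is installed; the function
name show_proc_limits is made up for illustration.

```shell
#!/bin/bash
# Show the stack, data and locked-memory limits a running process
# actually has, as reported by the kernel in /proc/<pid>/limits.
show_proc_limits() {
    local pid=$1
    grep -E '^Max (stack size|data size|locked memory)' "/proc/$pid/limits"
}

# Default: inspect the first pbs_mom found on this node, or a PID
# given as the first argument.
pid=${1:-$(pgrep -x pbs_mom | head -n 1)}
if [ -n "$pid" ]; then
    show_proc_limits "$pid"
else
    echo "pbs_mom is not running on $(uname -n)" >&2
fi
```

Since jobs inherit these values from the mom, this is what mpiexec
processes will start with.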

You can also try running with strace to figure out exactly why they
are all dying:

    mpiexec -v strace -vFf -s 200 mycode myargs

The last few lines of each process's trace may show why it died with SIGSEGV.
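With 64 processes, interleaved strace output on stderr is hard to
read.  strace's "-ff -o PREFIX" form writes one file per traced process,
named PREFIX.<pid>, on the node where that process ran.  A sketch of a
helper to summarize those files on a node, assuming a run such as
"mpiexec -v strace -v -ff -s 200 -o /tmp/mycode.trace mycode myargs";
the /tmp/mycode.trace prefix and the function name are hypothetical.

```shell
#!/bin/bash
# Summarize per-process strace logs written with "strace -ff -o PREFIX",
# one file per process named PREFIX.<pid>.
# Usage: summarize_traces [PREFIX] [LINES]
summarize_traces() {
    local prefix=$1 lines=${2:-5} f
    for f in "$prefix".*; do
        [ -f "$f" ] || continue          # glob matched nothing
        echo "== $f"
        # strace reports signal delivery as a "--- SIGSEGV ... ---" line.
        grep -m 1 -- '--- SIGSEGV' "$f"
        # The system calls just before the crash are usually the clue.
        tail -n "$lines" "$f"
    done
}

summarize_traces "${1:-/tmp/mycode.trace}" "${2:-5}"
```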

		-- Pete

