mpiexec: Warning: main: task 0 died with signal 11 (raw 0x10b).

Parvath Reddy parvath_r at yahoo.com
Mon Dec 1 18:58:13 EST 2003


It's quite possible that there's an error in my code. Troy,
please have a look at it. I have reduced it to the bare
minimum.
I was able to run the code with LAM MPI on a Linux
network, but it ended with signal 11 on the cluster.

---------------------------------------------------
#include <stdio.h>
#include <stdlib.h>   /* for exit() */
#include <mpi.h>

#define NRA 362
#define NCA 362
#define NRB 362
#define NCB 362

MPI_Status status;

int main(int argc, char **argv)
{
        int numtasks,taskid,i, j,count;
        double a[NRA][NCA],b[NCA][NCB];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &taskid);   /* to get my taskid */
        MPI_Comm_size(MPI_COMM_WORLD, &numtasks); /* to get number of tasks */

        if (taskid == 0)
        {
                for (i=0; i<NRA; i++)
                        for (j=0; j<NCA; j++)
                                a[i][j]= 1.1;
                count = NRA*NCA;
                MPI_Send(&a[0][0], count, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
                printf("Sent\n");
         }

        if (taskid == 1)
        {
                count = NRB*NCB;
                MPI_Recv(&b, count, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, &status);
                printf("received\n");
        }
        MPI_Finalize();
        exit(0);
}

----------------------------------------------------
PBS script file
---------------
#PBS -N mytest
#PBS -l pmem=2mb
#PBS -l cput=0:10:00
#PBS -l nodes=2:ppn=2
#PBS -l walltime=0:10:00
#PBS -j oe
mpicc -o mess mess.c
mpiexec mess
----------------------------------------------------
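
A side note on the numbers, offered as a rough sketch rather than a diagnosis: the test program keeps both a and b as automatic (stack) arrays, and the script above requests pmem=2mb. Assuming 8-byte doubles and reading "2mb" as 2 MiB (2097152 bytes), the two arrays alone come within a few hundred bytes of that figure at 362 x 362. Whether that limit, or the stack limit on the cluster, is what actually triggers signal 11 has not been verified here. The helper names below are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: bytes occupied by one n x n matrix of doubles. */
size_t matrix_bytes(size_t n)
{
    return n * n * sizeof(double);
}

/* Bytes occupied by the two arrays a and b declared in main(). */
size_t two_matrix_bytes(size_t n)
{
    return 2 * matrix_bytes(n);
}

/* Bytes left under an assumed limit once both arrays are accounted for.
   Returns 0 when the arrays alone already exceed the limit. */
size_t headroom_bytes(size_t n, size_t limit)
{
    size_t used = two_matrix_bytes(n);
    return used >= limit ? 0 : limit - used;
}
```

With 8-byte doubles, two_matrix_bytes(361) is 2085136 and two_matrix_bytes(362) is 2096704, which leaves 12016 and 448 bytes respectively under 2097152. If the limit applies to the whole process, everything else (other stack frames, MPI library buffers) would have to fit in that remainder.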

--- Troy Baer <troy at osc.edu> wrote:
> On Mon, 1 Dec 2003, Parvath Reddy wrote:
> > mpiexec seems to be failing when a huge chunk of data is sent using
> > MPI_Send. The program works fine when a matrix A[361][361] is sent
> > across, but it gives the following error when a matrix A[362][362]
> > is used:
> >
> > mpiexec: Warning: main: task 0 died with signal 11 (raw 0x10b).
> >
> > Is there a method of testing how much data can be sent across from
> > the command line? Is mpptest the solution, and if so, how do I use it?
> 
> This is almost certainly related to your code rather than mpiexec.
> Signal 11 is a segmentation fault, which is typically caused by
> accessing an array outside its bounds.
> 
> This may be a stupid question, but are you absolutely sure that
> your send and receive buffers are the right size?  Are you able to
> run this program on another system using a different MPI
> implementation?  I ask because it's been my experience that when
> an MPI program dies with a seg fault, 99 times out of 100 it's caused
> by a bug in the program rather than a problem with mpiexec or the
> MPI implementation.
> 
> 	--Troy
> -- 
> Troy Baer                       email:  troy at osc.edu
> Science & Technology Support    phone:  614-292-9701
> Ohio Supercomputer Center       web:    http://oscinfo.osc.edu
> 
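
On the buffer-size question: one common way to rule the stack in or out as the culprit is to take the matrices off it and allocate them on the heap instead. The sketch below is an assumption about what might help, not a confirmed fix for this failure, and alloc_filled_matrix is a hypothetical helper, not part of MPI. Because the block it returns is contiguous, the pointer could be passed directly as the buffer argument of MPI_Send or MPI_Recv with count = rows * cols.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate rows x cols doubles as one contiguous
   heap block and set every element to value.
   Returns NULL for an empty matrix, on size overflow, or if malloc fails. */
double *alloc_filled_matrix(size_t rows, size_t cols, double value)
{
    size_t count, i;
    double *m;

    if (rows == 0 || cols == 0)
        return NULL;
    /* Reject sizes whose byte count would not fit in size_t. */
    if (rows > SIZE_MAX / sizeof(double) / cols)
        return NULL;

    count = rows * cols;
    m = malloc(count * sizeof *m);
    if (m == NULL)
        return NULL;

    for (i = 0; i < count; i++)
        m[i] = value;
    return m;
}
```

In the test program, something like "double *a = alloc_filled_matrix(NRA, NCA, 1.1);" would stand in for the declaration and fill loop on task 0, with a NULL check after the call and free(a) before MPI_Finalize. Element (i, j) is then a[i * NCA + j] rather than a[i][j].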

