Passing 3D arrays to a convolution function in C

I need to write a function that performs a 2D convolution, and for that I need to pass it a couple of 3D arrays. However, I've been told that my method is not an ideal way to do this.

First, I declare the variables:

typedef struct {
    float img[224][224][3];
} input_224_t;

typedef struct {
    float img[112][112][32];
} input_112_t;

typedef struct {
    float img[3][3][32];
} weightsL1_t;

Then, the convolution looks like this:

void convolution(input_224_t* N, weightsL1_t* M, input_112_t* P, int size, int ksize, int channels, int filters, int stride)
{
    // Effectively pads the image before convolution. Technically also works for pointwise, but it's inefficient.
    // find center position of kernel (half of kernel size)
    int kcenter = ksize / 2;

    // Declare output indexes
    int a = 0;
    int b = -1;

    for (int k = 0; k < filters; ++k)                   // filters
    {
        for (int i = 0; i < size; i = i + stride)       // rows
        {
            for (int j = 0; j < size; j = j + stride)   // columns
            {
                b++;
                if (b == ksize) {b=0;a++;}              // Increment output index
                for (int m = 0; m < ksize; ++m)         // kernel rows
                {
                    for (int n = 0; n < ksize; ++n)     // kernel columns
                    {
                        // Index of input signal, used for checking boundary
                        int ii = i + (m - kcenter);
                        int jj = j + (n - kcenter);

                        // Ignore input samples which are out of bound
                        if (ii >= 0 && ii < size && jj >= 0 && jj < size) {
                            for (int p = 0; p < channels; ++p)  // channels
                            {
                                P.img[a][b][k] += N.img[ii][jj][p] * M.img[m][n][k];    // convolve
                            }
                        }
                    }
                }
            }
        }
    }
}

(This returns "field 'img' could not be resolved" at the "convolve" line)
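A likely cause of that message, judging only from the code shown: N, M and P are pointers to structs, so their members have to be reached with -> rather than with the dot operator. The sketch below illustrates the difference on a deliberately tiny struct; tiny_t and accumulate are hypothetical names used for illustration and are not part of the original code.

```c
/* Hypothetical miniature of the structs above, for illustration only. */
typedef struct {
    float img[2][2][1];
} tiny_t;

/* Adds src into dst element by element. Both arguments are pointers,
   so the img member is reached with '->' (dst.img would not compile). */
void accumulate(tiny_t *dst, const tiny_t *src)
{
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            dst->img[i][j][0] += src->img[i][j][0];
}
```

Applied to the convolve line, the same change would read P->img[a][b][k] += N->img[ii][jj][p] * M->img[m][n][k];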

I then import the values into the correct structs (this was the subject of a previous question of mine, which has been answered: Write values to a 3D array inside a struct in C) and I call the function like this:

convolution(test_image, test_filter, test_result, 6, 3, 1, 1, 2);

I have been told in my previous question that this is not an ideal way to handle 3D arrays, and that it may use a lot more memory than I intend. This is a very memory-intensive process, and this will run in an embedded system, so optimizing memory allocation is paramount.

My objective, if possible, is to allocate only one of each of these 3D arrays at any point in time, so as not to use unnecessary memory, and to do it in a way that lets this space be freed later.

Thank you in advance.

Answer:

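You could use variable length arrays (VLAs) as function parameters. They were introduced in C99 and made an optional feature in C11, so check that your compiler supports them. The array dimensions are passed as ordinary arguments before the arrays that use them: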

void convolve(int isize,  // width/height of input (224)
              int osize,  // width/height of output (112)
              int ksize,  // width/height of kernel (3)
              int stride, // shift between input pixels, between consecutive outputs
              int pad,    // offset between (0,0) pixels between input and output
              int idepth, int odepth, // number of input and output channels
              float idata[isize][isize][idepth],
              float odata[osize][osize][odepth],
              float kdata[idepth][ksize][ksize][odepth])
{
  // iterate over the output
  for (int oy = 0; oy < osize; ++oy) {
  for (int ox = 0; ox < osize; ++ox) {
  for (int od = 0; od < odepth; ++od) {
      odata[oy][ox][od] = 0;
      for (int ky = 0; ky < ksize; ++ky) {
      for (int kx = 0; kx < ksize; ++kx) {
          // map position in output and kernel to the input
          int iy = stride * oy + ky - pad;
          int ix = stride * ox + kx - pad;
          // use only valid inputs
          if (iy >= 0 && iy < isize && ix >= 0 && ix < isize)
              for (int id = 0; id < idepth; ++id)
                  odata[oy][ox][od] += kdata[id][ky][kx][od] * idata[iy][ix][id];
      }}
  }}}
}
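The function takes osize as an argument and does not derive it from the other sizes. A common convention, assumed here and not stated above, is osize = (isize + 2*pad - ksize) / stride + 1 with the same padding on every side. The helper below is a hypothetical sketch of that formula (the name conv_output_size is not from the code above):

```c
/* Output width/height under the usual convolution size formula.
   Hypothetical helper: assumes stride > 0 and isize + 2*pad >= ksize,
   and uses integer (floor) division. */
int conv_output_size(int isize, int ksize, int stride, int pad)
{
    return (isize + 2 * pad - ksize) / stride + 1;
}
```

With isize = 224, ksize = 3, stride = 2 and pad = 1 this gives 112, which matches the sizes used in this example.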

Typical usage would be:

// allocate input
float (*idata)[224][3] = calloc(224, sizeof *idata);
// fill input using idata[y][x][d] syntax

// allocate kernel
float (*kdata)[3][3][32] = calloc(3, sizeof *kdata);
// fill kernel

// allocate output
float (*odata)[112][32] = calloc(112, sizeof *odata);

convolve(224, 112, 3, // input, output, kernel size
         2, // stride
         1, // pad the input by one pixel, which centers the kernel
         3, 32, // number of input and output channels
         idata, odata, kdata);

// free memory if it is no longer used
free(idata); free(odata); free(kdata);

A multidimensional array could be allocated with:

float (*arr)[10][20][30] = malloc(sizeof *arr);

However, accessing elements is then a bit cumbersome because of the (*arr)[i][j][k] syntax. It is therefore simpler to use a pointer to the first element of the array and allocate all of the subarrays through that pointer:

float (*arr)[20][30] = malloc(10 * sizeof *arr);

or with calloc(), which zeroes the memory automatically and helps avoid overflow in the size computation:

float (*arr)[20][30] = calloc(10, sizeof *arr);
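When the dimensions are only known at run time, the same pattern can be wrapped in small helpers. The following is a minimal sketch, assuming VLA support and non-zero dimensions; alloc_volume and sum_volume are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocates a zeroed h x w x d float volume as one contiguous block.
   Returns NULL on failure; release the block with a single free().
   Hypothetical helper: w and d must be non-zero. */
void *alloc_volume(size_t h, size_t w, size_t d)
{
    return calloc(h, sizeof(float[w][d]));
}

/* Sums every element, using plain v[y][x][c] indexing. */
float sum_volume(size_t h, size_t w, size_t d, float v[h][w][d])
{
    float total = 0.0f;
    for (size_t y = 0; y < h; ++y)
        for (size_t x = 0; x < w; ++x)
            for (size_t c = 0; c < d; ++c)
                total += v[y][x][c];
    return total;
}
```

A caller would assign the result to a pointer of the matching type, for example float (*v)[w][d] = alloc_volume(h, w, d);, check it against NULL, and call free(v) once the volume is no longer needed.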

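By the way, I suggest reordering the dimensions of the kernel to ODEPTH x KSIZE x KSIZE x IDEPTH. The innermost loop runs over the input channels, so with this layout it would read consecutive kernel elements, which makes iterating over the kernel more cache-friendly.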
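A sketch of what that reordering could look like, under the assumption that the kernel is stored as kdata[odepth][ksize][ksize][idepth]. The name convolve_reordered and the local accumulator are illustrative choices, not part of the code above; the arithmetic is otherwise the same as in convolve.

```c
/* Variant of convolve() with the kernel laid out as
   kdata[odepth][ksize][ksize][idepth] (assumed layout), so the
   innermost loop walks contiguous memory in both kdata and idata. */
void convolve_reordered(int isize, int osize, int ksize,
                        int stride, int pad,
                        int idepth, int odepth,
                        float idata[isize][isize][idepth],
                        float odata[osize][osize][odepth],
                        float kdata[odepth][ksize][ksize][idepth])
{
    for (int oy = 0; oy < osize; ++oy) {
        for (int ox = 0; ox < osize; ++ox) {
            for (int od = 0; od < odepth; ++od) {
                float acc = 0.0f;  /* accumulate locally, store once */
                for (int ky = 0; ky < ksize; ++ky) {
                    for (int kx = 0; kx < ksize; ++kx) {
                        /* map output and kernel position to the input */
                        int iy = stride * oy + ky - pad;
                        int ix = stride * ox + kx - pad;
                        /* skip samples that fall outside the input */
                        if (iy < 0 || iy >= isize || ix < 0 || ix >= isize)
                            continue;
                        for (int id = 0; id < idepth; ++id)
                            acc += kdata[od][ky][kx][id] * idata[iy][ix][id];
                    }
                }
                odata[oy][ox][od] = acc;
            }
        }
    }
}
```

For the 3-channel input and 32-channel output used above, the kernel allocation would then become float (*kdata)[3][3][3] = calloc(32, sizeof *kdata);. Whether this is measurably faster depends on the target, so it is worth profiling on the actual embedded system.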