Home > Software design >  Is it possible for a dilation or erosion algorithm to not use a separate array?
Is it possible for a dilation or erosion algorithm to not use a separate array?

Time:01-16

I have an assignment where I am to write an inline x86 assembly code for a dilation and an erosion function. My problem is that we are not given a separate array and can't touch the non-asm part of the program. So I need to find a way to alter the original image without copying it elsewhere. But if I do so, the process gets compromised because the later pixels take into account the altered values of their neighbouring pixels instead of their original ones and the entire image becomes black or white.

Below is one of the functions. I am putting it so the restrictions I mentioned are clear, not because I expect someone to write it for me. I cannot initialize a separate array inside the asm block and I'm not given one by the rest of the code either.

void dilation(int image_size, int filter_size, short* image_org) {
    __asm {
        MOV EBX, image_org
        //I can only write code in here
    }
}

Edit: I think I'm supposed to push the new value of each pixel to the stack and only after I finish traversing through the whole image place them back inside the array.

CodePudding user response:

You can easily do this for a 3x1 dilation, by keeping the few pixels that you will modify.

Assume the three consecutive pixel values A, B, C. Assume that you already modified A, and are busy dilating for B (i.e. you will replace B by max(A, B, C)). You can use the sequence of operations

SavedA= SavedB
SavedB= B
B= max(SavedA, SavedB, C)

in a loop from left to right. (I leave you the processing of the first and last pixels in the row as an exrecise.)

You can generalize this to 1x3 dilation, and 3x3 dilation by successive application of 3x1 and 1x3. Similar for erosions.

For (2n 1)x1, repeat n times.

This works for grayscale images as well. You can also adapt for packed binary formats.

CodePudding user response:

If input image is monochrome, you can use extra few values to mark pixels with extra meaning, then process the pixels, then clear the marks before exit.

If pixel data is 16 bits and if you use only 1 bit for monochrome, the remaining 15 bits can easily hold all the values like "to be deleted", "to be painted". It would be just about OR operation on the pixel bits and values 2 or 4.

Otherwise you have to change the pixel values before neighbor pixels need them which adds a bias to the algorithm.


If AVX vectors are allowed (on 32 bit environments, only half of number of registers available (see Peter Cordes' comment below)) and pixels are fully colored, you could store 16 pixels (each 16bits) as a tile of 4x4 size on a single AVX register. If AVX instruction set allows use of 16 registers, then it lets 16x16 tile be computed in single CPU core. To compute whole 512x512 image without using any array on RAM, you'd require at least 1024 cores (or threads) in-flight. If threads do not count as an array for the problem, it should work. But the trick would be computation should not write results until all computations are completed. Once all cores write their own tiles, it should be synchronized as complete. But extracting/altering lanes of AVX registers would cost extra performance.

If AVX512 allowed (32bit mode = 1/4 total registers available, though), that would make 32x32 sized tile per core or need just 256 cores/threads.

I don't know if you are also allowed to push values to FPU for extra space (maybe not much but still could hold some border pixels between 2 tiles or something else).


If image is monochrome again, you can apply a RLE to compress it. Once compressed, it should cost much less area in the input array. Then you can use the rest as an output-RLE. On final step, do decompression from there to whole input again. Run Length Encoding is fast but decompressing for every pixel-read will be slow or at least must be too much book-keeping. Still, performance is not the priority here right? Maybe complexity can be reduced by having 1 new RLE per row. Didn't try.

  • Related