Implement a function that blends two colors encoded with RGB565 using Alpha blending-CodePudding

I am trying to implement a function that blends two colors encoded as RGB565 using alpha blending:

    Crgb565 = (1 - a) * Argb565 + a * Brgb565

Here a is the alpha parameter, and the alpha blending value of 0.0-1.0 is mapped to an unsigned char value in the range 0-32. We can choose to use a five-bit representation for a instead, restricting it to the range 0-31 (effectively mapping to an alpha blending value of 0.0-0.96875).

Below is the code I am trying to implement. Can you please suggest a better way with respect to fewer temporary variables and memory optimization (number of multiplications and required memory accesses)? Is my logic for alpha blending correct? I am not getting the expected output, so it seems I am missing something. Please review the code; every suggestion is appreciated. I also have some doubts about the alpha parameter, which I have put in the code comments. Is there any way to shorten the alpha blending equations (the division operation)?

=====================================================

unsigned short blend_rgb565(unsigned short A, unsigned short B, unsigned char Alpha) 
    { 
        unsigned short res = 0; 
        // Alpha converted from [0..255] to [0..31] (8 bit to 5 bit)        
/* I want the alpha parameter (0-32), do i need to add something in Alpha  before right shift?? */
        Alpha = Alpha >> 3;  
    
        // Split Image A into  R, G, B components
        /*Do I need to take it as unsigned short or uint8_t also work fine ??*/
        unsigned short A_r =  A >> 11;
        unsigned short A_g = (A >> 5) & ((1u << 6) - 1); // ((1u << 6) - 1) --> 00000000 00111111
        unsigned short A_b =  A & ((1u << 5) - 1);       //  ((1u << 5) - 1) --> 00000000 00011111
    
        // Split Image B into R, G, B  components
        unsigned short B_r = B >> 11;
        unsigned short B_g = (B >> 5) & ((1u << 6) - 1);
        unsigned short B_b = B & ((1u << 5) - 1);
    
        // Alpha blend components 
        /*Do I need to use 255(8 bit) instead of 32(5 bit), Why we are dividing by it , I have taken the ref from internet , but need little bit more clarification ??*/
        unsigned short uiC_r = (A_r * Alpha + B_r * (32 - Alpha)) / 32;
        unsigned short uiC_g = (A_g * Alpha + B_g * (32 - Alpha)) / 32;
        unsigned short uiC_b = (A_b * Alpha + B_b * (32 - Alpha)) / 32;
    
        // Pack result
        res= (unsigned short) ((uiC_r << 11) | (uiC_g << 5) | uiC_b);
    
     return res; 
    }

CodePudding user response：

It's possible to reduce the multiplies from 6 to 2 if you space out the RGB values into 2 32-bit integers before multiplying:

unsigned short blend_rgb565(unsigned short A, unsigned short B, unsigned char Alpha) 
{ 
    unsigned short res = 0; 
    // Alpha converted from [0..255] to [0..31] (8 bit to 5 bit)        
    Alpha = Alpha >> 3;
    // Alpha = (Alpha + (Alpha >> 5)) >> 3; // map from 0-255 to 0-32 (if Alpha is unsigned short or larger)
    
    // Space out A and B from RRRRRGGGGGGBBBBB to 00000RRRRR000000GGGGGG00000BBBBB
    
    // 31 = 11111 binary
    // 63 = 111111 binary
    unsigned int A32 = (unsigned int)A;
    unsigned int A_spaced = A32 & 31; // B
    A_spaced |= (A32 & (63 << 5)) << 5; // G
    A_spaced |= (A32 & (31 << 11)) << 11; // R
    
    unsigned int B32 = (unsigned int)B;
    unsigned int B_spaced = B32 & 31; // B
    B_spaced |= (B32 & (63 << 5)) << 5; // G
    B_spaced |= (B32 & (31 << 11)) << 11; // R
    
    // multiply and add the alpha to give a result RRRRRrrrrr0GGGGGGgggggBBBBBbbbbb,
    // where RGB are the most significant bits we want to keep
    unsigned int C_spaced = (A_spaced * Alpha) + (B_spaced * (32 - Alpha));
    
    // remap back to RRRRRGGGGGGBBBBB
    res = (unsigned short)(((C_spaced >> 5) & 31) + ((C_spaced >> 10) & (63 << 5)) + ((C_spaced >> 16) & (31 << 11)));
    
    return res; 
}

You need to profile this to see whether it is actually faster; it assumes that the multiplications you save are slower than the extra bit manipulations you replace them with.
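For comparison, here is a minimal sketch of a closely related variant of the same idea, using the mask 0x07E0F81F to move green into the upper half of a 32-bit word. It is not the code above: the names spread565, blend565_packed, blend565_reference and count_mismatches are made up for illustration, alpha is assumed to already be a weight in the range 0-32, and the weights follow the question's equation (alpha = 0 returns a, alpha = 32 returns b). The per-channel reference is only there to check that the packed version gives identical results on a sample of inputs.

```c
#include <stdint.h>

/* Spread RRRRRGGGGGGBBBBB into 00000GGGGGG00000RRRRR000000BBBBB,
   so every channel has at least 5 free bits above it. */
static uint32_t spread565(uint16_t c)
{
    uint32_t x = c;
    return (x | (x << 16)) & 0x07E0F81Fu;
}

/* Assumption: alpha is a weight in [0, 32]; larger values are not handled. */
uint16_t blend565_packed(uint16_t a, uint16_t b, unsigned alpha)
{
    /* Two multiplies blend all three channels at once. */
    uint32_t mixed = spread565(a) * (32u - alpha) + spread565(b) * alpha;
    mixed = (mixed >> 5) & 0x07E0F81Fu;        /* divide each field by 32 */
    return (uint16_t)(mixed | (mixed >> 16));  /* fold green back into place */
}

/* Straightforward per-channel version, used only as a reference. */
uint16_t blend565_reference(uint16_t a, uint16_t b, unsigned alpha)
{
    unsigned r  = ((a >> 11) * (32u - alpha) + (b >> 11) * alpha) / 32u;
    unsigned g  = (((a >> 5) & 0x3Fu) * (32u - alpha) + ((b >> 5) & 0x3Fu) * alpha) / 32u;
    unsigned bl = ((a & 0x1Fu) * (32u - alpha) + (b & 0x1Fu) * alpha) / 32u;
    return (uint16_t)((r << 11) | (g << 5) | bl);
}

/* Compare both versions on a sample of colors and every alpha in [0, 32]. */
unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    for (uint32_t a = 0; a <= 0xFFFFu; a += 257u)
        for (uint32_t b = 0; b <= 0xFFFFu; b += 263u)
            for (unsigned alpha = 0; alpha <= 32u; ++alpha)
                if (blend565_packed((uint16_t)a, (uint16_t)b, alpha) !=
                    blend565_reference((uint16_t)a, (uint16_t)b, alpha))
                    ++mismatches;
    return mismatches;
}
```

Because the largest field value is 63 * 32 = 2016, which fits in 11 bits, no channel can carry into its neighbour. Whether this beats the per-channel version still has to be measured on the target.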

CodePudding user response：

"can you please suggest better way wrt less temp variable"

There is no advantage to removing temporary variables from the implementation. When you compile with optimizations turned on (e.g. -O2 or /O2), those temporary variables will get optimized away.

Two adjustments I would make to your code:

- Use uint16_t (from <stdint.h>) instead of unsigned short. On most platforms it won't matter, since sizeof(uint16_t) == sizeof(unsigned short), but it helps to be explicit.
- There is no point in converting alpha from an 8-bit value to a 5-bit value. You'll get better blending accuracy if you let alpha keep its full range.

Some of your bit-shifting looks odd. It might work, but I use a simpler approach.
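To illustrate the accuracy point, here is a small sketch that blends a single channel both ways. The helper names blend_channel_5bit and blend_channel_8bit are hypothetical, and the 5-bit version assumes the alpha >> 3 reduction from the question with weights out of 32.

```c
#include <stdint.h>

/* One channel blended with alpha reduced to 5 bits (weights out of 32). */
unsigned blend_channel_5bit(unsigned a, unsigned b, uint8_t alpha)
{
    unsigned w = alpha >> 3;  /* 0..31, so b never gets the full weight */
    return (a * (32u - w) + b * w) / 32u;
}

/* The same channel blended with the full 8-bit alpha (weights out of 255). */
unsigned blend_channel_8bit(unsigned a, unsigned b, uint8_t alpha)
{
    return (a * (255u - alpha) + b * alpha) / 255u;
}
```

With a = 0 and b = 63 (a full green channel), alpha = 255 gives 61 in the 5-bit version but 63 in the 8-bit version, and alpha = 7 gives 0 versus 1. So the reduced alpha both loses small steps and never reaches the second color completely.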

Here's an adjustment to your implementation:


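#include <stdint.h>

// Parenthesize the macro arguments so that expressions can be passed safely
#define MAKE_RGB565(r, g, b) ((uint16_t)(((r) << 11) | ((g) << 5) | (b)))

uint16_t blend_rgb565(uint16_t a, uint16_t b, uint8_t Alpha)
{
    const uint8_t invAlpha = 255 - Alpha;

    // Split a into its R, G, B components
    uint16_t A_r = a >> 11;
    uint16_t A_g = (a >> 5) & 0x3f;
    uint16_t A_b = a & 0x1f;

    // Split b into its R, G, B components
    uint16_t B_r = b >> 11;
    uint16_t B_g = (b >> 5) & 0x3f;
    uint16_t B_b = b & 0x1f;

    // Weighted sum per channel: Alpha = 0 returns a, Alpha = 255 returns b
    uint32_t C_r = (A_r * invAlpha + B_r * Alpha) / 255;
    uint32_t C_g = (A_g * invAlpha + B_g * Alpha) / 255;
    uint32_t C_b = (A_b * invAlpha + B_b * Alpha) / 255;

    return MAKE_RGB565(C_r, C_g, C_b);
}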
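On the question of shortening the division: a commonly used shift-and-add identity can replace the division by 255. The sketch below is an illustration under stated assumptions, not part of the function above. The names div255 and count_div255_mismatches are made up, and the identity is only claimed for numerators up to 65279, which comfortably covers the largest value that occurs here (63 * 255 = 16065).

```c
#include <stdint.h>

/* Exact floor(x / 255) for 0 <= x <= 65279, using an add and two shifts. */
static inline uint32_t div255(uint32_t x)
{
    return (x + 1u + (x >> 8)) >> 8;
}

/* Check the identity against real division over the whole claimed range. */
unsigned count_div255_mismatches(void)
{
    unsigned mismatches = 0;
    for (uint32_t x = 0; x <= 65279u; ++x)
        if (div255(x) != x / 255u)
            ++mismatches;
    return mismatches;
}
```

Optimizing compilers usually turn a division by a constant into a multiply and shift on their own, so this is mainly of interest on small targets or unoptimized builds; it is worth measuring before adopting it.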

But the bigger issue is that this function works on exactly one pair of pixel colors. If you invoke it across an entire image or pair of images, the function-call overhead is going to be a major performance issue, even with compiler optimizations and inlining. So if you are calling this function rows x cols times, you should probably manually inline the code into the loop that iterates over every pixel of the image (or pair of images).
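As a rough sketch of what that could look like, the loop below blends two pixel buffers with the per-channel arithmetic written directly in the loop body. The function name blend_rgb565_buffer and its parameters are hypothetical; it assumes both inputs and the output are contiguous arrays of count RGB565 pixels and that a single alpha applies to the whole buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = blend of a[i] and b[i] for count pixels. */
void blend_rgb565_buffer(const uint16_t *a, const uint16_t *b, uint16_t *out,
                         size_t count, uint8_t alpha)
{
    const uint32_t inv = 255u - alpha;  /* computed once, not once per pixel */

    for (size_t i = 0; i < count; ++i) {
        const uint32_t pa = a[i];
        const uint32_t pb = b[i];

        /* alpha = 0 keeps a[i], alpha = 255 gives b[i] */
        const uint32_t r  = ((pa >> 11) * inv + (pb >> 11) * alpha) / 255u;
        const uint32_t g  = (((pa >> 5) & 0x3Fu) * inv + ((pb >> 5) & 0x3Fu) * alpha) / 255u;
        const uint32_t bl = ((pa & 0x1Fu) * inv + (pb & 0x1Fu) * alpha) / 255u;

        out[i] = (uint16_t)((r << 11) | (g << 5) | bl);
    }
}
```

For a whole image stored row by row without padding, this could be called once with count = rows * cols. Keeping the per-pixel work in one simple loop also gives the compiler a better chance to vectorize it, though that depends on the compiler and target.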