Best way to implement bit shifting over a block of memory-CodePudding

I'd like to bit shift a pointed block of memory of arbitrary size. I thought about working on the problem char by char and I found out this solution that works.

void rshift(void *self, int shift, size_t size) {
    char *s = self;
    bool take, pass;
    for (int i = 0; i < shift; i  ) {
        take = false;
        for (int j = 0; j < size; j  , take = pass) {
            pass = *(s   j) & 1;
            *(s   j) >>= 1;
            *(s   j) ^= (-take ^ *(s   j)) & (1 << (CHAR_BIT - 1));
        }
    }
}

Where self is the memory block to shift, size is its size in chars and shift is how many bits to shift. Essentially what I'm doing is, for each byte I shift it by one passing the bit that has been lost to the next byte.

However this algorithm is quite terrible, I think it's like a O(shift * size) or something, and I'm quite sure the whole thing can be solved with and O(size).

Note: I showed right shift, left shifting is quite the same thing

CodePudding user response：

First of all, you do not need the shift loop. You can first do a modulus: int n = shift % CHAR_BIT; and a division int m = shift / CHAR_BIT;. Indeed, rotating a char variable k * CHAR_BIT times is equivalent to move directly bytes by k. Moreover, you you can directly exact the n least significant bits of *(s j) using int pass = *(s j) & ((1 << n) - 1);. Then, a shift of n bits can be directly with a simple left shift (*(s j) >>= n;). Finally, the "merge" of pass with the previously left shifted value can be done using *(s j) |= pass << (CHAR_BIT - n));. Note that the previous operations work on unsigned char variables that are safer (than a signed char) in this case since a right shift of a negative signed type is implementation defined. This makes the algorithm running in O(size) time.

An optimisation is to not do anything if shift is 0 and otherwise use memmove when n is 0.

Another optimization is to work on several char items at the same time using bigger types like uint64_t. However, this assume CHAR_BIT is 8 which is the case on almost all (sane) modern processor in the world. However, you should be very careful with this optimization since the loaded/stored values need to be correctly aligned and the type punning must be safe (using either memcpy or a unions) so to not break the strict aliasing rule (resulting in an undefined behaviour).

An alternative optimization is to use SIMD intrinsics (like SSE, AVX2 on modern x86-64 processors or NEON on most ARM processors). This optimization is more efficient than working on bigger types and much safer. However, using intrinsics requires advanced programming skills and make the code less portable and often barely readable. SSE/NEON can work simultaneously on 16 8-bit items per cycle and AVX2 up to 32 8-bit items resulting in a drastically faster code. Note that compilers can sometimes do that automatically for you (assuming optimization flags are enabled).

Note that writing s[j] instead of *(s j) is easier to read and shorter. Also note that the expression -take ^ *(s j) certainly results in an implementation defined behaviour since -1 can be represented differently on different architectures (see two's complement and one's complement) although almost all modern processors use the two's complement.

CodePudding user response：

I don't know how fast you have to go with those shifts but I would have an alternative way to propose following a little python style. The source I propose was written in one go, therefore imprecise but gives the idea and is composed as follows: two functions, one of which transforms the block to be shifted into a string and the other vice-versa. In between, the shift (both left and right) is done with a sprintf. The rest of the code are printf to display the results. Take a look:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *BitToChar(void *, size_t);
void *CharToBit(char *);

int main(void)
{
    uint8_t block[] = {0x81, 0x81, 0x81, 0x81, 0x81, 0x81, 0x81, 0x81};
    uint8_t *new_block;
    size_t  blk_len = sizeof(block) / sizeof(uint8_t);
    char *CnvBit, *NewBits;
    int shift = 8, cut;

    printf("\nbit to shift: ");
    scanf("%d", &shift);
    if(shift <= 0)                  
    {                               
        printf("\nno need shift\n");
        return 0;                   
    }  
                         
    printf("\noriginal:               ");
    for(int i = 0; i < blk_len; i  )
        printf("0xX ", block[i]);
    printf("\n");

    if((CnvBit = BitToChar(block, blk_len)) == NULL)
        return -1;
    if((NewBits = malloc(1   strlen(CnvBit))) == NULL)
        return -1;
sright:
    {
        cut = strlen(CnvBit) - shift;
        sprintf(NewBits, "%0*d%*.*s", shift, 0, cut, cut, CnvBit);
        if((new_block = CharToBit(NewBits)) == NULL)
            return -1;
    }
    printf("\nshift - bit right:     ", shift);
    for(int i = 0; i < blk_len; i  )
        printf("0xX ", new_block[i]);
    printf("\n");
    free(new_block);
sleft:
    {
        sprintf(NewBits, "%s%0*d", &NewBits[shift], shift, 0);
        if((new_block = CharToBit(NewBits)) == NULL)
            return -1;
    }
    printf("\nshift back - bit left: ", shift);
    for(int i = 0; i < blk_len; i  )
        printf("0xX ", new_block[i]);
    printf("\n");
    free(new_block);

    free(CnvBit);
    free(NewBits);
    return 0;
}

char *BitToChar(void *blk, size_t blk_len)
{
    typedef uint8_t *arr;
    char *result, *p;
    if((result = malloc(1   (8 * blk_len))) == NULL)
        return (char *)NULL;
    arr arr_blk = blk;
    p = result;
    for(int j = 0; j < blk_len; j  )
        for(int k = 7; k >= 0; k--)
            *p   = (arr_blk[j] & (uint8_t)(1 << k)) ? '1' : '0';
    *p = '\0';
    return result;
}

void *CharToBit(char *BitStr)
{
    size_t bs_len = strlen(BitStr);
    size_t blk_len = (bs_len / 8 )   ((bs_len % 8) ? 1 : 0);
    uint8_t *block, *p;
    if((block = calloc(blk_len, sizeof(uint8_t))) == NULL)
        return (void *)NULL;
    p = BitStr;
    for(int j = 0; j < blk_len && *p != '\0'; j  )
        for(int k = 7; k >= 0; k--)
            block[j] |= (*p   - '0') << k;
    return block;
}