Extracting data embedded in an 8-bit array into different size array

I am working on a project that receives a series of parameters packed into an array of 8-bit values. The parameters vary in size, but they are always sent in an 8-bit array. The goal is to extract them into an array whose element size matches the largest parameter (which is known beforehand).

For example, there could be two 4-bit parameters stored in the first element of the parameter array, then one 8-bit parameter, then one 16-bit parameter. The goal would then be to place each parameter separately in a 16-bit array of length 4, where the unused bits are just 0s. I am struggling to find a way to do this efficiently, especially since the I2C array could have multiple parameters embedded within one byte. Any suggestions would be greatly appreciated!

CodePudding user response：

I solved this kind of problem before by simply considering the received array of bytes as a bit flow.

So it's quite easy to read 3 bits, then 7 bits, then 10 bits, then 2 bits, etc. until the end of buffer.

You simply need to keep an index to the current bit:

struct bitflow {
  unsigned char* data ; // Received data
  size_t datasize ; // Data size, in bytes.
  size_t pc ;  // Current bit pointer, initialize it to zero.
} ;

To get the next bit (0/1=bit, -1=error):

int getnextbit (struct bitflow* b) {
  // Check that we have not already reached the end of the buffer.
  // "pc" must be strictly below (classic C indexing) the number of bytes (datasize) multiplied by 8 (number of bits per byte).
  // A left shift by 3 is the same as multiplying by 8, and stays cheap even when compiler optimizations are disabled.
  if (b->pc>=(b->datasize<<3))
    return -1 ;
  // Where is the bit:
  // - The targeted byte is the integer part of "pc/8". Again, a shift does that.
  // - The targeted bit is the remainder of "pc/8". A binary mask with 7 gives the same value without a modulo.
  // We then shift a bit (1) to the position of the remainder, and with a bitwise "AND", we extract the bit.
  // If the result is zero, the bit wasn't set, we return zero.
  // If non-zero (equal to (1<<((b->pc) & 7)), in fact), we return "1".
  int result = (((b->data[(b->pc)>>3]) & (1<<((b->pc) & 7))) ? 1 : 0) ;
  // We extracted a bit, so we increment the pointer to next bit.
  b->pc++ ;
  return result ;
}

To get the next N bits (max. 31 bits, >=0 = OK, <0=error):

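int getnextbits (struct bitflow* b, int nbbits) {
  if ((nbbits<0) || (nbbits>31))
    return -2 ;
  // Reading exactly up to the last bit of the buffer is allowed, hence ">" and not ">=".
  if ((b->pc+nbbits)>(b->datasize<<3))
    return -1 ;
  int result = 0 ;
  size_t j=b->pc ;
  // Bits are taken from bit 0 upwards within each byte, and the first bit read
  // ends up as the most significant bit of the result.
  for (int i=0 ; i<nbbits ; i++, j++) {
    result <<= 1 ;
    result |= (((b->data[j>>3]) & (1<<(j & 7))) ? 1 : 0) ;
  }
  b->pc += nbbits ;
  return result ;
}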

This last function can be heavily optimized by fetching all the remaining bits of the current byte at once whenever nbbits (or the number of bits still to extract) is greater than or equal to the number of bits left in the current byte. But that shouldn't be needed unless you have a huge load of data, which is quite unlikely if you get the data from I²C.

Endianness can complicate things a bit, but it's not impossible to solve - I usually do the endianness swap directly at reception when needed.
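The byte-at-a-time idea described above could look roughly like the sketch below. It is not the answer's own code: the name getnextbits_chunked is hypothetical, the struct is repeated only to keep the example self-contained, and it assumes LSB-first packing (the first bit of the stream is bit 0 of byte 0 and becomes bit 0 of the result). That ordering differs from the bit-by-bit function above, so pick whichever matches your wire format.

```c
#include <assert.h>
#include <stddef.h>

struct bitflow {
  const unsigned char *data; /* Received data */
  size_t datasize;           /* Data size, in bytes */
  size_t pc;                 /* Current bit pointer, starts at zero */
};

/* Hypothetical chunked reader: returns the next nbbits (0..31) as a
   non-negative value, -2 on a bad bit count, -1 if the buffer is too short.
   Assumption: LSB-first packing, low-order bits of a field arrive first. */
int getnextbits_chunked(struct bitflow *b, int nbbits) {
  assert(b != NULL && b->data != NULL);
  if (nbbits < 0 || nbbits > 31)
    return -2;
  if (b->pc + (size_t)nbbits > b->datasize * 8)
    return -1;

  unsigned int result = 0;
  int got = 0; /* number of bits already placed in result */
  while (got < nbbits) {
    size_t offset = b->pc & 7;   /* bit position inside the current byte */
    int avail = 8 - (int)offset; /* bits still unread in the current byte */
    int take = (nbbits - got < avail) ? (nbbits - got) : avail;
    /* Grab "take" bits in one operation instead of looping bit by bit. */
    unsigned int chunk = ((unsigned int)b->data[b->pc >> 3] >> offset)
                         & ((1u << take) - 1u);
    result |= chunk << got;
    got += take;
    b->pc += (size_t)take;
  }
  return (int)result;
}
```

With the bytes { 0x21, 0x7F, 0x34, 0x12 } and field widths 4, 4, 8 and 16, this sketch yields 0x1, 0x2, 0x7F and 0x1234 under the stated packing assumption. The loop runs at most five times for a 31-bit field rather than 31 times.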

Finally, you can read data this way:

// Assuming the structure is filled by the reception routine.
struct bitflow b = getdatafromi2c() ;
// Number of bits for each consecutive element.
int format[10] = { 4, 4, 2, 2, 4, 16, 10, 2, 1, 1 } ;
unsigned short int result[10] ;
for (int i=0 ; i<10 ; i++) {
  int v = getnextbits(&b, format[i]) ;
  if (v<0) {
    // Error handling
    ....
  }
  result[i] = (unsigned short int)v ;
}
// Use result array normally from now.

CodePudding user response：

    uint8_t input_array[4];
    uint16_t output_array[4];

    // I have to assume that the two 4-bit parameters are packed little-endian (first one in the low nibble). If that's wrong, this is wrong.
    // You will generally have this problem whenever parameters span bytes. You have to know.
    output_array[0] = input_array[0] & 0xF;
    output_array[1] = input_array[0] >> 4;
    output_array[2] = input_array[1];
    // Similar story here. I assume that input_array's bytes are little endian.
    // You normally find bit-endian and byte-endian are the same but a contrary system could exist.
    output_array[3] = input_array[2] | (input_array[3] << 8);

In the old days people used to do this with structs. Somebody's going to come by and tell me this is undefined, etc.; it doesn't really matter. A compiler that emits terrible code for the above emits good code for the below. It's the modern optimizing compilers, which can figure out the most efficient way to do the above, that will mess up the below. This has all kinds of alignment and endianness assumptions built into it; however, where it works, it works well. Where it doesn't work, it's undefined.

     union input_form {
         uint8_t input_array[4];
         struct {
             // Bit-field order and packing are chosen by the compiler.
             unsigned int param0 : 4;
             unsigned int param1 : 4;
             unsigned int param2 : 8;
             unsigned int param3 : 16;
         } input;
     };

    // Fill input.input_array with the received bytes before reading the fields.
    union input_form input;
    uint16_t output_array[4];
    output_array[0] = input.input.param0;
    output_array[1] = input.input.param1;
    output_array[2] = input.input.param2;
    output_array[3] = input.input.param3;
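
Since the union version only works where the compiler's bit-field layout happens to match the wire format, one option is to check that at startup before trusting it. The following is a minimal sketch, not part of the answer above: decode_shifts, decode_union and union_layout_matches are hypothetical names, the fourth field is assumed to be 16 bits wide as in the shift-based version, and a single probe pattern is only a sanity check, not a proof.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: two 4-bit fields, one 8-bit field, one 16-bit field. */
union input_form {
    uint8_t input_array[4];
    struct {
        unsigned int param0 : 4;
        unsigned int param1 : 4;
        unsigned int param2 : 8;
        unsigned int param3 : 16;
    } input;
};

/* Reference decoder: explicit shifts and masks, little-endian packing assumed. */
void decode_shifts(const uint8_t in[4], uint16_t out[4]) {
    out[0] = in[0] & 0xF;
    out[1] = in[0] >> 4;
    out[2] = in[1];
    out[3] = (uint16_t)(in[2] | (in[3] << 8));
}

/* Bit-field decoder: the result depends on the compiler's bit-field layout. */
void decode_union(const uint8_t in[4], uint16_t out[4]) {
    union input_form u;
    memset(&u, 0, sizeof u);
    memcpy(u.input_array, in, sizeof u.input_array);
    out[0] = (uint16_t)u.input.param0;
    out[1] = (uint16_t)u.input.param1;
    out[2] = (uint16_t)u.input.param2;
    out[3] = (uint16_t)u.input.param3;
}

/* Returns 1 when both decoders agree on one probe pattern, 0 otherwise. */
int union_layout_matches(void) {
    static const uint8_t probe[4] = { 0x21, 0x7F, 0x34, 0x12 };
    uint16_t a[4], b[4];
    decode_shifts(probe, a);
    decode_union(probe, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

If union_layout_matches() returns 0 on your target, fall back to the shift-and-mask decoder, which does not depend on how the compiler lays out bit-fields.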