Python/NumPy - Extracting Bits of Bytes

I have 8 bytes of data in the form of a numpy.frombuffer() array. I need to get bits 10-19 into a variable and bits 20-29 into a variable. How do I use Python to extract bits that cross bytes? I've read about bit shifting but it isn't clear to me if that is the way to do it.

CodePudding user response：

Depending on your datatype you might need to slightly modify this numpy solution:

a = np.frombuffer(b'\x01\x02\x03\x04\x05\x06\x07\x08', dtype=np.uint8)
#array([1, 2, 3, 4, 5, 6, 7, 8], dtype=uint8)

Unpacking the bits:

first = np.unpackbits(a)[10:20]
#array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0], dtype=uint8)

And if you need to repack the bits:

first_packed = np.packbits(first)
#array([8, 0], dtype=uint8)

Please note that Python indexing is 0-based, so the slice [10:20] selects the bits at indices 10 through 19. If you number bits starting from 1 and want the 10th through 19th, adjust the indexing to np.unpackbits(a)[9:19].

Similarly for the other range:

second = np.unpackbits(a)[20:30]
#array([0, 0, 1, 1, 0, 0, 0, 0, 0, 1], dtype=uint8)
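Since the question asks for each range "in a variable", the bit arrays above can be collapsed into plain integers. The following is a minimal sketch, assuming the first bit of each slice is the most significant one. The helper name bits_to_int is hypothetical and not part of NumPy.

```python
import numpy as np

a = np.frombuffer(b'\x01\x02\x03\x04\x05\x06\x07\x08', dtype=np.uint8)
bits = np.unpackbits(a)  # 64 values, most significant bit of each byte first

def bits_to_int(bit_array):
    """Hypothetical helper: read a 1-D array of 0/1 values as an unsigned
    integer, treating the first element as the most significant bit."""
    value = 0
    for bit in bit_array:
        value = (value << 1) | int(bit)  # shift left, then append the new bit
    return value

first_value = bits_to_int(bits[10:20])   # bits at indices 10..19
second_value = bits_to_int(bits[20:30])  # bits at indices 20..29
print(first_value, second_value)  # 32 193
```

Converting each element with int() keeps the accumulator a regular Python integer rather than a fixed-width NumPy scalar.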

CodePudding user response：

Get each bit individually by indexing the correct byte, then masking off the correct bit. Then you can shift and add to build your new number from the bits.

data = b'abcdefgh' #8 bytes of data

def bit_slice(data, start, stop):
    out = 0
    for i in range(start, stop):
        byte_n = i//8
        byte_bit = i%8
        byte_mask = 1<<byte_bit
        bit = bool(data[byte_n] & byte_mask)
        out = out*2 + bit #multiplying by 2 is equivalent to a left shift by one. Then add the new bit
    return out
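
Note that this function counts bits from the least significant end of each byte (bit 0 is the 1's place), whereas np.unpackbits in the NumPy answer counts from the most significant end by default. If the two need to agree, a variant along these lines may help. It is only a sketch, and the name bit_slice_msb is hypothetical:

```python
def bit_slice_msb(data, start, stop):
    """Hypothetical variant: bit 0 is the most significant bit of data[0]."""
    out = 0
    for i in range(start, stop):
        byte_n = i // 8
        shift = 7 - (i % 8)                # position counted from the MSB
        bit = (data[byte_n] >> shift) & 1  # isolate that single bit
        out = (out << 1) | bit             # append it to the result
    return out

data = b'\x01\x02\x03\x04\x05\x06\x07\x08'
print(bit_slice_msb(data, 10, 20))  # 32
print(bit_slice_msb(data, 20, 30))  # 193
```

Which convention is right depends on how the data format defines "bit 10", so it is worth checking against a known sample.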

re:comments

Each time we want to add a new bit to our number like so:

10110
101101

We have to shift the first five bits over and then add either 1 or 0, depending on the value of the next bit. Shifting to the left moves each digit one place higher, which in binary means multiplying by 2, just as shifting a decimal number over one place means multiplying by 10. When adding the new bit to the number we're accumulating, I simply multiply by 2 instead of using the left shift operator, just to show it's another option. When creating the byte mask, I did use the left shift operator (<<). It works by shifting a 1 several places over, so I end up with a byte that has a 1 in just the right place; when I "and" it with the byte in question, I get just the single bit I want to index:

1<<3 = 00001000
1<<5 = 00100000
1<<0 = 00000001
1<<7 = 10000000

then apply the mask to get the bit we want:

10011011 #a byte of data
00100000 #bit mask for the 32's place
_________&
00000000
#bit in the 32's place is 0

10011011 #a byte of data
00010000 #bit mask for the 16's place
_________&
00010000
#bit in the 16's place is 1
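
The two masking examples above can be reproduced in Python. This is a small sketch, and the helper name get_bit is hypothetical:

```python
def get_bit(byte, place):
    """Hypothetical helper: return 1 if the bit at 2**place is set, else 0."""
    mask = 1 << place
    return int(bool(byte & mask))

byte = 0b10011011
for place in (5, 4):  # the 32's place, then the 16's place
    mask = 1 << place
    print(f"{byte:08b} & {mask:08b} = {byte & mask:08b} -> bit is {get_bit(byte, place)}")
# 10011011 & 00100000 = 00000000 -> bit is 0
# 10011011 & 00010000 = 00010000 -> bit is 1
```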

After applying the mask, if the selected bit is 0 then the entire number will always be 0. If the selected bit is 1, the number will always be greater than 0. Calling bool on that result is equivalent to:

if data[byte_n] & byte_mask > 0:
    bit = 1
else:
    bit = 0

... because a boolean interpreted as an integer is simply a 1 or a 0.
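
As an alternative to looping over individual bits (and not taken from either answer above), the whole buffer can be treated as one large integer and the field extracted with a single shift and mask. The sketch below assumes bit 0 is the most significant bit of the first byte, and the name extract_field is hypothetical:

```python
def extract_field(data, start, stop):
    """Hypothetical helper: return bits start..stop-1 as an int,
    counting bit 0 as the most significant bit of data[0]."""
    total_bits = len(data) * 8
    value = int.from_bytes(data, byteorder='big')  # whole buffer as one integer
    width = stop - start
    shift = total_bits - stop             # bits to discard on the right
    return (value >> shift) & ((1 << width) - 1)  # keep only `width` bits

data = b'\x01\x02\x03\x04\x05\x06\x07\x08'
print(extract_field(data, 10, 20))  # 32
print(extract_field(data, 20, 30))  # 193
```

If the data is held in a NumPy array rather than a bytes object, passing a.tobytes() should work as the data argument.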