Pack data, extreme Bitpacking in Python-CodePudding

I need to pack information as closely as possible into a bitstream.

I have variables with a different number of distinct states:

Number_of_states=[3,5,129,15,6,2]# A bit longer in reality

The best option I have in the Moment would be to create a bitfield, using

2 3 8 4 3 1 bit ->21 bit

However it should be possible to pack these states into np.log2(3*5*129*15*6*2)=18.4 bits, saving two bits. (In reality I have 298 bits an need to save a few)

In my case this would save about >5% of the data stream, which would help a lot.

Is there a viable solution in python to pack the data in this way? I tried packalgorithms, but they create too much overhead with just a few bytes of data. The string is no problem, it is constant and will be transmitted beforehand.

This is the code I am using in the moment:

from bitstring import pack
import numpy as np

DATA_TO_BE_PACKED=np.zeros(6)

Number_of_states=[3,5,129,15,6,2]#mutch longer in reality

DATA_TO_BE_PACKED=np.random.randint(Number_of_states)

string=''

for item in Number_of_states:
    string ='uint:{}, '.format(int(np.ceil(np.log2(item))))

PACKED_DATA = pack(string,*DATA_TO_BE_PACKED)

print(len(PACKED_DATA ))

print(PACKED_DATA.unpack(string))

CodePudding user response：

You can interpret the state as an index into a multidimensional array with shape (3, 5, 129, 15, 6, 2). This index can be encoded as the integer index into the flattened 1-d array with length 3*5*129*15*6*2 = 348300. NumPy has the functions ravel_multi_index and unravel_index that can do this for you.

For example, let num_states be the number of states for each component of your state:

In [86]: num_states = [3, 5, 129, 15, 6, 2]

Suppose state holds an instance of the data; that is, it records the state of each component:

In [87]: state = [2, 3, 78, 9, 0, 1]

To encode this state, pass it through ravel_multi_index. idx is the encoded state:

In [88]: idx = np.ravel_multi_index(state, num_states)

In [89]: idx
Out[89]: 316009

By construction, 0 <= idx < 348300, so it requires only 19 bits.

To restore state from idx, use unravel_index:

In [90]: np.unravel_index(idx, num_states)
Out[90]: (2, 3, 78, 9, 0, 1)

CodePudding user response：

This looks like a usecase of a mixed radix numeral system.

A quick proof of concept:

num_states = [3, 5, 129, 15, 6, 2]
input_data = [2, 3, 78, 9, 0, 1]
print("Input data: %s" % input_data)

To encode, you start with a 0, and for each state first multiply by number of states, and then add the current state:

encoded = 0
for i in range(len(num_states)):
    encoded *= num_states[i]
    encoded  = input_data[i]

print("Encoded: %d" % encoded)

To decode, you go in reverse, and get remainder of division by number of states, and then divide by number of states:

decoded_data = []
for n in reversed(num_states):
    v = encoded % n
    encoded = encoded // n
    decoded_data.insert(0, v)

print("Decoded data: %s" % decoded_data)

Example output:

Input data: [2, 3, 78, 9, 0, 1]
Encoded: 316009
Decoded data: [2, 3, 78, 9, 0, 1]

Another example with more values:

Input data: [2, 3, 78, 9, 0, 1, 84, 17, 4, 5, 30, 1]
Encoded: 14092575747751
Decoded data: [2L, 3L, 78L, 9L, 0L, 1L, 84L, 17L, 4L, 5L, 30L, 1L]