I need to pack information as closely as possible into a bitstream.
I have variables with a different number of distinct states:
Number_of_states=[3,5,129,15,6,2]# A bit longer in reality
The best option I have in the Moment would be to create a bitfield, using
2 3 8 4 3 1 bit ->21 bit
However it should be possible to pack these states into np.log2(3*5*129*15*6*2)=18.4
bits, saving two bits. (In reality I have 298 bits an need to save a few)
In my case this would save about >5% of the data stream, which would help a lot.
Is there a viable solution in python to pack the data in this way? I tried packalgorithms
, but they create too much overhead with just a few bytes of data. The string is no problem, it is constant and will be transmitted beforehand.
This is the code I am using in the moment:
from bitstring import pack
import numpy as np
DATA_TO_BE_PACKED=np.zeros(6)
Number_of_states=[3,5,129,15,6,2]#mutch longer in reality
DATA_TO_BE_PACKED=np.random.randint(Number_of_states)
string=''
for item in Number_of_states:
string ='uint:{}, '.format(int(np.ceil(np.log2(item))))
PACKED_DATA = pack(string,*DATA_TO_BE_PACKED)
print(len(PACKED_DATA ))
print(PACKED_DATA.unpack(string))
CodePudding user response:
You can interpret the state as an index into a multidimensional array with shape (3, 5, 129, 15, 6, 2). This index can be encoded as the integer index into the flattened 1-d array with length 3*5*129*15*6*2 = 348300
. NumPy has the functions ravel_multi_index
and unravel_index
that can do this for you.
For example, let num_states
be the number of states for each component of your state:
In [86]: num_states = [3, 5, 129, 15, 6, 2]
Suppose state
holds an instance of the data; that is, it records the state of each component:
In [87]: state = [2, 3, 78, 9, 0, 1]
To encode this state, pass it through ravel_multi_index
. idx
is the encoded state:
In [88]: idx = np.ravel_multi_index(state, num_states)
In [89]: idx
Out[89]: 316009
By construction, 0 <= idx < 348300
, so it requires only 19 bits.
To restore state
from idx
, use unravel_index
:
In [90]: np.unravel_index(idx, num_states)
Out[90]: (2, 3, 78, 9, 0, 1)
CodePudding user response:
This looks like a usecase of a mixed radix numeral system.
A quick proof of concept:
num_states = [3, 5, 129, 15, 6, 2]
input_data = [2, 3, 78, 9, 0, 1]
print("Input data: %s" % input_data)
To encode, you start with a 0, and for each state first multiply by number of states, and then add the current state:
encoded = 0
for i in range(len(num_states)):
encoded *= num_states[i]
encoded = input_data[i]
print("Encoded: %d" % encoded)
To decode, you go in reverse, and get remainder of division by number of states, and then divide by number of states:
decoded_data = []
for n in reversed(num_states):
v = encoded % n
encoded = encoded // n
decoded_data.insert(0, v)
print("Decoded data: %s" % decoded_data)
Example output:
Input data: [2, 3, 78, 9, 0, 1]
Encoded: 316009
Decoded data: [2, 3, 78, 9, 0, 1]
Another example with more values:
Input data: [2, 3, 78, 9, 0, 1, 84, 17, 4, 5, 30, 1]
Encoded: 14092575747751
Decoded data: [2L, 3L, 78L, 9L, 0L, 1L, 84L, 17L, 4L, 5L, 30L, 1L]