What statistical tests can I run to test the randomness of binary strings using python?

Time:08-01

I'm having trouble implementing the block frequency test in Python to assess the randomness of a binary string. Could anyone help me understand why the code won't run?

Also, are there any other statistical tests for the randomness of a binary string that I could run in Python, or possibly MATLAB?

from importlib import import_module
import_module
from tokenize import Special
import math
def block_frequency(self, bin_data: str, block_size=4):
    """
     Note that this description is taken from the NIST documentation [1]
    [1] http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf
    The focus of this test is the proportion of ones within M-bit blocks. The purpose of this test is to determine
    whether the frequency of ones in an M-bit block is approximately M/2, as would be expected under an assumption
    of randomness. For block size M=1, this test degenerates to the monobit frequency test.
    :param bin_data: a binary string
    :return: the p-value from the test
    :param block_size: the size of the blocks that the binary sequence is partitioned into
    """
# Work out the number of blocks, discard the remainder
(num_blocks)= math.floor((1010110001001011011010111110010000000011010110111000001101) /4)
block_start, block_end = 0, 4
# Keep track of the proportion of ones per block 
proportion_sum = 0.0
for i in range(num_blocks):
    # Slice the binary string into a block 
    block_data = (101010001001011011010111110010000000011010110111000001101)[block_start:block_end]
    # Keep track of the number of ones 
    ones_count = 0
    for char in block_data:
        if char == '1':
           ones_count += 1
    pi = ones_count / 4
    proportion_sum += pow(pi - 0.5, 2.0)
    # Update the slice locations
    block_start += 4
    block_end += 4
    # Calculate the p-value
    chi_squared = 4.0 * 4 * proportion_sum
    p_val = Special.gammaincc(num_blocks / 2, chi_squared / 2)
    print(p_val)

CodePudding user response:

There are three issues that I see with your code.

  1. The binary string is hardcoded in two different places (and the two copies differ). This is bad practice and error-prone. I know this probably isn't what the OP was referring to, but it's worth fixing while we're at it.
  2. A string of binary digits (especially one whose characters are compared to '1' further down) should be enclosed in quotation marks, not parentheses. That's one of the errors being thrown, because as written you have a large integer that you're trying to index. (This goes along with using len where necessary and some other minor changes.)
  3. You're using the wrong module. You probably mean scipy.special.gammaincc (the regularized upper incomplete gamma function, which the NIST document calls igamc), not tokenize.Special.gammaincc, which doesn't exist anyhow.
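To see the second issue in isolation, here is a minimal sketch. The variable names are mine, chosen only for illustration:

```python
# Digits wrapped in parentheses are still an integer literal, not a string
bits_as_int = (1010)

try:
    bits_as_int[0:4]  # slicing an int is not allowed
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# Quoting the digits gives a str, which supports slicing and len()
bits_as_str = '1010'
first_block = bits_as_str[0:4]
num_blocks = len(bits_as_str) // 4
print(first_block, num_blocks)
```

The first print shows the TypeError message; the second shows that the quoted version slices cleanly.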

Putting it all together, try something like:

import math

from scipy.special import gammaincc


def block_frequency(bin_data: str, block_size: int = 4) -> float:
    """
    Note that this description is taken from the NIST documentation [1]
    [1] http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf
    The focus of this test is the proportion of ones within M-bit blocks. The purpose of this test is to determine
    whether the frequency of ones in an M-bit block is approximately M/2, as would be expected under an assumption
    of randomness. For block size M=1, this test degenerates to the monobit frequency test.
    :param bin_data: a binary string
    :param block_size: the size of the blocks that the binary sequence is partitioned into
    :return: the p-value from the test
    """
    # Work out the number of blocks, discard the remainder
    num_blocks = len(bin_data) // block_size
    block_start, block_end = 0, block_size
    # Keep track of the squared deviation of each block's proportion of ones from 1/2
    proportion_sum = 0.0
    for i in range(num_blocks):
        # Slice the binary string into a block
        block_data = bin_data[block_start:block_end]
        # Keep track of the number of ones
        ones_count = 0
        for char in block_data:
            if char == '1':
                ones_count += 1
        pi = ones_count / block_size
        proportion_sum += pow(pi - 0.5, 2.0)
        # Update the slice locations
        block_start += block_size
        block_end += block_size
    # Calculate the p-value once, after all blocks have been processed
    chi_squared = 4.0 * block_size * proportion_sum
    p_val = gammaincc(num_blocks / 2, chi_squared / 2)
    return p_val


my_binary_string = '101010001001011011010111110010000000011010110111000001101'
print(block_frequency(my_binary_string))
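As for other tests: the NIST document cited in the docstring describes several more. Below is a rough sketch of two of the simplest, the monobit frequency test and the runs test, using only the standard library. The function names monobit_test and runs_test are my own, the formulas are my reading of the NIST document, and the code assumes a non-empty string containing only '0' and '1', so treat it as a starting point and not a reference implementation.

```python
import math


def monobit_test(bin_data: str) -> float:
    """Frequency (monobit) test: are ones and zeros roughly balanced?"""
    n = len(bin_data)
    # Map '1' -> +1 and '0' -> -1, then sum over the whole sequence
    s_n = sum(1 if char == '1' else -1 for char in bin_data)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def runs_test(bin_data: str) -> float:
    """Runs test: is the number of runs of identical bits plausible?"""
    n = len(bin_data)
    pi = bin_data.count('1') / n
    # Prerequisite check: if the proportion of ones is too far from 1/2
    # the runs test is not applicable, and a p-value of 0.0 is reported.
    # The extra pi check avoids a division by zero on constant strings.
    if abs(pi - 0.5) >= 2 / math.sqrt(n) or pi in (0.0, 1.0):
        return 0.0
    # A new run starts wherever two neighbouring bits differ
    v_obs = 1 + sum(bin_data[i] != bin_data[i + 1] for i in range(n - 1))
    numerator = abs(v_obs - 2 * n * pi * (1 - pi))
    denominator = 2 * math.sqrt(2 * n) * pi * (1 - pi)
    return math.erfc(numerator / denominator)


my_binary_string = '101010001001011011010111110010000000011010110111000001101'
print(monobit_test(my_binary_string))
print(runs_test(my_binary_string))
```

In each case a small p-value (NIST uses a threshold of 0.01) suggests the sequence is not random. Keep in mind that NIST recommends sequences of at least 100 bits for these tests, so results for a string as short as the one above are only indicative.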