Cleaning a binary numpy array by removing some elements that fit a condition

Time:04-23

I'm trying to load a binary file into NumPy, drop some values I don't need, then reshape the array and use it for some calculations.

Here is my code for reference:

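import sys

import numpy as np


def read_binary_data(filename, buffer_size):
    # Print the whole array, with integers shown in hex
    np.set_printoptions(threshold=sys.maxsize)
    np.set_printoptions(formatter={'int': hex})

    # Read unsigned bytes ('B' == uint8), skipping the first 3 bytes of the file
    with open(filename, "rb") as f:
        binary_array = np.fromfile(f, dtype='B', offset=3)

    print(binary_array)
    return binary_array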

And here is the result:

 ...
 0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
 0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
 0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
 0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
 0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
 0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
 0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xcf 0xf4
 0xff 0xff 0x0 0x0 0x0 0x0]
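The hex in the output above comes only from the print formatter. A minimal sketch to check this, and to see what offset=3 does, using a small made-up file (the name example.bin and its contents are hypothetical): np.fromfile skips the first three bytes (the offset argument needs NumPy 1.17 or later), and the elements stay ordinary uint8 integers however they are printed.

```python
import os
import tempfile

import numpy as np


def read_bytes(path, offset=3):
    # dtype="B" is an unsigned byte (uint8); offset skips that many leading bytes
    with open(path, "rb") as f:
        return np.fromfile(f, dtype="B", offset=offset)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")  # hypothetical file
    with open(path, "wb") as f:
        # 3 "header" bytes followed by the payload we actually want
        f.write(bytes([0x01, 0x02, 0x03, 0x4E, 0x44, 0x4E, 0x54]))
    data = read_bytes(path)  # read into memory before the temp dir is deleted

np.set_printoptions(formatter={"int": hex})
print(data)  # printed as [0x4e 0x44 0x4e 0x54]
print(data.dtype, data[0] == 78)  # still uint8 integers: 0x4e is just 78
```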

For instance, say I want to remove all occurrences of the pair 0x4e 0x44, but not 0x4e or 0x44 on their own: it's the combination of the two that I'm interested in. If I have 0x4e 0x54, for example, I want to keep it intact.

How can I do that?

Thank you for your help
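To pin down the requirement, here is one illustration on a made-up byte string (not taken from the actual file). It uses a different technique from NumPy masking: Python's bytes.replace deletes each non-overlapping 0x4e 0x44 pair from left to right, while 0x4e 0x54 and lone bytes survive, and np.frombuffer then views the result as a uint8 array.

```python
import numpy as np

# Hypothetical sample: a 0x4e 0x44 pair, a 0x4e 0x54 pair, and lone 0x44 / 0x4e bytes
raw = bytes([0xFF, 0x4E, 0x44, 0x00, 0x4E, 0x54, 0x44, 0x4E])

# bytes.replace removes every non-overlapping b"\x4e\x44", scanning left to right
cleaned = np.frombuffer(raw.replace(b"\x4e\x44", b""), dtype=np.uint8)
print([hex(b) for b in cleaned])  # ['0xff', '0x0', '0x4e', '0x54', '0x44', '0x4e']
```

The array returned by np.frombuffer is read-only, so call .copy() on it before modifying it in place.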

Answer:

Thank you, everyone, for your input.

I figured out a way to do this as efficiently as I could. There may well be a better way, but for now this works :D

import numpy as np


def read_binary_data(filename, buffer_size):
    np.set_printoptions(formatter={'int': hex})
    with open(filename, "rb") as f:
        binary_array = np.fromfile(f, dtype='B')

    # Clean array
    to_remove = []
    # Positions of 0x4e that have a following byte (skipping the last
    # element avoids an IndexError on indexes + 1)
    indexes = np.where(binary_array[:-1] == 0x4e)[0]

    # find all occurrences of 4E54
    x54_mask = binary_array[indexes + 1] == 0x54
    to_remove.extend(indexes[x54_mask])

    # find all occurrences of 4E53
    x53_mask = binary_array[indexes + 1] == 0x53
    to_remove.extend(indexes[x53_mask])

    # removing unwanted values: both bytes of each matched pair
    to_remove_f = []
    for i in to_remove:
        to_remove_f.append(i)
        to_remove_f.append(i + 1)

    binary_array = np.delete(binary_array, to_remove_f)
    return binary_array

The only for loop runs over the 'to_remove' list, which contains fewer than 10 values in my case.
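If you want to drop the remaining Python loop as well, the index list can be built with array operations. This is a sketch of my own variation, not the method above: it assumes the same two patterns (0x4e 0x54 and 0x4e 0x53), uses a hypothetical helper name remove_pairs, and works on an in-memory array instead of a file.

```python
import numpy as np


def remove_pairs(arr, first, seconds):
    """Delete every two-byte pair (first, s) with s in `seconds` (hypothetical helper)."""
    # Candidate starts: positions of `first` that still have a following byte
    starts = np.flatnonzero(arr[:-1] == first)
    # Keep only candidates whose next byte is one of the wanted second bytes
    starts = starts[np.isin(arr[starts + 1], seconds)]
    # Delete both the first byte and the byte right after it
    return np.delete(arr, np.concatenate([starts, starts + 1]))


data = np.array([0x4E, 0x54, 0x00, 0x4E, 0x44, 0x4E, 0x53, 0xFF, 0x4E], dtype=np.uint8)
cleaned = remove_pairs(data, 0x4E, [0x54, 0x53])
print([hex(b) for b in cleaned])  # ['0x0', '0x4e', '0x44', '0xff', '0x4e']
```

Since np.isin accepts any list of second bytes, supporting another pattern that starts with 0x4e only means adding one value to that list. Because the pairs' second bytes are never 0x4e, two matches cannot overlap.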

Peace :D

Answer:

Note that even though your array prints hexadecimal values, the values themselves are still integers. Anyway, here's one way to find and delete pairs of 0x4e 0x44, though probably not the most efficient one:

import numpy as np

indices_to_delete = []

for i in range(len(binary_array) - 1):

    # Check if the current value and the one right after are a pair
    # If so, they will need to be deleted
    # (0x4e and 0x44 are plain int literals, equal to 78 and 68)
    if binary_array[i] == 0x4e and binary_array[i + 1] == 0x44:
        indices_to_delete += [i, i + 1]

binary_array = np.delete(binary_array, indices_to_delete)

Your binary array now has no pairs of 0x4e 0x44, though any standalone 0x4e or 0x44 bytes have been left alone.
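A loop-free variant of the same idea, sketched under the assumption that binary_array is a 1-D uint8 array like the one in the question (drop_pair is a hypothetical helper name): compare the array with itself shifted by one to find where a pair starts, mark both bytes of each pair, and keep everything else.

```python
import numpy as np


def drop_pair(binary_array, first=0x4E, second=0x44):
    # True at i when binary_array[i] == first and binary_array[i + 1] == second
    starts = (binary_array[:-1] == first) & (binary_array[1:] == second)
    # Mark both bytes of every pair for removal
    remove = np.zeros(binary_array.shape, dtype=bool)
    remove[:-1] |= starts
    remove[1:] |= starts
    # Boolean indexing keeps only the unmarked bytes
    return binary_array[~remove]


binary_array = np.array([0x4E, 0x44, 0x4E, 0x54, 0x44, 0x4E, 0x4E, 0x44], dtype=np.uint8)
print([hex(b) for b in drop_pair(binary_array)])  # ['0x4e', '0x54', '0x44', '0x4e']
```

The comparisons run in NumPy rather than in a Python loop over every byte, which should matter for large files. Because 0x4e and 0x44 are different bytes, two matches can never overlap, so this gives the same result as the loop.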
