I'm trying to load a binary file into NumPy, drop some values I don't need, then reshape the array and use it for some calculations.
Here is my code for reference:
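import sys
import numpy as np

def read_binary_data(filename, buffer_size):
    # Print every element (no "..." truncation) and show integers as hex
    np.set_printoptions(threshold=sys.maxsize)
    np.set_printoptions(formatter={'int': hex})
    with open(filename, "rb") as f:
        # Read the whole file as unsigned bytes, skipping the first 3 bytes
        binary_array = np.fromfile(f, dtype='B', offset=3)
    print(binary_array)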
And here is the result:
...
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xcf 0xf4
0xff 0xff 0x0 0x0 0x0 0x0]
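For anyone without the original file, the reading step can be reproduced on a small temporary file. This is a minimal sketch: the helper name `read_bytes` and the sample bytes are hypothetical, and it only illustrates that `np.fromfile(..., dtype='B', offset=3)` skips the first three bytes and returns plain `uint8` integers (the hex display is purely a print setting).

```python
import os
import tempfile

import numpy as np


def read_bytes(path, offset=0):
    """Read a whole binary file as unsigned bytes, skipping `offset` bytes."""
    with open(path, "rb") as f:
        return np.fromfile(f, dtype="B", offset=offset)


# Hypothetical sample data: a 3-byte header followed by a few payload bytes
sample = bytes([0x01, 0x02, 0x03, 0xFF, 0xFF, 0x4E, 0x44, 0x4E, 0x54])

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    data = read_bytes(path, offset=3)
finally:
    os.remove(path)

print(data.dtype)              # uint8
print([hex(b) for b in data])  # ['0xff', '0xff', '0x4e', '0x44', '0x4e', '0x54']
```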
Let's say, for instance, I want to remove all occurrences of the pair 0x4e 0x44, but not 0x4e or 0x44 on their own; it's the combination of the two that I'm interested in. For example, if I have 0x4e 0x54, I want to keep it intact.
How would I be able to do that?
Thank you for your help.
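One loop-free way to do this (a sketch, not the approach used in either answer below) is to compare every byte with its right-hand neighbour using two shifted views of the array, `arr[:-1]` and `arr[1:]`. The function name `drop_pairs` and the sample data are hypothetical, and the sketch assumes the two bytes of the pair differ (as 0x4e and 0x44 do), so matches cannot overlap.

```python
import numpy as np


def drop_pairs(arr, first, second):
    """Return a copy of `arr` with every adjacent (first, second) pair removed."""
    # starts[i] is True when arr[i] == first and arr[i + 1] == second
    starts = (arr[:-1] == first) & (arr[1:] == second)
    remove = np.zeros(arr.shape, dtype=bool)
    remove[:-1] |= starts  # mark the first byte of each pair
    remove[1:] |= starts   # mark the second byte of each pair
    return arr[~remove]


data = np.array([0xFF, 0x4E, 0x44, 0x00, 0x4E, 0x54, 0x4E, 0x44], dtype=np.uint8)
cleaned = drop_pairs(data, 0x4E, 0x44)
print([hex(b) for b in cleaned])  # ['0xff', '0x0', '0x4e', '0x54']
```

If `first` and `second` were the same byte, a run such as 0x4e 0x4e 0x4e would be removed entirely, unlike a left-to-right scan that would keep the last byte.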
Answer:
Thank you, everyone, for your input.
I figured out a way to do this as efficiently as I could. There may well be a better way, but for now this works :D
import numpy as np

np.set_printoptions(formatter={'int': hex})
with open(filename, "rb") as f:
    binary_array = np.fromfile(f, dtype='B')
# Clean array
to_remove = []
# Positions of 0x4e that have a following byte (the last byte is skipped
# so that indexes[0] + 1 never runs past the end of the array)
indexes = np.where(binary_array[:-1] == 0x4e)
# find all occurrences of 4E54
x54_mask = binary_array[indexes[0] + 1] == 0x54
to_remove = [*to_remove, *list(indexes[0][x54_mask])]
# find all occurrences of 4E53
x53_mask = binary_array[indexes[0] + 1] == 0x53
to_remove = [*to_remove, *list(indexes[0][x53_mask])]
# removing unwanted values: each match starts a two-byte pair, so drop i and i + 1
to_remove_f = []
for i in to_remove:
    to_remove_f.append(i)
    to_remove_f.append(i + 1)
binary_array = np.delete(binary_array, to_remove_f)
The only for loop runs over the 'to_remove' list, which contains fewer than 10 values in my case.
Peace :D
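The two masks in this answer can also be folded into one with `np.isin`, which removes the remaining Python loop as well. This is a sketch under the same assumption as the answer (0x4e followed by either 0x54 or 0x53 marks a pair to drop); the helper name `drop_4e_pairs` and the sample bytes are made up for illustration.

```python
import numpy as np


def drop_4e_pairs(arr, followers=(0x54, 0x53)):
    """Remove every 0x4e byte that is followed by one of `followers`, plus that follower."""
    # Boolean mask over pair start positions (every byte except the last)
    starts = (arr[:-1] == 0x4E) & np.isin(arr[1:], followers)
    idx = np.flatnonzero(starts)
    # Delete each start index together with the byte right after it
    return np.delete(arr, np.concatenate([idx, idx + 1]))


data = np.array([0x4E, 0x54, 0x4E, 0x44, 0x4E, 0x53, 0x00], dtype=np.uint8)
print([hex(b) for b in drop_4e_pairs(data)])  # ['0x4e', '0x44', '0x0']
```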
Answer:
Note that even though your array prints hexadecimal values, the values themselves are still integers. Anyway, here's one way to find and delete pairs of 0x4e 0x44, though probably not the most efficient:
import numpy as np

indices_to_delete = []
for i in range(len(binary_array) - 1):
    # Check if the current value and the one right after are a pair
    # If so, they will need to be deleted (0x4e is just the integer 78)
    if binary_array[i] == 0x4e and binary_array[i + 1] == 0x44:
        indices_to_delete += [i, i + 1]
binary_array = np.delete(binary_array, indices_to_delete)
Your binary array now has no 0x4e 0x44 pairs, though any single instances of 0x4e or 0x44 have been left alone.
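Since the pattern is a fixed byte sequence, another option (not from either answer) is to drop back to Python `bytes`, whose `replace` method removes non-overlapping matches from left to right, and then wrap the result again with `np.frombuffer`. A minimal sketch with a made-up sample array; it relies on the dtype being `uint8`, so that one array element is exactly one byte.

```python
import numpy as np

data = np.array([0xFF, 0x4E, 0x44, 0x4E, 0x54, 0x44], dtype=np.uint8)

# Work on the raw bytes: b"\x4e\x44" is the two-byte pattern to remove
cleaned_bytes = data.tobytes().replace(b"\x4e\x44", b"")

# frombuffer returns a read-only view of the bytes object; copy() makes it writable
cleaned = np.frombuffer(cleaned_bytes, dtype=np.uint8).copy()
print([hex(b) for b in cleaned])  # ['0xff', '0x4e', '0x54', '0x44']
```

Like the loop above, this is a single pass: removing a pair can bring a new 0x4e 0x44 together (for example 0x4e 0x4e 0x44 0x44 becomes 0x4e 0x44), so repeat it if such pairs must never remain.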