Binary file processing taking a long time

I have a binary file that is only 15 MB, and it is taking a really long time to process with the code below. I import the data into a list and run through it one frame (50 bytes) at a time. Is there some improvement I can make to cut down the processing time?

with open(r'C:\binary_file.bin', 'rb') as p:
    my_bytes = p.read()

data = list(my_bytes)

frame = 50
sample = 0
processed = []

scales = [
    0.0625, 
    0.0625,
    0.00390625,
    0.00390625,
    3.05176e-05,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    0.0078125,
    0.001953125,
    0.0001220703,
    0.0001220703,
    0.0001220703,
    3.05176e-05,
    1.0,
    0.0001220703,
    0.001953125,
    1.0,
    1.0,
    1.0
]

while len(data) >= frame:
    temp1 = [data[i] + data[i + 1] * 256 for i in range(0, frame - 10, 2)]
    temp2 = [i - 65536 if i > 32767 else i for i in temp1]
    temp3 = [
        int(a * b) if (b == 1 or b == 2) else a * b
        for a, b in zip(temp2, scales)
    ]
    temp3 = [round(i, 5) for i in temp3]
    temp3.insert(0, sample)
    sample += 1
    processed += [temp3]
    del data[:frame]

Answer:

You read the whole file into a list. You then delete {frame} bytes at a time from the front of that list. Just doing this is going to take forever. Try deleting all the other code that does stuff "with" the data, and leave only the code that reads the file and fetches the next frame, including this problematic line:

del data[:frame]

I suspect you'll notice it takes just as long. Deleting from the front of a list is very expensive, because every remaining element has to be shifted down to fill the gap, and you do it over and over, on a very large list, for no reason.
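To see that cost in isolation, here is a minimal sketch of the experiment. The names `drain_by_deleting` and `time_drain` are made up for illustration, and the list is filled with zeros rather than read from your file, so treat the timings as indicative only.

```python
import time


def drain_by_deleting(data, frame=50):
    """Consume `data` one frame at a time the slow way; return the frame count."""
    count = 0
    while len(data) >= frame:
        # Removing a slice at the front shifts every remaining element down.
        del data[:frame]
        count += 1
    return count


def time_drain(n_bytes, frame=50):
    """Return (frames, seconds) for draining a list of `n_bytes` zeros."""
    data = [0] * n_bytes
    start = time.perf_counter()
    frames = drain_by_deleting(data, frame)
    return frames, time.perf_counter() - start
```

Try `time_drain(200_000)` and then `time_drain(400_000)`. Because each deletion touches everything still left in the list, I would expect the time to grow much faster than the input size once the list gets large.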

Your program would have taken several seconds if you had just advanced through the data one frame at a time and left the data structure holding it alone. It took about 8 seconds for me on a file of several MB using the approach from https://stackoverflow.com/a/312464/1766544
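A minimal sketch of what "advancing the frame" could look like: step an index through the data and slice out each frame, never deleting anything. The helper name `iter_frames` is my own, and like your loop it skips a trailing partial frame.

```python
def iter_frames(data, frame_size):
    """Yield successive full frames from `data` without modifying it."""
    # Stop early enough that every slice is exactly `frame_size` long.
    for start in range(0, len(data) - frame_size + 1, frame_size):
        yield data[start:start + frame_size]
```

It works on the `bytes` object returned by `p.read()` as well as on a list, so the `list(my_bytes)` conversion is not needed: `for chunk in iter_frames(my_bytes, frame): ...`.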

But we can do better. Reading the whole file into memory is unnecessary because you only use one frame at a time. You have a file handle for that, so let the OS do its job. Now your data holds only the one frame you want to work with anyway.

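frame = 50  # bytes per frame, as in the question
done = False
with open(filename, 'rb') as p:
    while not done:
        # Read at most one frame; a short or empty result means end of file.
        my_bytes = p.read(frame)
        # process_frame is a placeholder for your per-frame logic; it is
        # assumed to return True once my_bytes is shorter than a full frame.
        done = process_frame(my_bytes)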

This was almost instantaneous for me, running your full code on an arbitrary 14MB file on my system.
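For completeness, here is one way the streaming version could be fleshed out. This is a sketch, not the exact code I ran. The names `decode_frame` and `read_frames` are hypothetical, and I am assuming `struct` is acceptable here: `'<20h'` unpacks 20 little-endian signed 16-bit values, which is what the `data[i] + data[i + 1] * 256` and `i - 65536` steps compute by hand. As in your code, the last 10 bytes of each frame are ignored.

```python
import struct

FRAME_SIZE = 50                      # bytes per frame, from the question
WORDS = (FRAME_SIZE - 10) // 2       # 20 values; the last 10 bytes are unused
FRAME_FORMAT = struct.Struct(f"<{WORDS}h")  # little-endian signed 16-bit


def decode_frame(chunk, scales, sample):
    """Turn one raw frame into [sample, scaled values...]."""
    values = FRAME_FORMAT.unpack_from(chunk)
    row = [sample]
    for raw, scale in zip(values, scales):
        scaled = raw * scale
        # Mirror the original rule: ints for scale 1 or 2, else 5 decimals.
        row.append(int(scaled) if scale in (1, 2) else round(scaled, 5))
    return row


def read_frames(path, scales):
    """Read `path` one frame at a time and return the decoded rows."""
    processed = []
    sample = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(FRAME_SIZE)
            if len(chunk) < FRAME_SIZE:
                break  # end of file, or a trailing partial frame
            processed.append(decode_frame(chunk, scales, sample))
            sample += 1
    return processed
```

Usage would be something like `processed = read_frames(r'C:\binary_file.bin', scales)` with the `scales` list from the question. Note that only the first 20 entries of `scales` are ever used, because `zip` stops at the shorter sequence; that is true of the original code too.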