Reading a binary file: translate MATLAB to Python

I'm translating working MATLAB code that reads a binary file into Python. Is there a Python equivalent for the following?

% open the file for reading
fid=fopen (filename,'rb','ieee-le');
% first read the signature
tmp=fread(fid,2,'char');
% read sizes
rows=fread(fid,1,'ushort');
cols=fread(fid,1,'ushort');

Answer:

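There's the struct module for that, specifically struct.unpack, which decodes a bytes buffer according to a format string. It does not read from the file itself, so you have to read the required number of bytes from the input first; struct.calcsize gives you that number for a given format string.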

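import struct

endian = "<"  # little endian, like MATLAB's 'ieee-le'
with open(filename, "rb") as f:
    # signature: two 1-byte chars; unpack returns them as length-1 bytes objects
    tmp = struct.unpack(f"{endian}cc", f.read(struct.calcsize(f"{endian}cc")))
    # convert to numbers, which is what MATLAB's fread returns
    tmp_int = [int.from_bytes(x, byteorder="little") for x in tmp]
    # sizes: one unsigned short ("H", 2 bytes) each
    rows = struct.unpack(f"{endian}H", f.read(struct.calcsize(f"{endian}H")))[0]
    cols = struct.unpack(f"{endian}H", f.read(struct.calcsize(f"{endian}H")))[0]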
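As a sketch, the whole header can also be decoded with a single format string. Using `B` (unsigned char) instead of `c` returns the signature bytes as ints directly, so the `int.from_bytes` step is not needed, assuming the signature is plain single-byte characters. The helper name `read_header` and the in-memory test file are illustrative assumptions, not part of the original code:

```python
import io
import struct

# Whole header in one format: "<" little-endian, "2B" two unsigned bytes
# (the signature), "2H" two unsigned shorts (rows, cols). "<" also disables
# alignment padding, so the size is exactly 2 + 2 + 2 = 6 bytes.
HEADER = struct.Struct("<2B2H")


def read_header(f):
    """Read the signature and the matrix size from a binary file object.

    Hypothetical helper: assumes the layout from the MATLAB snippet
    (2 single-byte chars, then two little-endian ushorts).
    """
    raw = f.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise EOFError("file too short for header")
    sig0, sig1, rows, cols = HEADER.unpack(raw)
    return [sig0, sig1], rows, cols


# Usage with an in-memory "file" standing in for open(filename, "rb")
buf = io.BytesIO(struct.pack("<2B2H", ord("M"), ord("X"), 3, 4))
sig, rows, cols = read_header(buf)
print(sig, rows, cols)  # [77, 88] 3 4
```

Precompiling the format into a `struct.Struct` object also means the format string is parsed only once, which matters if you read many headers.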

For reading the rest of the data, you might want to use the struct.Struct class and decode it in chunks (here, one row at a time), which is faster than decoding numbers one at a time. For example:

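# continue inside the `with` block above, after reading rows and cols
data = []
reader = struct.Struct(endian + "i" * cols)  # one row: cols 4-byte signed ints ("i")
row_size = reader.size  # bytes per row
for row_number in range(rows):
    row = reader.unpack(f.read(row_size))  # tuple of cols values
    data.append(row)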
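If the whole matrix fits comfortably in memory, one alternative sketch is to read all rows with a single `f.read` call and let `Struct.iter_unpack` split the buffer into rows. The function name `read_rows` and the default `int32` element type (`"i"`, as in the loop above) are assumptions for illustration:

```python
import io
import struct


def read_rows(f, rows, cols, endian="<", fmt="i"):
    """Read a rows x cols matrix stored row by row (hypothetical layout).

    Each row is `cols` values of struct type `fmt` ("i" = 4-byte signed int).
    """
    row_struct = struct.Struct(endian + fmt * cols)
    nbytes = row_struct.size * rows
    buf = f.read(nbytes)
    if len(buf) != nbytes:
        raise EOFError(f"expected {nbytes} bytes, got {len(buf)}")
    # iter_unpack yields one tuple per row_struct.size-byte chunk of buf
    return list(row_struct.iter_unpack(buf))


# Usage: a 2 x 3 matrix packed as little-endian int32, row by row
payload = struct.pack("<6i", 1, 2, 3, -4, 5, 6)
data = read_rows(io.BytesIO(payload), rows=2, cols=3)
print(data)  # [(1, 2, 3), (-4, 5, 6)]
```

The trade-off is one large read instead of `rows` small ones; the result is still a list of Python tuples, so the memory concern below applies equally.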

Edit: corrected the answer and added an example for reading larger chunks.

Edit 2: one more improvement. If you are reading, say, a 1 GB file of shorts, storing every value as a Python int makes no sense: each int is a full Python object that takes many times the 2 bytes it used on disk, so you will most likely get an out-of-memory error (or the system will freeze). The proper way to do it is with NumPy:

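import numpy as np
# inside the `with` block, after reading rows and cols; "<H" = little-endian ushort
data = np.fromfile(f, dtype=endian + "H", count=rows * cols).reshape(cols, rows)
# for column-major (MATLAB-written) data, data.T is the rows x cols matrix MATLAB sees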

This way the data takes the same amount of memory as it did on disk: 2 bytes per ushort, in one contiguous array.
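Putting the pieces together, here is a minimal end-to-end sketch that writes a small test file, reads the header with `struct` and the payload with `np.fromfile`, and checks the memory claim via `nbytes`. The file layout (2-byte signature, two ushorts, then `rows*cols` little-endian ushorts in column-major order, the order in which MATLAB's `fwrite` writes a matrix) and the helper name `read_matrix` are assumptions. `reshape((rows, cols), order="F")` gives the same result as `reshape(cols, rows).T`:

```python
import os
import struct
import tempfile

import numpy as np


def read_matrix(filename):
    """Hypothetical reader for: 2-char signature, ushort rows, ushort cols,
    then rows*cols little-endian ushorts stored column-major (MATLAB order)."""
    with open(filename, "rb") as f:
        sig0, sig1, rows, cols = struct.unpack("<2B2H", f.read(6))
        # fromfile continues from the current file position (just after the header)
        flat = np.fromfile(f, dtype="<u2", count=rows * cols)
    if flat.size != rows * cols:
        raise EOFError("file ended before the matrix was complete")
    # order="F" fills column by column, so data[i, j] matches MATLAB's M(i+1, j+1)
    data = flat.reshape((rows, cols), order="F")
    return (sig0, sig1), data


# Build a 2 x 3 test matrix and write it the way MATLAB's fwrite would (column-major)
m = np.array([[1, 2, 3], [4, 5, 6]], dtype="<u2")
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(struct.pack("<2B2H", ord("M"), ord("X"), *m.shape))
    tmp.write(m.tobytes(order="F"))
    path = tmp.name

try:
    sig, data = read_matrix(path)
finally:
    os.remove(path)

print(sig, data.shape, data.nbytes)  # (77, 88) (2, 3) 12
```

Passing `count` keeps `np.fromfile` from reading past the matrix if the file has trailing data, and the explicit size check catches truncated files.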