Let's say I have a list of tuples of the form:
data = [(1,2), (1,2), ..., (1,2)]
and a function data_to_bytes that accepts a tuple, does something to it, and returns bytes. Now I want to call this function on every element of data and save the output to a file. This is what I already have:
def create_data_file(data_file, data):
    with data_file.open('wb') as _file:
        for i in data:
            _file.write(data_to_bytes(i))
    return data_file
But this is terribly slow, and I would like to improve it. Maybe it is possible to get rid of the loop inside the with block. I was thinking about using NumPy somehow, and maybe calling data_to_bytes() on the whole array instead of on every element. Is this possible?
CodePudding user response:
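NumPy is only fast if you can replace data_to_bytes with NumPy functions that operate on the whole array. NumPy does not vectorize arbitrary Python functions; even numpy.vectorize is essentially a Python-level loop provided for convenience, not performance.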
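To illustrate what "running on the whole array" could look like, here is a minimal sketch. The real data_to_bytes is not shown in the question, so the struct-based version below is a hypothetical stand-in that packs each tuple as two little-endian unsigned 16-bit integers. Under that assumption, the whole list can be converted in a single NumPy call:

```python
import struct

import numpy as np


def data_to_bytes(item):
    # Hypothetical stand-in for the asker's function:
    # pack a 2-tuple as two little-endian unsigned 16-bit integers.
    return struct.pack("<2H", *item)


def data_to_bytes_numpy(data):
    # Convert the whole list at once. "<u2" means little-endian uint16,
    # and tobytes() emits the rows in C (row-major) order.
    return np.asarray(data, dtype="<u2").tobytes()


data = [(1, 2), (3, 4), (500, 600)]

looped = b"".join(data_to_bytes(item) for item in data)
vectorized = data_to_bytes_numpy(data)
```

If the real data_to_bytes does something that cannot be expressed as a dtype conversion or other array operations, this approach does not apply.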
If you don't think that's possible, other ways of increasing performance might be:
- Caching results of "data_to_bytes" or
- Improving the performance of "data_to_bytes" itself
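The caching idea can be sketched with functools.lru_cache, which works here because tuples are hashable. Again, data_to_bytes below is a hypothetical stand-in, and the output path is made up for the example. The sketch also joins all chunks and issues a single write, which avoids one write call per element:

```python
import struct
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def data_to_bytes(item):
    # Hypothetical stand-in for the asker's function.
    # Results are cached, so repeated tuples are converted only once.
    return struct.pack("<2H", *item)


def create_data_file(data_file, data):
    with data_file.open("wb") as _file:
        # Build the whole payload in memory, then write it in one call.
        _file.write(b"".join(map(data_to_bytes, data)))
    return data_file


data = [(1, 2)] * 1000
out = create_data_file(Path(tempfile.mkdtemp()) / "data.bin", data)
```

Caching only pays off when the same tuples occur repeatedly, as in the question's example data; for mostly unique inputs it just adds memory overhead.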
CodePudding user response:
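Take a look at numpy.apply_along_axis. It applies a given function to the 1-D slices of the input array along the given axis and returns a new array with the results. Note that it still calls the function once per slice from Python, so it is a convenience rather than a speedup.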
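A small sketch of how numpy.apply_along_axis might be used for this, assuming a hypothetical per-row function that returns the packed byte values as a fixed-length uint8 array (returning raw bytes objects is avoided here, because NumPy's fixed-width bytes dtype drops trailing null bytes):

```python
import struct

import numpy as np


def row_to_byte_values(row):
    # Hypothetical stand-in for data_to_bytes: pack the row as two
    # little-endian uint16 values and expose the result as uint8 values.
    packed = struct.pack("<2H", int(row[0]), int(row[1]))
    return np.frombuffer(packed, dtype=np.uint8)


arr = np.asarray([(1, 2), (3, 4), (500, 600)])

# axis=1 means the function receives one row (one tuple) at a time.
byte_rows = np.apply_along_axis(row_to_byte_values, 1, arr)
payload = byte_rows.tobytes()
```

The function is still invoked once per row, so this mostly changes how the loop is written, not how fast it runs.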
CodePudding user response:
Try this code:
data = [(1, 2), (1, 2), (1, 2)]

def create_data_file(data_file, data):
    with open(data_file, 'wb') as _file:
        for i in data:
            # bytearray(i) only works if every value is an int in range(256)
            _file.write(bytearray(i))
            print(type(bytearray(i)))
    return data_file

create_data_file('test.txt', data)
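If one byte per value really is the desired format (an assumption; it requires every value to be an integer from 0 to 255), the loop can be dropped by flattening the tuples and building the payload in one step. A minimal sketch, with a made-up output path:

```python
import tempfile
from itertools import chain
from pathlib import Path

data = [(1, 2), (1, 2), (1, 2)]


def create_data_file(data_file, data):
    # Flatten all tuples into one stream of ints and convert them to bytes
    # in a single call; raises ValueError for values outside range(256).
    payload = bytes(chain.from_iterable(data))
    with open(data_file, "wb") as _file:
        _file.write(payload)
    return data_file


out = create_data_file(Path(tempfile.mkdtemp()) / "test.bin", data)
```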