I have the following code:
import numpy as np

def func(value, start=None, end=None):
    # Return bits start..end (inclusive, counted from the least significant bit).
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)

data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]
data_dict = [{} for _ in range(len(starts))]
for ii, (start, stop) in enumerate(zip(starts, stops)):
    range_array = np.arange(start, stop, 2)
    data_dict[ii]['one'] = [func(value, 0, 8) for value in data[range_array]]
    data_dict[ii]['two'] = [func(value, 9, 17) for value in data[range_array]]
    data_dict[ii]['three'] = [func(value, 27, 27) for value in data[range_array]]
    data_dict[ii]['four'] = [func(value, 28, 28) for value in data[range_array]]
The problem is that this code runs relatively slowly, yet every other approach I have tried so far is even slower. Does anyone have an idea how to rewrite it so that it runs faster?
CodePudding user response:
You can use NumPy broadcasting to vectorize the bit masking with bitwise AND (&) and right shift (>>).
import numpy as np
np.random.seed(100)
data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]
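# bit offset of each field: equal to 'start' from calling func(value, start, end)
shift = np.array([0, 9, 27, 28])[:, None]
# width of each field in bits: equal to 'end - start + 1' from calling func(value, start, end)
bitmask = np.array([9, 9, 1, 1])[:, None]
# each entry of d has shape (4, n): one row per field, one column per selected value
d = [(data[start:stop:2] >> shift) & (2**bitmask - 1) for start, stop in zip(starts, stops)]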
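To see why the shift and mask above match the string slicing in the question, here is a minimal scalar sketch. The helper name extract_bits and the sample value are my own choices for illustration and are not part of the original code; func is copied from the question so the two can be compared directly.

```python
def func(value, start=None, end=None):
    """String-slicing version from the question (bits start..end, inclusive)."""
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)


def extract_bits(value, start, end):
    """Hypothetical helper: the same bit field via shift and mask."""
    width = end - start + 1         # number of bits in the field
    mask = (1 << width) - 1         # width 9 gives 0b111111111
    return (value >> start) & mask  # drop the low bits, keep `width` bits


value = 0x18DA5E3B  # arbitrary sample value with bits 27 and 28 set
for start, end in [(0, 8), (9, 17), (27, 27), (28, 28)]:
    # Both approaches should agree for every field used in the question.
    assert extract_bits(value, start, end) == func(value, start, end)
```

The vectorized line does the same thing for all four fields at once: shift and bitmask have shape (4, 1), so broadcasting them against a 1-D slice of data yields one row per field.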
To access the result list d:
d[0]
Output
array([[ 54, 227, 291, 281, 229, 59, 508, 87, 365, 416],
[ 40, 207, 353, 168, 214, 271, 338, 268, 419, 52],
[ 1, 0, 0, 0, 0, 0, 1, 1, 0, 0],
[ 0, 1, 1, 1, 0, 0, 0, 1, 1, 0]])
The rows can also be accessed by name, similar to your dictionaries:
one, two, three, four = np.arange(4)
d[1][two]
Output
array([ 68, 479, 230, 295, 278, 455, 276, 45, 360, 488, 241, 336, 447,
316, 181, 94, 138, 404, 223, 310])
If the downvote is because this answer does not produce the "exact" result, then adding:
actual = [
{
name: x[index].tolist()
for index, name
in enumerate(["one","two","three","four"])
}
for x in d
]
produces the exact result and maintains an order-of-magnitude boost in performance.
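If you want to check the equivalence and the timing on your own machine, something like the following sketch should work. The names fields, loop_version and vectorized_version are hypothetical wrappers I introduced for the comparison, and the number of timeit repetitions is arbitrary; actual timings will depend on your setup.

```python
import timeit

import numpy as np


def func(value, start=None, end=None):
    """String-slicing version from the question."""
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)


np.random.seed(100)
data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]
# field name -> (start, end) bit positions, as used in the question
fields = {"one": (0, 8), "two": (9, 17), "three": (27, 27), "four": (28, 28)}


def loop_version():
    """List of dicts built with the original per-value func calls."""
    return [
        {name: [func(v, lo, hi) for v in data[start:stop:2]]
         for name, (lo, hi) in fields.items()}
        for start, stop in zip(starts, stops)
    ]


def vectorized_version():
    """Same structure built with broadcasting, shift and mask."""
    shift = np.array([lo for lo, _ in fields.values()])[:, None]
    width = np.array([hi - lo + 1 for lo, hi in fields.values()])[:, None]
    mask = (1 << width) - 1
    return [
        {name: row.tolist()
         for name, row in zip(fields, (data[start:stop:2] >> shift) & mask)}
        for start, stop in zip(starts, stops)
    ]


# Both versions should produce identical lists of dictionaries.
assert loop_version() == vectorized_version()

# Rough timing comparison; results vary by machine.
print("loop:      ", timeit.timeit(loop_version, number=20))
print("vectorized:", timeit.timeit(vectorized_version, number=20))
```

Note that data[start:stop:2] selects the same elements as data[np.arange(start, stop, 2)] in the question, so the comparison is like for like.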
CodePudding user response:
You should check your IDE, or rather your virtual environment. Jupyter is relatively slow, whereas PyCharm and VS Code are very fast. Even with complex code, I have never had a problem with code being slow, except in Jupyter, where it doesn't load. Freeing up disk space on your PC by deleting unnecessary files might also help.