##Mock Data##
my_list = list(range(1700))
import itertools
cross_product = list(itertools.product(my_list,my_list))
station_combinations = ["_".join([str(i),str(b)]) for i,b in cross_product if i != b]
###############
from time import time,sleep
station_name = "5"
start = time()
for h in range(10):
reverse_indexes = [count for count,j in enumerate(station_combinations) if j.split("_")[1] == station_name ]
regular_indexes = [count for count,j in enumerate(station_combinations) if j.split("_")[0] == station_name ]
print(time() - start )
Hello, I have shared the reproducible code above.
Background: Let me quickly introduce my data station_combinations is the cross product of "my_list" separated by the notation "_". You can think of it as a destination among "my_list" items so 1_2 would be going from 1 to 2 whereas 2_1 would going from 2 to 1.
So I will refer as "a_b" Among all the combinations in "reverse_indexes", I am trying to find the index of elements where b in ( "a_b" ) is equal to "station_name", so the "destination" is equal to station name, and in the regular_indexes an in ("a_b") the source is equal to the station_name
Problem: The code that I have works however it is very slow. if you look at the for loop (with cursor h) I iterate 10 times, however, in the original code, it is supposed to be approx. 2000. With even 10 iterations it approx. takes 8seconds on my computer. I am looking for ways to improve the speed significantly. I have tried the library numba, however because I actually get some of the data from a data frame I wasn't able to work it out with the "@njit" functionality. Would anyone be able to help?
CodePudding user response:
One solution can be using indexes, in this case two indexes for a
and b
. For example:
my_list = list(range(1700))
import itertools
cross_product = list(itertools.product(my_list, my_list))
station_combinations = [
"_".join([str(i), str(b)]) for i, b in cross_product if i != b
]
# precompute indexes:
index_a = {}
index_b = {}
for i, s in enumerate(station_combinations):
a, b = s.split("_")
index_a.setdefault(a, []).append(i)
index_b.setdefault(b, []).append(i)
from time import time, sleep
station_name = "5"
start = time()
for h in range(10):
reverse_indexes_new = index_b.get(station_name, [])
regular_indexes_new = index_a.get(station_name, [])
print(time() - start)
Prints on my machine:
7.62939453125e-06