The objective is to perform a particular procedure if the counter (i.e., idx) is within any of multiple ranges. In this case, the ranges come from a DataFrame, as below:
df=pd.DataFrame(dict(rbot=[4,20],rtop=[8,25]))
For example, a certain activity is fired if the counter's integer value is within 4-8 OR 20-25 (bounds inclusive).
The following code achieves this objective:
import pandas as pd
df = pd.DataFrame(dict(rbot=[4, 20], rtop=[8, 25]))
r_bot = df['rbot'].values.tolist()
r_top = df['rtop'].values.tolist()
for idx in range(120):
    h = [True for x, y in zip(r_bot, r_top) if x <= idx <= y]
    if True in h:
        print(f'Do some operation with {idx}')
which produces the following output:
Do some operation with 4
Do some operation with 5
Do some operation with 6
Do some operation with 7
Do some operation with 8
Do some operation with 20
Do some operation with 21
Do some operation with 22
Do some operation with 23
Do some operation with 24
Do some operation with 25
In the actual implementation, there can be up to hundreds of range pairs, whereas the counter can go up to hundreds of thousands. Hence, I am wondering whether there is a more efficient way of doing this?
CodePudding user response:
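One option is pandas cut with an interval index:
import numpy as np
# df is the DataFrame from the question
arr = np.arange(120)
intervals = pd.IntervalIndex.from_arrays(df.rbot, df.rtop, closed='both')
out = pd.cut(arr, intervals)
# values outside every interval come back as NaN, so keep only the matches
out = arr[pd.notna(out)]
for idx in out:
    print(f'Do some operation with {idx}')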
Do some operation with 4
Do some operation with 5
Do some operation with 6
Do some operation with 7
Do some operation with 8
Do some operation with 20
Do some operation with 21
Do some operation with 22
Do some operation with 23
Do some operation with 24
Do some operation with 25
You can skip building out and just iterate through; again, it depends on your end goal:
for idx in arr:
    if intervals.contains(idx).any():
        print(f'Do some operation with {idx}')
Thanks to @user2246849 for the tests, which I think you should have a look at to see whether this meets your needs.
CodePudding user response:
You could try numpy broadcasting to create a boolean mask that is True for the values that fall between any pair of rbot and rtop values. Then use flatnonzero to get the positions of the True values and index the range with them:
import numpy as np
arr = np.arange(120)
# Row i of the comparison marks the values inside the i-th (rbot, rtop) pair; any() merges the rows.
msk = ((df[['rbot']].to_numpy() <= arr) & (arr <= df[['rtop']].to_numpy())).any(axis=0)
# Index arr by position, so a value of 0 that lies inside a range is not lost.
out = arr[np.flatnonzero(msk)]
for idx in out:
    print(f'Do some operation with {idx}')
Output:
Do some operation with 4
Do some operation with 5
Do some operation with 6
Do some operation with 7
Do some operation with 8
Do some operation with 20
Do some operation with 21
Do some operation with 22
Do some operation with 23
Do some operation with 24
Do some operation with 25
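One thing to keep in mind with broadcasting is that the intermediate mask has one row per range pair and one column per counter value, so it grows quickly with hundreds of pairs and hundreds of thousands of values. A possible alternative is a sorted lookup with np.searchsorted. The sketch below is only an illustration: the function name values_in_ranges is made up here, and it assumes the ranges do not overlap each other.

```python
import numpy as np


def values_in_ranges(arr, rbot, rtop):
    """Return the elements of arr inside any inclusive range [rbot[i], rtop[i]].

    Assumption: the ranges do not overlap each other.
    """
    rbot = np.asarray(rbot)
    rtop = np.asarray(rtop)
    if rbot.size == 0:
        return arr[:0]  # no ranges, so nothing can match
    # Sort the pairs by lower bound so a binary search can be used.
    order = np.argsort(rbot)
    rbot, rtop = rbot[order], rtop[order]
    # For each value, locate the last range whose lower bound is <= value.
    pos = np.searchsorted(rbot, arr, side="right") - 1
    # pos == -1 means the value lies below every lower bound.
    candidate = pos >= 0
    # np.maximum keeps the lookup valid for pos == -1; those entries are masked out.
    inside = candidate & (arr <= rtop[np.maximum(pos, 0)])
    return arr[inside]


arr = np.arange(120)
out = values_in_ranges(arr, [4, 20], [8, 25])
print(out.tolist())
```

This needs memory proportional to the counter length only, at the cost of the non-overlap assumption.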
CodePudding user response:
There are many ways to approach this; here's one:
df = pd.DataFrame(dict(rbot=[4,20],rtop=[8,25]))
# range() excludes its end, so shift the upper bounds by one
df.rtop += 1
for idx in range(120):
    if any(idx in range(*df.iloc[x]) for x in df.index):
        print(f'Do some operation with {idx}')
Output:
Do some operation with 4
Do some operation with 5
Do some operation with 6
Do some operation with 7
Do some operation with 8
Do some operation with 20
Do some operation with 21
Do some operation with 22
Do some operation with 23
Do some operation with 24
Do some operation with 25
CodePudding user response:
FYI, if you only want to perform an operation for each valid index and do not intend to perform any additional aggregations later that would require pandas, this is much faster and more memory-efficient:
import pandas as pd
rbot = [i*1000 for i in range(10000)]
rtop = [(i + 1)*1000-2 for i in range(10000)]
main_range = (0, 120)
df = pd.DataFrame(dict(rbot=[4,20],rtop=[8,25]))
intervals = zip(df['rbot'], df['rtop'])
for i in intervals:
    # clip each inclusive (rbot, rtop) pair to the main range
    overlap = range(max(main_range[0], i[0]), min(main_range[1], i[-1]) + 1)
    for idx in overlap:
        print(f'Do some operation with {idx}')
Just compute the overlap of the main range with the subranges.
Do some operation with 4
Do some operation with 5
Do some operation with 6
Do some operation with 7
Do some operation with 8
Do some operation with 20
Do some operation with 21
Do some operation with 22
Do some operation with 23
Do some operation with 24
Do some operation with 25
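The overlap idea can also be wrapped in a small generator. This is a minimal sketch: the name indices_in_ranges and the half-open [start, stop) convention for the counter (matching range(120) in the question) are assumptions made here, and it assumes the range pairs do not overlap each other.

```python
def indices_in_ranges(start, stop, ranges):
    """Yield each integer in [start, stop) that lies in one of the inclusive (lo, hi) ranges.

    Assumption: the ranges do not overlap each other; overlapping ranges
    would yield their shared integers more than once.
    """
    for lo, hi in ranges:
        # Clip the inclusive range [lo, hi] to the half-open counter range.
        first = max(start, lo)
        last = min(stop - 1, hi)
        # range() is empty when first > last, so ranges outside the counter are skipped.
        yield from range(first, last + 1)


ranges = [(4, 8), (20, 25)]  # same pairs as rbot/rtop in the question
hits = list(indices_in_ranges(0, 120, ranges))
for idx in hits:
    print(f'Do some operation with {idx}')
```

The work done is proportional to the number of matching indices plus the number of pairs, rather than to the full counter length.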
Runtimes with a larger dataset:
import pandas as pd
import numpy as np

rbot = [i*1000 for i in range(10000)]
rtop = [(i + 1)*1000-2 for i in range(10000)]
main_range = (0, 120000)
df = pd.DataFrame({'rbot': rbot, 'rtop': rtop})

def python():
    intervals = zip(df['rbot'], df['rtop'])
    for i in intervals:
        overlap = range(max(main_range[0], i[0]), min(main_range[1], i[-1]) + 1)
        for idx in overlap:
            pass#print(idx)

# 5.03 ms ± 58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit python()

def pandas():
    arr = np.arange(*main_range)
    intervals = pd.IntervalIndex.from_arrays(df.rbot, df.rtop, closed='both')
    out = pd.cut(arr, intervals)
    out = arr[pd.notna(out)]
    for idx in out:
        pass#print(idx)

# 67 ms ± 467 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit pandas()

def numpy():
    arr = np.arange(*main_range)
    msk = ((df[['rbot']].to_numpy() <= arr) & (arr <= df[['rtop']].to_numpy())).sum(axis=0)
    out = np.flatnonzero(msk*arr)
    for idx in out:
        pass#print(idx)

# 2.77 s ± 7.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit numpy()
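%timeit is an IPython magic, so the snippet above only runs in IPython or Jupyter. In a plain Python script, something along the following lines could be used instead. This is just a sketch with the standard timeit module: the function name python_overlap and the number/repeat values are arbitrary choices made here, and it times only the pure-Python variant on plain lists.

```python
import timeit

# Same synthetic data as the benchmark above, kept as plain lists.
rbot = [i * 1000 for i in range(10000)]
rtop = [(i + 1) * 1000 - 2 for i in range(10000)]
main_range = (0, 120000)


def python_overlap():
    """Visit every index in the overlap of main_range with each (rbot, rtop) pair."""
    visited = 0
    for lo, hi in zip(rbot, rtop):
        for idx in range(max(main_range[0], lo), min(main_range[1], hi) + 1):
            visited += 1  # stand-in for the real per-index operation
    return visited


# number and repeat are arbitrary choices for this sketch.
number = 10
times = timeit.repeat(python_overlap, number=number, repeat=3)
print(f"best of 3: {min(times) / number * 1e3:.2f} ms per call")
```

Absolute timings will differ from machine to machine, so only the relative ordering of the approaches is worth comparing.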