The need arose because I want to calculate something like:
df['result'] = np.where(df['data1'] == True, df["data2"].rolling(window).max(), 0)
However: window
is practically df['window']
; a Series of data in the same time series; it contains integers and they may look something like [2,2,2,3,3,2,2,4,2,2]
.
What's the most vectorized or other efficient way to do this?
CodePudding user response:
Setup
import pandas as pd
import numpy as np
np.random.seed([3,14])
a = np.random.randn(20).cumsum()
w = np.minimum(
np.random.randint(1, 4, size=a.shape),
np.arange(len(a)) 1
)
df = pd.DataFrame({'Data': a, 'Window': w})
df
Data Window
0 -0.602923 1
1 -1.005579 2
2 -0.703250 3
3 -1.227599 1
4 -0.683756 1
5 -0.670621 2
6 -0.997120 1
7 0.387956 3
8 0.255502 1
9 -0.152361 2
10 1.150534 3
11 0.546298 3
12 0.302936 3
13 0.091674 1
14 -1.964947 1
15 -1.447079 2
16 -1.487828 1
17 -2.539703 1
18 -1.932612 3
19 -4.163049 2
Vectorized variable window rolling maximum
idx_base = np.arange(len(df))
windows = df.Window.to_numpy()
data = df.Data.to_numpy()
idx_max = np.concatenate([
np.arange(position-(window_size-1), position 1)
for window_size, position in zip(windows, idx_base)
])
idx_pos = np.repeat(idx_base, windows)
data_windowed = data[idx_max]
out = np.full(data.shape, data.min())
np.maximum.at(out, idx_pos, data_windowed)
The answer is now in the out
variable. Let's look
df.Data.iloc[idx_max].groupby(idx_pos).max().to_frame('Pandas').assign(Numpy=out)
Pandas Numpy
0 -0.602923 -0.602923
1 -0.602923 -0.602923
2 -0.602923 -0.602923
3 -1.227599 -1.227599
4 -0.683756 -0.683756
5 -0.670621 -0.670621
6 -0.997120 -0.997120
7 0.387956 0.387956
8 0.255502 0.255502
9 0.255502 0.255502
10 1.150534 1.150534
11 1.150534 1.150534
12 1.150534 1.150534
13 0.091674 0.091674
14 -1.964947 -1.964947
15 -1.447079 -1.447079
16 -1.487828 -1.487828
17 -2.539703 -2.539703
18 -1.487828 -1.487828
19 -1.932612 -1.932612
Appendix
Look at idx_max
. You'll see how we end up vectorizing this.
print(idx_max)
[ # Position Window
0 # 0 1
0 1 # 1 2
0 1 2 # 2 3
3 # 3 1
4 # 4 1
4 5 # 5 2
6 # 6 1
5 6 7 # 7 3
8 # 8 1
8 9 # 9 2
8 9 10 # 10 3
9 10 11 # 11 3
10 11 12 # 12 3
13 # 13 1
14 # 14 1
14 15 # 15 2
16 # 16 1
17 # 17 1
16 17 18 # 18 3
18 19 # 19 2
]