Given a dataframe df: pd.DataFrame and a subset selected_indexes of indexes from df.index, how can I resample df with the max operator applied to each interval (selected_indexes[i], selected_indexes[i+1]]?
For example, given a dataframe:
col
0 5
1 0
2 3
3 3
4 7
5 9
6 3
7 5
8 2
9 4
And given a selection of indexes selected_indexes = [0, 5, 6, 9], applying the maximum to the col column over each interval (keeping the end point and excluding the starting point), we should get:
col
0 5
5 9
6 3
9 5
For example, row 9 is max(5, 2, 4), taken from rows 7, 8 and 9, i.e. the labels in (6, 9].
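To make the expected result unambiguous, here is a brute-force reference that any faster answer can be checked against. It is only a sketch that assumes df.index is unique and sorted, and that the first selected index closes an interval covering every row up to and including it (which is what the example output implies, since 0 is the first row).

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({"col": [5, 0, 3, 3, 7, 9, 3, 5, 2, 4]})
selected_indexes = [0, 5, 6, 9]

# Naive reference: for each selected label s, take the max over the
# half-open label interval (previous selected label, s].
# Assumption: unique, sorted index; the first interval runs from the start to s.
rows = []
prev = None
for s in selected_indexes:
    if prev is None:
        window = df.loc[:s]                # label slice, end point included
    else:
        window = df.loc[prev:s].iloc[1:]   # drop the starting point prev
    rows.append(window.max())
    prev = s

expected = pd.DataFrame(rows, index=selected_indexes)
print(expected)
```

This loops in Python, so it is slow on large frames, but it is easy to read and verify.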
Answer:
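New interpretation:
selected_indexes = [0, 5, 6, 9]
# Start a new group on the row right after each selected index:
# shift() moves every label down one row, isin() flags the rows whose
# previous label is selected, and cumsum() turns those flags into group ids
group = (df.index.to_series().shift()
         .isin(selected_indexes)
         .cumsum()
        )
# get the max per group, then label each group by its closing index
out = df.groupby(group).max().set_axis(selected_indexes)
# or, for several aggregations at once (col1 and col2 stand for your own columns):
out = (df.groupby(group).agg({'col1': 'max', 'col2': 'min'})
         .set_axis(selected_indexes)
      )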
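A possible alternative, sketched here as an assumption rather than a drop-in replacement, labels each row by binary search with numpy.searchsorted. It assumes the selected labels are unique, present in df.index, and in row order. One difference from the shift/isin grouping above: if rows follow the last selected index, the grouping above creates an extra group and set_axis then fails on the length mismatch, whereas this version simply drops those trailing rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [5, 0, 3, 3, 7, 9, 3, 5, 2, 4]})
selected_indexes = [0, 5, 6, 9]

# Row positions of the selected labels (works for any unique index, not just 0..n-1).
# Assumption: every label exists (get_indexer returns -1 otherwise) and positions are sorted.
pos = df.index.get_indexer(selected_indexes)

# For row position p, searchsorted(side="left") returns the first i with pos[i] >= p,
# i.e. the interval (pos[i-1], pos[i]] that contains p.
labels = np.searchsorted(pos, np.arange(len(df)), side="left")

# Rows after the last selected index get label len(pos); drop them
keep = labels < len(pos)
out = df[keep].groupby(labels[keep]).max()
out.index = selected_indexes
print(out)
```

Because the labels are plain integers, the same grouping can be reused with .agg(...) for several columns.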
Output:
col
0 5
5 9
6 3
9 5
Previous interpretation of the question:
You likely need rolling(...).max(), not resample:
out = df.loc[selected_indexes].rolling(3, center=True).max()
Or, if you want the ±1-row window (a centered window of size 3) to apply to the full data before selection:
out = df.rolling(3, center=True).max().loc[selected_indexes]
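To show what the "±1" window means, the sketch below spells out rolling(3, center=True).max() by hand as the max of the previous, current and next row. It uses the same random data as the example below; the by_hand name is just for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({"col": np.random.randint(0, 10, 10)})
s = df["col"]

# rolling(3, center=True) looks at each row and its immediate neighbours (±1 row);
# the default min_periods=3 yields NaN at both ends where a neighbour is missing.
rolled = s.rolling(3, center=True).max()

# The same window written out by hand: previous row, current row, next row.
# skipna=False reproduces the NaN at the edges.
by_hand = pd.concat([s.shift(1), s, s.shift(-1)], axis=1).max(axis=1, skipna=False)

pd.testing.assert_series_equal(rolled, by_hand, check_names=False)
```

This is also why the two variants differ: selecting first makes the "neighbours" the adjacent selected rows, while rolling first uses the true adjacent rows of df.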
Example:
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'col': np.random.randint(0, 10, 10)})
selected_indexes = [1, 2, 3, 5, 6, 8, 9]
print(df)
col
0 5
1 0
2 3
3 3
4 7
5 9
6 3
7 5
8 2
9 4
out = df.loc[selected_indexes].rolling(3, center=True).max()
print(out)
col
1 NaN
2 3.0
3 9.0
5 9.0
6 9.0
8 4.0
9 NaN
out2 = df.rolling(3, center=True).max().loc[selected_indexes]
print(out2)
col
1 5.0
2 3.0
3 7.0
5 9.0
6 9.0
8 5.0
9 NaN