Pandas: Resample a dataframe given a list of indexes that are not evenly spaced

Given a dataframe df: pd.DataFrame and a subset selected_indexes of the labels in df.index, how can I resample df by applying the max operator over each interval between selected_indexes[i] and selected_indexes[i + 1]?

For example, given a dataframe:

and a selection of indexes selected_indexes = [0, 5, 6, 9], applying the maximum to the col column over each interval (keeping the end point and excluding the start point) should give:

For example, row 9 is max(5, 2, 4), computed from rows 7, 8 and 9, which lie in the interval (6, 9].

CodePudding user response：

new interpretation

selected_indexes = [0, 5, 6, 9]
group = (df.index.to_series().shift() # make groups
           .isin(selected_indexes)    # based on
           .cumsum()                  # previous indices
        )

# get max per group
out = df.groupby(group).max().set_axis(selected_indexes)

# or, to apply a different aggregation per column:
out = (df.groupby(group).agg({'col1': 'max', 'col2': 'min'})
         .set_axis(selected_indexes)
       )

Output:
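
As a minimal end-to-end sketch of the grouping approach above, the snippet below wraps it in a helper and runs it on a 10-row frame. The helper name max_per_interval and its validation check are illustrative assumptions, not part of the original answer. The col values are the ones printed in the example further down. The sketch assumes df.index is sorted and unique, and that selected_indexes is sorted, starts at the first label of df and ends at the last, so that the number of groups matches the number of selected labels.

```python
import pandas as pd


def max_per_interval(df: pd.DataFrame, selected_indexes: list) -> pd.DataFrame:
    """Max of every column over each interval (previous selected label, selected label].

    Hypothetical helper for illustration. Assumes a sorted, unique index and a
    sorted selection that begins at df's first label and ends at its last.
    """
    if selected_indexes[0] != df.index[0] or selected_indexes[-1] != df.index[-1]:
        raise ValueError("selection must start and end at df's first and last labels")

    # A row opens a new group when the label of the row before it is selected,
    # so every selected label is the last row of its group.
    previous_label = df.index.to_series().shift()
    group = previous_label.isin(selected_indexes).cumsum()

    # One row per group, relabelled with the end point of each interval.
    return df.groupby(group).max().set_axis(selected_indexes)


df = pd.DataFrame({"col": [5, 0, 3, 3, 7, 9, 3, 5, 2, 4]})
out = max_per_interval(df, [0, 5, 6, 9])
print(out)
#    col
# 0    5
# 5    9
# 6    3
# 9    5
```

The first selected label has no rows before it, so its group holds only that row. Row 9 is max(5, 2, 4), which agrees with the interval (6, 9] described in the question.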

previous interpretation of the question

You likely need rolling(...).max() rather than resample:

out = df.loc[selected_indexes].rolling(3, center=True).max()

Or, if you want the ±1 row window to apply to the full data before selection:

out = df.rolling(3, center=True).max().loc[selected_indexes]

Example:

import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the example is reproducible
df = pd.DataFrame({'col': np.random.randint(0, 10, 10)})
selected_indexes = [1, 2, 3, 5, 6, 8, 9]

print(df)

   col
0    5
1    0
2    3
3    3
4    7
5    9
6    3
7    5
8    2
9    4


out = df.loc[selected_indexes].rolling(3, center=True).max()
print(out)

   col
1  NaN
2  3.0
3  9.0
5  9.0
6  9.0
8  4.0
9  NaN

out2 = df.rolling(3, center=True).max().loc[selected_indexes]
print(out2)

   col
1  5.0
2  3.0
3  7.0
5  9.0
6  9.0
8  5.0
9  NaN