Flag rows with max value in a "moving subset" (rolling window?)

Time:11-10

If I have the dataframe:

    |    |   time_index |   values |
    |---:|-------------:|---------:|
    |  0 |            1 |       21 |
    |  1 |            2 |        5 |
    |  2 |            3 |       25 |
    |  3 |            4 |        0 |
    |  4 |            5 |        4 |
    |  5 |            6 |       13 |
    |  6 |            7 |       20 |
    |  7 |            8 |        2 |
    |  8 |            9 |       15 |
    |  9 |           10 |       21 |

I want to take all consecutive subsets of 3 rows, moving forward one row at a time, so the first iteration uses index [0,1,2], the second [1,2,3], and so on. Applying this to the column values, I would like to check whether the middle value is the max of the subset and, if so, flag it in another column.

Iterations:

  1. Values: [21,5,25], max(values) == 5? False => ignore.
  2. Values: [5,25,0], max(values) == 25? True => add flag in new column.
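
The iterations above can be sketched as a plain Python loop. This is only an illustration of the intended logic, not a pandas solution; the names `values` and `flags` are chosen here for the example, and leaving the first and last rows as `None` is an assumption, since those rows are never the middle of a full window.

```python
values = [21, 5, 25, 0, 4, 13, 20, 2, 15, 21]

# One flag per row; the first and last rows have no full window, so they stay None.
flags = [None] * len(values)

for start in range(len(values) - 2):
    window = values[start:start + 3]   # [21, 5, 25], then [5, 25, 0], ...
    middle = start + 1                 # row index of the window's middle value
    flags[middle] = window[1] == max(window)

print(flags)
```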

I have the feeling that this has to do with a rolling window but I am not sure how to go about it.

CodePudding user response:

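For rolling window calculations, use the rolling method with window=3 and center=True, so that each row sits in the middle of its own window. Then apply the check to each window using agg with a custom function. The first and last rows have no complete window, so they come out as <NA>.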

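import pandas as pd

df = pd.DataFrame({'time_index': range(1, 11),
                   'values': [21, 5, 25, 0, 4, 13, 20, 2, 15, 21]})

# vals.iat[1] is the middle value of each 3-row window
df['is_max'] = (
    df['values'].rolling(window=3, center=True)
                .agg(lambda vals: vals.iat[1] == vals.max())
                .astype('boolean')
)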

>>> df

   time_index  values  is_max
0           1      21    <NA>
1           2       5   False
2           3      25    True
3           4       0   False
4           5       4   False
5           6      13   False
6           7      20    True
7           8       2   False
8           9      15   False
9          10      21    <NA>
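
A possible alternative, sketched here under the assumption that plain False is acceptable for the two edge rows: compare each value with the built-in rolling max instead of calling a Python lambda per window. The name `window_max` is introduced only for this example. On this data it flags the same rows (index 2 and 6); the only difference is that the first and last rows come out as False rather than <NA>.

```python
import pandas as pd

df = pd.DataFrame({
    "time_index": range(1, 11),
    "values": [21, 5, 25, 0, 4, 13, 20, 2, 15, 21],
})

# Maximum of each centered 3-row window; NaN where the window is incomplete.
window_max = df["values"].rolling(window=3, center=True).max()

# A row is flagged when its own value equals its window's maximum.
# Comparing against NaN yields False, so the two edge rows are False here.
df["is_max"] = df["values"].eq(window_max)

print(df)
```

Because the middle row of a centered window is the row itself, no positional lookup such as iat[1] is needed in this variant.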