Home > Software engineering >  Python group by only neighbours
Python group by only neighbours

Time:11-03

I have a dataset consisting of measurements, and my dataframe looks like that:

ID  VAL BS  ERROR
0   0   0   0
1   1   0   1
2   1   0   1
3   0   0   0
4   11  10  1
5   10  10  0
6   12  10  2
7   11  10  1
8   9   10  -1
9   30  30  0
10  31  30  1
11  29  30  -1
12  10  10  0
13  9   10  -1
14  8   10  -2
15  11  10  1
16  0   0   0
17  1   0   1
18  2   0   2
19  9   10  -1
20  10  10  0

Where VAL is measured value, BS is base(round to nearest 10), and ERROR is a difference between measured value and base.

What I am trying to do is somewhat group by 'BASE' column, but only for neighborhood rows.

So, a resulting dataset will look like that (I will also want to calculate aggregate min and max error for a group, but I guess it will not be a problem)

It is important to keep the order of the groups for this case.

ID  BS  MIN MAX
0   0   0   1
1   10  -1  2
2   30  -1  1
3   10  -2  1
4   0   0   2
5   10  -1  0

CodePudding user response:

You can find the consecutive groups like this:

df['GROUP'] = (df['BS']!=df['BS'].shift()).cumsum()

Then you group by the GROUP column and aggregate min and max:

df.groupby(['GROUP', 'BS'])['ERROR'].agg(['min', 'max']).reset_index()

The output should be:

GROUP   BS  min max
0   1   0   0   1
1   2   10  -1  2
2   3   30  -1  1
3   4   10  -2  1
4   5   0   0   2
5   6   10  -1  0
  • Related