I have a dataset consisting of measurements, and my dataframe looks like that:
ID VAL BS ERROR
0 0 0 0
1 1 0 1
2 1 0 1
3 0 0 0
4 11 10 1
5 10 10 0
6 12 10 2
7 11 10 1
8 9 10 -1
9 30 30 0
10 31 30 1
11 29 30 -1
12 10 10 0
13 9 10 -1
14 8 10 -2
15 11 10 1
16 0 0 0
17 1 0 1
18 2 0 2
19 9 10 -1
20 10 10 0
Where VAL
is measured value, BS
is base(round to nearest 10), and ERROR
is a difference between measured value and base.
What I am trying to do is somewhat group by 'BASE' column, but only for neighborhood rows.
So, a resulting dataset will look like that (I will also want to calculate aggregate min and max error for a group, but I guess it will not be a problem)
It is important to keep the order of the groups for this case.
ID BS MIN MAX
0 0 0 1
1 10 -1 2
2 30 -1 1
3 10 -2 1
4 0 0 2
5 10 -1 0
CodePudding user response:
You can find the consecutive groups like this:
df['GROUP'] = (df['BS']!=df['BS'].shift()).cumsum()
Then you group by the GROUP
column and aggregate min and max:
df.groupby(['GROUP', 'BS'])['ERROR'].agg(['min', 'max']).reset_index()
The output should be:
GROUP BS min max
0 1 0 0 1
1 2 10 -1 2
2 3 30 -1 1
3 4 10 -2 1
4 5 0 0 2
5 6 10 -1 0