I have a certain series in a dataframe.
import pandas as pd
df = pd.DataFrame()
df['yMax'] = [127, 300, 300, 322, 322, 322, 322, 344, 344, 344, 366, 366, 367, 367, 367, 388, 388, 388, 388, 389, 389, 402, 403, 403, 403]
For values that are very close to one another, say differing by 1, I would like to eliminate that difference so that they become the same number, either by adding or subtracting 1.
For example, the resulting list would become:
df['yMax'] = [127, 300, 300, 322, 322, 322, 322, 344, 344, 344, 367, 367, 367, 367, 367, 389, 389, 389, 389, 389, 389, 403, 403, 403, 403]
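Reading the expected output, the rule appears to be that a value v is replaced by v + 1 whenever v + 1 also occurs in the column (366 → 367, 388 → 389, 402 → 403), while the larger value stays as it is. A minimal sketch of that interpretation, assuming it is the intended rule and that longer chains such as 366, 367, 368 do not need to collapse into a single value (the column name `yMax_merged` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'yMax': [127, 300, 300, 322, 322, 322, 322, 344, 344, 344,
                            366, 366, 367, 367, 367, 388, 388, 388, 388, 389,
                            389, 402, 403, 403, 403]})

s = df['yMax']
# True where the value one greater than this row's value occurs somewhere in the column
has_upper_neighbor = (s + 1).isin(s)
# Keep the value where there is no such neighbor; otherwise bump it up by 1
df['yMax_merged'] = s.where(~has_upper_neighbor, s + 1)
print(df['yMax_merged'].tolist())
```

This does not depend on the column being sorted, because it only checks whether v + 1 exists anywhere in the column.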
I know I can easily find the difference between adjacent values with df['yMax'].diff():
0 NaN
1 173.0
2 0.0
3 22.0
4 0.0
5 0.0
8 0.0
6 22.0
7 0.0
9 0.0
10 22.0
11 0.0
12 1.0
13 0.0
14 0.0
15 21.0
16 0.0
17 0.0
20 0.0
18 1.0
19 0.0
21 13.0
22 1.0
23 0.0
24 0.0
Name: yMax, dtype: float64
But how should I perform the transformation?
Answer:
import numpy as np
import pandas as pd
df = pd.DataFrame({'yMax': [127, 300, 300, 322, 322, 322, 322, 344, 344, 344, 366, 366, 367, 367, 367, 388, 388, 388, 388, 389, 389, 402, 403, 403, 403]})
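Where two consecutive values differ by 1, replace the lower one with the next value (i.e. move it up). Then group by the original values and take the maximum of the adjusted column within each group, so that every row sharing an original value gets the same adjusted value. Code below: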
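# Where the next value is exactly 1 greater (diff(-1) == -1), take that next value; otherwise keep the original
df = df.assign(new_yMax=np.where(df['yMax'].diff(-1) == -1, df['yMax'].shift(-1), df['yMax']))
# Within each group of identical original values, use the largest adjusted value so the whole group moves together
df = df.assign(new_yMax=df.groupby('yMax')['new_yMax'].transform('max').astype(int))
print(df)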
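As an alternative (not part of the original answer), the diff() idea from the question can be turned into a grouping key: start a new cluster wherever consecutive values jump by more than 1, then give every row in a cluster the cluster's maximum. A sketch, assuming the column is sorted ascending as in the example:

```python
import pandas as pd

df = pd.DataFrame({'yMax': [127, 300, 300, 322, 322, 322, 322, 344, 344, 344,
                            366, 366, 367, 367, 367, 388, 388, 388, 388, 389,
                            389, 402, 403, 403, 403]})

# A new cluster starts wherever the jump from the previous value is larger than 1.
# diff() is NaN on the first row, and NaN > 1 evaluates to False, so the first cluster id is 0.
cluster_id = df['yMax'].diff().gt(1).cumsum()
# Every member of a cluster takes the cluster's largest value
df['new_yMax'] = df.groupby(cluster_id)['yMax'].transform('max')
print(df['new_yMax'].tolist())
```

Unlike the np.where approach, this also collapses longer chains such as 366, 367, 368 into 368, which may or may not be what you want. If the data are not sorted, sort them first (for example with sort_values) and restore the original order afterwards.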