I have a dataset that looks like this:
Id date value
x1 01-01-22 46
x1 02-01-22 46
x1 03-01-22 45.8
....
x2 03-04-22 57
x2 03-04-22 62
....
The number in value should always decrease (or stay the same) as time goes on, so the next observation for x2 would fail.
What's the most pythonic way to append a column of 1's and 0's flagging rows where the value increases by, say, more than 3% (since there could be some measurement error)? In R I would just use dplyr with group_by, and I was hoping for something as elegant in pandas.
Edit for clarity: the decrease must hold within each Id (i.e. per item).
CodePudding user response:
I think this should do it (edited: I had the minus flipped; edited again after the clarification):
def f(gdf):
    # percent change from the previous row within the group
    return (gdf.value - gdf.value.shift()) / gdf.value.shift() > .03

df['flag'] = df.groupby('Id').apply(f).values
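A runnable sketch of that approach, rebuilding the sample data from the question. Passing `group_keys=False` is an assumption on my part: it keeps the result indexed by the original rows, so the flags can be assigned directly instead of going through `.values` (which relies on the rows already being sorted by Id).

```python
import pandas as pd

# Sample data from the question (dates kept as strings for brevity)
df = pd.DataFrame({
    'Id': ['x1', 'x1', 'x1', 'x2', 'x2'],
    'date': ['01-01-22', '02-01-22', '03-01-22', '03-04-22', '03-04-22'],
    'value': [46.0, 46.0, 45.8, 57.0, 62.0],
})

def f(gdf):
    # percent change vs. the previous row within one Id;
    # the first row of each group compares against NaN, which is never > .03
    return (gdf.value - gdf.value.shift()) / gdf.value.shift() > .03

# group_keys=False keeps the original row index, so the result aligns row-by-row
df['flag'] = df.groupby('Id', group_keys=False).apply(f).astype(int)
```

Only the 57 -> 62 jump within x2 gets flagged; the 46 -> 45.8 drop in x1 and the first row of each group do not.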
CodePudding user response:
You can use pct_change:
df['big_change'] = df.groupby('Id')['value'].pct_change().gt(.03).astype(int)
Output:
Id date value big_change
0 x1 01-01-22 46.0 0
1 x1 02-01-22 46.0 0
2 x1 03-01-22 45.8 0
3 x2 03-04-22 57.0 0
4 x2 03-04-22 62.0 1
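One detail worth noting: pct_change yields NaN for the first row of each group, and NaN > .03 is False, so the first observation per Id is never flagged. A small self-contained sketch of this answer, also pulling out the offending rows (the variable name bad is my own):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Id': ['x1', 'x1', 'x1', 'x2', 'x2'],
    'date': ['01-01-22', '02-01-22', '03-01-22', '03-04-22', '03-04-22'],
    'value': [46.0, 46.0, 45.8, 57.0, 62.0],
})

# per-Id percent change; gt(.03) turns the comparison into True/False,
# and astype(int) gives the requested 1's and 0's
df['big_change'] = df.groupby('Id')['value'].pct_change().gt(.03).astype(int)

# rows that violate the "never increase by more than 3%" rule
bad = df[df['big_change'] == 1]
print(bad)
```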