pythonic way to make sure data is always decreasing within group in pandas

Time:09-20

I have a dataset that looks like this

Id date     value
x1 01-01-22  46
x1 02-01-22  46
x1 03-01-22  45.8
....
x2 03-04-22  57
x2 03-04-22  62
....

The number in value should always decrease (or stay the same) as time goes on, so the second x2 observation (62) would fail.

What's the most pythonic way to append a column of 1s and 0s that flags rows where the value increases by more than, say, 3% (to allow for some measurement error)? In R, I would just use dplyr with group_by, and I was hoping for something as elegant in pandas.

Edit for clarity: the decrease must hold within each Id (i.e. per item).

CodePudding user response:

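I think this should do it (edited: I had the minus sign flipped; edited again after the clarification):

def f(values):
    # Relative change from the previous row within one Id group
    prev = values.shift()
    return ((values - prev) / prev) > .03
# group_keys=False keeps the original row index, so the result aligns with df
df['flag'] = df.groupby('Id', group_keys=False)['value'].apply(f).astype(int)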

CodePudding user response:

You can use pct_change, which computes the relative change from the previous row within each group:

df['big_change'] = df.groupby('Id')['value'].pct_change().gt(.03).astype(int)

Output:

   Id      date  value  big_change
0  x1  01-01-22   46.0           0
1  x1  02-01-22   46.0           0
2  x1  03-01-22   45.8           0
3  x2  03-04-22   57.0           0
4  x2  03-04-22   62.0           1
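
For anyone who wants to run this end to end, here is a minimal self-contained sketch built from the sample rows in the question. It assumes the rows are already in chronological order within each Id, since pct_change compares each row with the one directly before it in the same group. The intermediate name `change` is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question (assumed already in time order per Id)
df = pd.DataFrame(
    {
        "Id": ["x1", "x1", "x1", "x2", "x2"],
        "date": ["01-01-22", "02-01-22", "03-01-22", "03-04-22", "03-04-22"],
        "value": [46, 46, 45.8, 57, 62],
    }
)

# Relative change versus the previous row of the same Id; the first row of
# each group has no predecessor, so its change is NaN and compares as False.
change = df.groupby("Id")["value"].pct_change()

# 1 where the value rose by more than 3%, otherwise 0
df["big_change"] = change.gt(0.03).astype(int)

print(df)
```

Note that this only compares adjacent rows, so a series of small increases that each stay under 3% would not be flagged.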