I am trying to implement a simple function to calculate the rate of change of a given y
value using pandas.
my input DataFrame is the following:
data = [
["id1", "2018-05-14", 9998, 10],
["id1", "2019-05-14", 18000, 5],
["id1", "2020-05-14", 22000, 3],
["id2", "2018-07-3", 458756, 24],
["id2", "2019-02-1", 822565, 3],
["id2", "2020-05-14", 922565, 1],
]
df = pd.DataFrame(data, columns=["id", "date", "x", "y"])
The expected output is as follow:
data = [
["id1", "2018-05-14", 9998, 10, np.nan],
["id1", "2019-05-14", 18000, 5, 1600.4], # (18000-9998) / (10-5)
["id1", "2020-05-14", 22000, 3, 2000], # (22000-18000) (5-3)
["id2", "2018-07-3", 458756, 24, np.nan],
["id2", "2019-02-1", 822565, 3, 17324.24], # (822565-458756) / (24-3)
["id2", "2020-05-14", 922565, 1, 50000], # (922565-822565) / (3-1)
]
df_expected = pd.DataFrame(data, columns=["id", "date", "x", "y", "rate"])
I think I need to use the rolling function but I am not sure how exactly it must be used
def compute_rate(vals):
return vals["x"] / vals["y"]
df.sort_values(by=["id", "date"], inplace=True)
grp = df.groupby(by=["id"])
df["rate"] = grp.rolling(1).apply(
lambda x: x["x"] / x["y"]
)
CodePudding user response:
Most, efficient, compute the diff of both rows with groupby
diff
, then compute the ratio with help of eval
(or assign
if you want even more efficiency):
df['rate'] = (
df.sort_values(by=['id', 'date'])
.groupby('id')[['x', 'y']]
.diff().eval('-x/y')
)
output:
id date x y rate
0 id1 2018-05-14 9998 10 NaN
1 id1 2019-05-14 18000 5 1600.400000
2 id1 2020-05-14 22000 3 2000.000000
3 id2 2018-07-3 458756 24 NaN
4 id2 2019-02-1 822565 3 17324.238095
5 id2 2020-05-14 922565 1 50000.000000