Cumulative rate of change for successive rows per group, using 2 columns (x/y)-CodePudding

I am trying to implement a simple function to calculate the rate of change of a given y value using pandas.

my input DataFrame is the following:

data = [
    ["id1", "2018-05-14", 9998, 10],
    ["id1", "2019-05-14", 18000, 5],
    ["id1", "2020-05-14", 22000, 3],
    ["id2", "2018-07-3", 458756, 24],
    ["id2", "2019-02-1", 822565, 3],
    ["id2", "2020-05-14", 922565, 1],
]

df = pd.DataFrame(data, columns=["id", "date", "x", "y"])

The expected output is as follow:

data = [
    ["id1", "2018-05-14", 9998, 10, np.nan],
    ["id1", "2019-05-14", 18000, 5, 1600.4],  # (18000-9998) / (10-5)
    ["id1", "2020-05-14", 22000, 3, 2000],  # (22000-18000) (5-3)
    ["id2", "2018-07-3", 458756, 24, np.nan],
    ["id2", "2019-02-1", 822565, 3, 17324.24],  # (822565-458756) / (24-3)
    ["id2", "2020-05-14", 922565, 1, 50000],  # (922565-822565) / (3-1)
]

df_expected = pd.DataFrame(data, columns=["id", "date", "x", "y", "rate"])

I think I need to use the rolling function but I am not sure how exactly it must be used

def compute_rate(vals):
    return vals["x"] / vals["y"]

df.sort_values(by=["id", "date"], inplace=True)
grp = df.groupby(by=["id"])

df["rate"] = grp.rolling(1).apply(
     lambda x: x["x"] / x["y"]
)

CodePudding user response：

Most, efficient, compute the diff of both rows with groupby diff, then compute the ratio with help of eval (or assign if you want even more efficiency):

df['rate'] = (
 df.sort_values(by=['id', 'date'])
   .groupby('id')[['x', 'y']]
   .diff().eval('-x/y')
)

output:

    id        date       x   y          rate
0  id1  2018-05-14    9998  10           NaN
1  id1  2019-05-14   18000   5   1600.400000
2  id1  2020-05-14   22000   3   2000.000000
3  id2   2018-07-3  458756  24           NaN
4  id2   2019-02-1  822565   3  17324.238095
5  id2  2020-05-14  922565   1  50000.000000