Home > database >  How to calculate difference between amount of rows for most recent date and second most recent date
How to calculate difference between amount of rows for most recent date and second most recent date

Time:12-06

I have the following df:

Index     Address     Date     
0  0x06b  2021-12-02  16:03:09.332
1  0x04t  2021-12-03  16:03:09.332
2  0x12c  2021-12-03  16:03:09.332
3  0x3d5  2021-12-04  16:03:09.332
4  0x077  2021-12-04  16:03:09.332
5  0x998  2021-12-04  16:03:09.332

I want to calculate the difference in amount of rows (len() of the column) between the most recent date (t), which in this case is 2021-12-04 16:03:09.332)and the previous date (t-1) but also for any previous date (t-2, t-3, ..., t-n).

In this case, the answer for t - (t-1) should be 1, because the most recent date has 3 rows and the secod most recent date has 2 rows. 3-2 = 1.

I have tried implementing the solution in this StackOverflow post, but it does not seem to work.

CodePudding user response:

I take you want to calculate the delta of the number of records per day vs the latest available date - would the following achieve what you need:

import pandas as pd
# Set up the test dataframe
df = pd.DataFrame({"Address":["2021-12-02", "2021-12-03","2021-12-03","2021-12-04", "2021-12-04", "2021-12-04"]})
df["Address"] = pd.to_datetime(df["Address"])

df2 = df.groupby("Address")[["Address"]].count().rename(columns={"Address": "count"})
# This will make sure it will calculate vs last available date
df2.at[max(df2.index),"count"] - df2

OUTPUT

            count
Address
2021-12-02      2
2021-12-03      1
2021-12-04      0
  • Related