I have the following df:
Index Address Date
0 0x06b 2021-12-02 16:03:09.332
1 0x04t 2021-12-03 16:03:09.332
2 0x12c 2021-12-03 16:03:09.332
3 0x3d5 2021-12-04 16:03:09.332
4 0x077 2021-12-04 16:03:09.332
5 0x998 2021-12-04 16:03:09.332
I want to calculate the difference in amount of rows (len()
of the column) between the most recent date (t
), which in this case is 2021-12-04 16:03:09.332
)and the previous date (t-1
) but also for any previous date (t-2, t-3, ..., t-n
).
In this case, the answer for t - (t-1) should be 1, because the most recent date has 3 rows and the secod most recent date has 2 rows. 3-2 = 1.
I have tried implementing the solution in this StackOverflow post, but it does not seem to work.
CodePudding user response:
I take you want to calculate the delta of the number of records per day vs the latest available date - would the following achieve what you need:
import pandas as pd
# Set up the test dataframe
df = pd.DataFrame({"Address":["2021-12-02", "2021-12-03","2021-12-03","2021-12-04", "2021-12-04", "2021-12-04"]})
df["Address"] = pd.to_datetime(df["Address"])
df2 = df.groupby("Address")[["Address"]].count().rename(columns={"Address": "count"})
# This will make sure it will calculate vs last available date
df2.at[max(df2.index),"count"] - df2
OUTPUT
count
Address
2021-12-02 2
2021-12-03 1
2021-12-04 0