I have the following example Pandas DataFrame
UserID Total Date
1 20 2019-01-01
1 18 2019-01-04
1 22 2019-01-05
1 16 2019-01-07
1 17 2019-01-09
1 26 2019-01-11
1 30 2019-01-12
1 28 2019-01-13
1 28 2019-01-15
1 28 2019-01-16
2 22 2019-01-06
2 11 2019-01-07
2 23 2019-01-09
2 14 2019-01-13
2 19 2019-01-14
2 29 2019-01-15
2 21 2019-01-16
2 22 2019-01-18
2 30 2019-01-22
2 16 2019-01-23
3 27 2019-01-01
3 13 2019-01-04
3 12 2019-01-05
3 27 2019-01-06
3 26 2019-01-09
3 26 2019-01-10
3 30 2019-01-11
3 19 2019-01-12
3 27 2019-01-13
3 29 2019-01-14
4 29 2019-01-07
4 12 2019-01-09
4 25 2019-01-10
4 11 2019-01-11
4 19 2019-01-13
4 20 2019-01-14
4 33 2019-01-15
4 24 2019-01-18
4 22 2019-01-19
4 24 2019-01-21
My goal is to add a column named TotalPrev10Days which is basically the sum of Total for previous 10 days (for each UserID)
I did a basic implementation using nested loops and comparing the current date with a timedelta.
Here's my code:
users = set(df.UserID) # get set of all unique user IDs
TotalPrev10Days = []
delta = timedelta(days=10) # 10 day time delta to subtract from each row date
for user in users: # looping over all user IDs
user_df = df[df["UserID"] == user] #creating dataframe that includes only current userID data
for row_index in user_df.index: #looping over each row from UserID dataframe
row_date = user_df["Date"][row_index]
row_date_minus_10 = row_date - delta #subtracting 10 days
sum_prev_10_days = user_df[(user_df["Date"] < row_date) & (user_df["Date"] >= row_date_minus_10)]["Total"].sum()
TotalPrev10Days.append(sum_prev_10_days) #appending total to a list
df["TotalPrev10Days"] = TotalPrev10Days #Assigning list to new DataFrame column
While it works perfectly, it's very slow for large datasets.
Is there a faster, more Pandas-native approach to this problem?
CodePudding user response:
IIUC, try:
df["TotalPrev10Days"] = df.groupby("UserID") \
.rolling("9D", on="Date") \
.sum() \
.shift() \
.fillna(0)["Total"] \
.droplevel(0)
>>> df
UserID Total Date TotalPrev10Days
0 1 20 2019-01-01 0.0
1 1 18 2019-01-04 20.0
2 1 22 2019-01-05 38.0
3 1 16 2019-01-07 60.0
4 1 17 2019-01-09 76.0
5 1 26 2019-01-11 93.0
6 1 30 2019-01-12 99.0
7 1 28 2019-01-13 129.0
8 1 28 2019-01-15 139.0
9 1 28 2019-01-16 145.0
10 2 22 2019-01-06 0.0
11 2 11 2019-01-07 22.0
12 2 23 2019-01-09 33.0
13 2 14 2019-01-13 56.0
14 2 19 2019-01-14 70.0
15 2 29 2019-01-15 89.0
16 2 21 2019-01-16 96.0
17 2 22 2019-01-18 106.0
18 2 30 2019-01-22 105.0
19 2 16 2019-01-23 121.0
20 3 27 2019-01-01 0.0
21 3 13 2019-01-04 27.0
22 3 12 2019-01-05 40.0
23 3 27 2019-01-06 52.0
24 3 26 2019-01-09 79.0
25 3 26 2019-01-10 105.0
26 3 30 2019-01-11 104.0
27 3 19 2019-01-12 134.0
28 3 27 2019-01-13 153.0
29 3 29 2019-01-14 167.0
30 4 29 2019-01-07 0.0
31 4 12 2019-01-09 29.0
32 4 25 2019-01-10 41.0
33 4 11 2019-01-11 66.0
34 4 19 2019-01-13 77.0
35 4 20 2019-01-14 96.0
36 4 33 2019-01-15 116.0
37 4 24 2019-01-18 149.0
38 4 22 2019-01-19 132.0
39 4 24 2019-01-21 129.0