How to conditionally aggregate values of previous rows of Pandas DataFrame?

Time:09-22

I have the following example Pandas DataFrame:

UserID Total Date
1      20    2019-01-01
1      18    2019-01-04
1      22    2019-01-05
1      16    2019-01-07
1      17    2019-01-09
1      26    2019-01-11
1      30    2019-01-12
1      28    2019-01-13
1      28    2019-01-15
1      28    2019-01-16
2      22    2019-01-06
2      11    2019-01-07
2      23    2019-01-09
2      14    2019-01-13
2      19    2019-01-14
2      29    2019-01-15
2      21    2019-01-16
2      22    2019-01-18
2      30    2019-01-22
2      16    2019-01-23
3      27    2019-01-01
3      13    2019-01-04
3      12    2019-01-05
3      27    2019-01-06
3      26    2019-01-09
3      26    2019-01-10
3      30    2019-01-11
3      19    2019-01-12
3      27    2019-01-13
3      29    2019-01-14
4      29    2019-01-07
4      12    2019-01-09
4      25    2019-01-10
4      11    2019-01-11
4      19    2019-01-13
4      20    2019-01-14
4      33    2019-01-15
4      24    2019-01-18
4      22    2019-01-19
4      24    2019-01-21

My goal is to add a column named TotalPrev10Days: for each row, the sum of Total over the previous 10 days for that UserID (dates in [Date − 10 days, Date), so the current row's date is excluded and exactly 10 days back is included).

I wrote a basic implementation with nested loops that compares each row's date against the other rows' dates using a 10-day timedelta.

Here's my code:

from datetime import timedelta

users = set(df["UserID"])  # unique user IDs

TotalPrev10Days = []
delta = timedelta(days=10)  # look-back window subtracted from each row's date

for user in users:  # loop over user IDs
    # note: appending in set-iteration order only lines up with df
    # because the rows are already grouped by UserID
    user_df = df[df["UserID"] == user]  # rows for the current user only
    for row_index in user_df.index:  # loop over that user's rows
        row_date = user_df["Date"][row_index]
        row_date_minus_10 = row_date - delta  # start of the 10-day window
        in_window = (user_df["Date"] < row_date) & (user_df["Date"] >= row_date_minus_10)
        TotalPrev10Days.append(user_df.loc[in_window, "Total"].sum())  # window sum

df["TotalPrev10Days"] = TotalPrev10Days  # assign list as the new column

While it works perfectly, it's very slow for large datasets.

Is there a faster, more Pandas-native approach to this problem?

CodePudding user response:

IIUC, try:

df["TotalPrev10Days"] = (
    df.groupby("UserID")
      .rolling("9D", on="Date")
      .sum()["Total"]
      .groupby(level=0)  # shift within each UserID so sums don't leak across users
      .shift()
      .fillna(0)
      .droplevel(0)
)
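One prerequisite worth calling out: the time-based rolling window raises an error unless Date is an actual datetime column and the dates increase monotonically within each user. A minimal preparation sketch (the three-row frame here is just illustrative data, not the question's full table):

```python
import pandas as pd

df = pd.DataFrame({
    "UserID": [1, 1, 2],
    "Total": [20, 18, 22],
    "Date": ["2019-01-04", "2019-01-01", "2019-01-06"],
})

df["Date"] = pd.to_datetime(df["Date"])  # rolling("9D", on="Date") needs datetime64
df = df.sort_values(["UserID", "Date"], ignore_index=True)  # monotonic dates per user
```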

>>> df
    UserID  Total       Date  TotalPrev10Days
0        1     20 2019-01-01              0.0
1        1     18 2019-01-04             20.0
2        1     22 2019-01-05             38.0
3        1     16 2019-01-07             60.0
4        1     17 2019-01-09             76.0
5        1     26 2019-01-11             93.0
6        1     30 2019-01-12             99.0
7        1     28 2019-01-13            129.0
8        1     28 2019-01-15            139.0
9        1     28 2019-01-16            145.0
10       2     22 2019-01-06              0.0
11       2     11 2019-01-07             22.0
12       2     23 2019-01-09             33.0
13       2     14 2019-01-13             56.0
14       2     19 2019-01-14             70.0
15       2     29 2019-01-15             89.0
16       2     21 2019-01-16             96.0
17       2     22 2019-01-18            106.0
18       2     30 2019-01-22            105.0
19       2     16 2019-01-23            121.0
20       3     27 2019-01-01              0.0
21       3     13 2019-01-04             27.0
22       3     12 2019-01-05             40.0
23       3     27 2019-01-06             52.0
24       3     26 2019-01-09             79.0
25       3     26 2019-01-10            105.0
26       3     30 2019-01-11            104.0
27       3     19 2019-01-12            134.0
28       3     27 2019-01-13            153.0
29       3     29 2019-01-14            167.0
30       4     29 2019-01-07              0.0
31       4     12 2019-01-09             29.0
32       4     25 2019-01-10             41.0
33       4     11 2019-01-11             66.0
34       4     19 2019-01-13             77.0
35       4     20 2019-01-14             96.0
36       4     33 2019-01-15            116.0
37       4     24 2019-01-18            149.0
38       4     22 2019-01-19            132.0
39       4     24 2019-01-21            129.0
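Note that the shifted 9-day window is not exactly the loop's [date − 10 days, date) window: it sums everything within 9 days of the *previous* row's date, so some rows can differ (for user 2 on 2019-01-23, the loop yields 135 while the output above shows 121). If an exact match to the loop is required, a rolling window with `closed="left"` expresses [date − 10 days, date) directly. A sketch on user 2's data, assuming a pandas version where `closed` is supported for offset windows:

```python
import pandas as pd

# user 2's rows from the question's table
df = pd.DataFrame({
    "UserID": [2] * 10,
    "Total": [22, 11, 23, 14, 19, 29, 21, 22, 30, 16],
    "Date": pd.to_datetime([
        "2019-01-06", "2019-01-07", "2019-01-09", "2019-01-13",
        "2019-01-14", "2019-01-15", "2019-01-16", "2019-01-18",
        "2019-01-22", "2019-01-23",
    ]),
})

# closed="left" keeps dates in [row date - 10 days, row date),
# i.e. the current row is excluded and exactly-10-days-back is included
out = (
    df.groupby("UserID")
      .rolling("10D", on="Date", closed="left")
      .sum()
)
df["TotalPrev10Days"] = out["Total"].fillna(0).droplevel(0)
```

This reproduces the loop's numbers row for row, at rolling-window speed.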