Home > Software engineering >  Rolling average on previous dates per group
Rolling average on previous dates per group

Time:10-21

I have the following dataset:

Name    Loc   Site  Date        Total
Alex    Italy   A   12.31.2020  30
Alex    Italy   B   12.31.2020  40
Alex    Italy   B   12.30.2020  100
Alex    Italy   A   12.30.2020  80
Alex    France  A   12.28.2020  10
Alex    France  B   12.28.2020  20
Alex    France  B   12.27.2020  10

I want to add per each row the average of total in the day before the date per Name, Loc and Date

This is the outcome I'm looking for:

Name    Loc     Site    Date       Total    Prv_Avg
Alex    Italy   A       12.31.2020  30        90
Alex    Italy   B       12.31.2020  40        90
Alex    Italy   B       12.30.2020  100      NULL
Alex    Italy   A       12.30.2020  80       NULL
Alex    France  A       12.28.2020  10        10
Alex    France  B       12.28.2020  20        10
Alex    France  B       12.27.2020  10       NULL

The Nulls are for rows where previous date couldn't be found.

I've tried rolling but got mixed up with the index.

CodePudding user response:

First aggregate mean per 3 columns, add one day to MultiIndex for match previous day and last use DataFrame.join:

df['Date'] = pd.to_datetime(df['Date'])


s = df.groupby(['Name','Loc','Date'])['Total'].mean().rename('Prv_Avg')
print (s)
Name  Loc     Date      
Alex  France  2020-12-27    10
              2020-12-28    15
      Italy   2020-12-30    90
              2020-12-31    35
Name: Prv_Avg, dtype: int64

s = s.rename(lambda x: x   pd.Timedelta('1 day'), level=2)
print (s)
Name  Loc     Date      
Alex  France  2020-12-28    10
              2020-12-29    15
      Italy   2020-12-31    90
              2021-01-01    35
Name: Prv_Avg, dtype: int64

df  = df.join(s, on=['Name','Loc','Date'])
print (df)
   Name     Loc Site       Date  Total  Prv_Avg
0  Alex   Italy    A 2020-12-31     30     90.0
1  Alex   Italy    B 2020-12-31     40     90.0
2  Alex   Italy    B 2020-12-30    100      NaN
3  Alex   Italy    A 2020-12-30     80      NaN
4  Alex  France    A 2020-12-28     10     10.0
5  Alex  France    B 2020-12-28     20     10.0
6  Alex  France    B 2020-12-27     10      NaN

CodePudding user response:

Another possible solution:

grp = ['Name', 'Loc', 'Date']
s = df.groupby(grp)['Total'].mean().shift().rename('Prv_Avg')

idx1 = s.index.get_level_values('Name').to_list()
idx2 = s.index.get_level_values('Loc').to_list()


df.merge(s.where((idx1 == np.roll(idx1, 1)) &
                 (idx2 == np.roll(idx2, 1))).reset_index(),
         on= grp)

Output:

   Name     Loc Site        Date  Total  Prv_Avg
0  Alex   Italy    A  12.31.2020     30     90.0
1  Alex   Italy    B  12.31.2020     40     90.0
2  Alex   Italy    B  12.30.2020    100      NaN
3  Alex   Italy    A  12.30.2020     80      NaN
4  Alex  France    A  12.28.2020     10     10.0
5  Alex  France    B  12.28.2020     20     10.0
6  Alex  France    B  12.27.2020     10      NaN
  • Related