Home > OS >  Using groupby's aggregation to populate a new column
Using groupby's aggregation to populate a new column

Time:11-22

Given this dataframe df:

date          type     target
2021-01-01    0        5
2021-01-01    0        6
2021-01-01    1        4
2021-01-01    1        2
2021-01-02    0        5
2021-01-02    1        3
2021-01-02    1        7
2021-01-02    0        1
2021-01-03    0        2
2021-01-03    1        5

I want to create a new column that contains yesterday's target mean by type.

For example, for the 5th row (date=2021-01-02, type=0) the new column's value would be 5.5, as the mean of the target for the previous day, 2021-01-01 for type=0 is (5 6)/2.

I can easily obtain the mean of target grouping by date and type as:

means = df.groupby(['date', 'type'])['target'].mean()

But I don't know how to create a new column on the original dataframe with the desired data, which should look as follows:

date          type     target      mean
2021-01-01    0        5           NaN (or null or whatever)
2021-01-01    0        6           NaN
2021-01-01    1        4           NaN
2021-01-01    1        2           NaN
2021-01-02    0        5           5.5
2021-01-02    1        3           3
2021-01-02    1        7           3
2021-01-02    0        2           5.5
2021-01-03    0        2           3.5
2021-01-03    1        5           5

CodePudding user response:

Ensure your date column is datetime, and add another temporary column to df of the date the day before:

 df['date'] = pd.to_datetime(df['date'])
 df['yesterday'] = df['date'] - pd.Timedelta('1 day')

Then use your means groupby, with as_index=False, and left merge that onto the original df on yesterday/date and type columns, and select the desired columns:

means = df.groupby(['date', 'type'], as_index=False)['target'].mean()
df.merge(means, left_on=['yesterday', 'type'], right_on=['date', 'type'], 
how='left', suffixes=[None, ' mean'])[['date', 'type', 'target', 'target mean']]

Output:

        date  type  target  target mean
0 2021-01-01     0       5          NaN
1 2021-01-01     0       6          NaN
2 2021-01-01     1       4          NaN
3 2021-01-01     1       2          NaN
4 2021-01-02     0       5          5.5
5 2021-01-02     1       3          3.0
6 2021-01-02     1       7          3.0
7 2021-01-02     0       1          5.5
8 2021-01-03     0       2          3.0
9 2021-01-03     1       5          5.0

CodePudding user response:

Idea is add one day to first level of MultiIndex Series by Timedelta, so possible add new column by DataFrame.join:

df['date'] = pd.to_datetime(df['date'])

s1 = df.groupby(['date', 'type'])['target'].mean()
s2 = s1.rename(index=lambda x: x   pd.Timedelta(days=1), level=0)

df = df.join(s2.rename('mean'), on=['date','type'])
print (df)
        date  type  target  mean
0 2021-01-01     0       5   NaN
1 2021-01-01     0       6   NaN
2 2021-01-01     1       4   NaN
3 2021-01-01     1       2   NaN
4 2021-01-02     0       5   5.5
5 2021-01-02     1       3   3.0
6 2021-01-02     1       7   3.0
7 2021-01-02     0       1   5.5
8 2021-01-03     0       2   3.0
9 2021-01-03     1       5   5.0

Another solution:

df['date'] = pd.to_datetime(df['date'])

s1 = df.groupby([df['date']   pd.Timedelta(days=1), 'type'])['target'].mean()
df = df.join(s1.rename('mean'), on=['date','type'])
print (df)
        date  type  target  mean
0 2021-01-01     0       5   NaN
1 2021-01-01     0       6   NaN
2 2021-01-01     1       4   NaN
3 2021-01-01     1       2   NaN
4 2021-01-02     0       5   5.5
5 2021-01-02     1       3   3.0
6 2021-01-02     1       7   3.0
7 2021-01-02     0       1   5.5
8 2021-01-03     0       2   3.0
9 2021-01-03     1       5   5.0

CodePudding user response:

A small edition on @Emi OB' s answer

means = df.groupby(["date", "type"], as_index=False)["target"].mean()
means["mean"] = means.pop("target").shift(2)
df = df.merge(means, how="left", on=["date", "type"])

    date    type    target  mean
0   2021-01-01  0   5   NaN
1   2021-01-01  0   6   NaN
2   2021-01-01  1   4   NaN
3   2021-01-01  1   2   NaN
4   2021-01-02  0   5   5.5
5   2021-01-02  1   3   3.0
6   2021-01-02  1   7   3.0
7   2021-01-02  0   2   5.5
8   2021-01-03  0   2   3.5
9   2021-01-03  1   5   5.0
  • Related