Home > Enterprise >  calculate 12m rolling while categorical name same
calculate 12m rolling while categorical name same

Time:06-18

I have following df. It sorted by brand name and date: DF has several brand names.

Brand       Date    sales
0    Brand1 2020-01-01  104.341655
1    Brand1 2020-02-01   99.607967
2    Brand1 2020-03-01  117.377918
3    Brand1 2020-04-01   93.774992

I want to calculate new column 12m rolling sales for this I use:

I use test['12m_rolling'] = test['sales'].rolling(12, 12).sum()

I`m happy with the way it works but I want calculation to be made while brand name is same: For example when brand2 appears for the first 11 times 12m_rolling will be empty

example of output

Brand       Date    sales      12m_rolling
0   Brand1 2020-01-01  104.341655          NaN
1   Brand1 2020-02-01   99.607967          NaN
2   Brand1 2020-03-01  117.377918          NaN
3   Brand1 2020-04-01   93.774992          NaN
4   Brand1 2020-05-01  110.327912          NaN
5   Brand1 2020-06-01   99.134345          NaN
6   Brand1 2020-07-01  151.705121          NaN
7   Brand1 2020-08-01  110.874937          NaN
8   Brand1 2020-09-01   95.949589          NaN
9   Brand1 2020-10-01  130.499255          NaN
10  Brand1 2020-11-01  139.167595          NaN
11  Brand1 2020-12-01  116.256460  1369.017745
12  Brand1 2021-01-01  154.617151  1419.293241
13  Brand1 2021-02-01  116.215803  1435.901077
14  Brand1 2021-03-01   99.905458  1418.428617
15  Brand1 2021-04-01  137.248631  1461.902256
28   Brand2 2020-01-01   59.615385   NaN
29   Brand2 2020-02-01   60.538462   NaN
30   Brand2 2020-03-01   71.153846   NaN
31   Brand2 2020-04-01   49.230769   NaN
32   Brand2 2020-05-01   68.307692   NaN
33   Brand2 2020-06-01   59.384615   NaN
34   Brand2 2020-07-01   55.230769   NaN
35   Brand2 2020-08-01   64.076923    NaN
36   Brand2 2020-09-01   54.692308    NaN
37   Brand2 2020-10-01   55.538462   NaN
38   Brand2 2020-11-01   57.000000    NaN
39   Brand2 2020-12-01   55.769231   710.538462
40   Brand2 2021-01-01   54.076923   705.000000
41   Brand2 2021-02-01   54.923077   699.384615
42   Brand2 2021-03-01   54.769231   683.000000
43   Brand2 2021-04-01   54.615385   688.384615
44   Brand2 2021-05-01   53.538462   673.615385

CodePudding user response:

Nothing complicated, just group by brand:

df['rolling_12'] = df.groupby('Brand',as_index=False)['sales'].rolling(12).sum()['sales']
    Brand        Date       sales   rolling_12
0   Brand1  2020-01-01  104.341655          NaN
1   Brand1  2020-02-01   99.607967          NaN
2   Brand1  2020-03-01  117.377918          NaN
3   Brand1  2020-04-01   93.774992          NaN
4   Brand1  2020-05-01  110.327912          NaN
5   Brand1  2020-06-01   99.134345          NaN
6   Brand1  2020-07-01  151.705121          NaN
7   Brand1  2020-08-01  110.874937          NaN
8   Brand1  2020-09-01   95.949589          NaN
9   Brand1  2020-10-01  130.499255          NaN
10  Brand1  2020-11-01  139.167595          NaN
11  Brand1  2020-12-01  116.256460  1369.017746
12  Brand1  2021-01-01  154.617151  1419.293242
13  Brand1  2021-02-01  116.215803  1435.901078
14  Brand1  2021-03-01   99.905458  1418.428618
15  Brand1  2021-04-01  137.248631  1461.902257
28  Brand2  2020-01-01   59.615385          NaN
29  Brand2  2020-02-01   60.538462          NaN
30  Brand2  2020-03-01   71.153846          NaN
31  Brand2  2020-04-01   49.230769          NaN
32  Brand2  2020-05-01   68.307692          NaN
33  Brand2  2020-06-01   59.384615          NaN
34  Brand2  2020-07-01   55.230769          NaN
35  Brand2  2020-08-01   64.076923          NaN
36  Brand2  2020-09-01   54.692308          NaN
37  Brand2  2020-10-01   55.538462          NaN
38  Brand2  2020-11-01   57.000000          NaN
39  Brand2  2020-12-01   55.769231   710.538462
40  Brand2  2021-01-01   54.076923   705.000000
41  Brand2  2021-02-01   54.923077   699.384615
42  Brand2  2021-03-01   54.769231   683.000000
43  Brand2  2021-04-01   54.615385   688.384616
44  Brand2  2021-05-01   53.538462   673.615386
  • Related