Find sum of values of a column spread over different months in Python-CodePudding

I have a table that looks like

A	B	C
2017	9	65
2017	10	72
2017	11	88
2017	12	97
2018	1	85
2018	2	67
2018	3	76
2018	4	51
2018	5	69
2018	6	97
2018	7	101
2018	8	22
2019	1	56
2019	2	34
2019	3	71
2019	4	122
2019	5	167
2019	6	34
2019	7	17
2019	8	99
2019	9	20
2019	10	26
2019	11	39
2019	12	30
2020	1	56
2020	2	34
2020	3	71
2020	4	122
2020	5	167
2020	6	34
2020	7	17
2020	8	99
2020	9	20
2020	10	26
2020	11	39
2020	12	30
2021	1	56
2021	2	34
2021	3	71
2021	4	122
2021	5	167
2021	6	34
2021	7	17
2021	8	99
2021	9	20
2021	10	26
2021	11	39
2021	12	30

Now what I want is :

A	B	C	D
2017	9	65	890
2017	10	72	890
2017	11	88	890
2017	12	97	890
2018	1	85	890
2018	2	67	890
2018	3	76	890
2018	4	51	890
2018	5	69	890
2018	6	97	890
2018	7	101	890
2018	8	22	890
2019	1	56	715
2019	2	34	715
2019	3	71	715
2019	4	122	715
2019	5	167	715
2019	6	34	715
2019	7	17	715
2019	8	99	715
2019	9	20	715
2019	10	26	715
2019	11	39	715
2019	12	30	715
2020	1	56	715
2020	2	34	715
2020	3	71	715
2020	4	122	715
2020	5	167	715
2020	6	34	715
2020	7	17	715
2020	8	99	715
2020	9	20	715
2020	10	26	715
2020	11	39	715
2020	12	30	715
2021	1	56	715
2021	2	34	715
2021	3	71	715
2021	4	122	715
2021	5	167	715
2021	6	34	715
2021	7	17	715
2021	8	99	715
2021	9	20	715
2021	10	26	715
2021	11	39	715
2021	12	30	715

Here 890 is the sum of all the values from 9,2017 through 8,2018 and 715 is the sum of all values from 1,2019 through 12,2019 and similarly 715 is the sum of all values from 1,2020 through 12,2020 and similarly 715 is the sum of all values from 1,2021 through 12,2021. For ease of calculation the numbers in column C have been taken the same i.e, (56,34,71,122,167,34,17,99,20,26,39,30) for each of 2019, 2020 and 2021. These numbers may vary for each of the years and subsequently their sums. That is we could have values like (67,87,99,100,76,11,23,44,56,78,87,5) for 2020 and (12,13,14,15,16,17,18,19,20,21,22,23) for 2021 for the months (1,2,3,4,5,6,7,8,,10,11,12) subsequently.

Now my efforts:

count_months_in_each_year = data.groupby('CALENDAR_YEAR').agg({'CALMONTH':'count'})

count_months_in_each_year.reset_index(inplace = True)

count_months_in_each_year.rename({'CALMONTH':'Count_of_Months'}, axis =1, inplace = True)

data = pd.merge(data, count_months_in_each_year, on = 'CALENDAR_YEAR', how = 'left', indicator = True )

data.drop(columns = ['_merge'], axis =1 , inplace = True)

The dataframe after this looks like this:

Now how to get the sum of the values especially in case when I have to consider 9,2017 through 8,2018 although I have the count. Now based on this what logic can be driven to generalize the code in order to get the result.

I also tried this :

####Compute total number of records - number of records which have count of months < 12

number_ofless_than_12_records =  data.shape[0] - data[data['Count_of_Months']==12].shape[0]

#number_ofless_than_12_records = 144.
#Total records = 576

Can we make use of this somehow?

CodePudding user response：

I think what you are looking for is making groups of 12 rows and transform with the group sum:

df['D'] = df.groupby(df.index // 12)['C'].transform('sum')

       A   B    C    D
0   2017   9   65  890
1   2017  10   72  890
2   2017  11   88  890
3   2017  12   97  890
4   2018   1   85  890
5   2018   2   67  890
6   2018   3   76  890
7   2018   4   51  890
8   2018   5   69  890
9   2018   6   97  890
10  2018   7  101  890
11  2018   8   22  890
12  2019   1   56  715
13  2019   2   34  715
14  2019   3   71  715
15  2019   4  122  715
16  2019   5  167  715
17  2019   6   34  715
18  2019   7   17  715
19  2019   8   99  715
20  2019   9   20  715
21  2019  10   26  715
22  2019  11   39  715
23  2019  12   30  715
24  2020   1   56  715
25  2020   2   34  715
26  2020   3   71  715
27  2020   4  122  715
28  2020   5  167  715
29  2020   6   34  715
30  2020   7   17  715
31  2020   8   99  715
32  2020   9   20  715
33  2020  10   26  715
34  2020  11   39  715
35  2020  12   30  715
36  2021   1   56  715
37  2021   2   34  715
38  2021   3   71  715
39  2021   4  122  715
40  2021   5  167  715
41  2021   6   34  715
42  2021   7   17  715
43  2021   8   99  715
44  2021   9   20  715
45  2021  10   26  715
46  2021  11   39  715
47  2021  12   30  715

CodePudding user response：

You can use pandas rolling window function https://pandas.pydata.org/docs/user_guide/window.html

df['D'] = df['C'].rolling(window=12).sum()

This will calculate the sum of the current month and 11 rows back. But it will fill with NaN values in the beginning, until there are enough months to look back.

So we can shift up 11 rows to get the wanted result.

df['D'] = df['D'].shift(-11)

And if you want don't want any NaNs at the end, you can interpolate or pad it out.

df['D'] = df['D'].interpolate()