Home > front end >  Find sum of values of a column spread over different months in Python
Find sum of values of a column spread over different months in Python

Time:05-28

I have a table that looks like

A B C
2017 9 65
2017 10 72
2017 11 88
2017 12 97
2018 1 85
2018 2 67
2018 3 76
2018 4 51
2018 5 69
2018 6 97
2018 7 101
2018 8 22
2019 1 56
2019 2 34
2019 3 71
2019 4 122
2019 5 167
2019 6 34
2019 7 17
2019 8 99
2019 9 20
2019 10 26
2019 11 39
2019 12 30
2020 1 56
2020 2 34
2020 3 71
2020 4 122
2020 5 167
2020 6 34
2020 7 17
2020 8 99
2020 9 20
2020 10 26
2020 11 39
2020 12 30
2021 1 56
2021 2 34
2021 3 71
2021 4 122
2021 5 167
2021 6 34
2021 7 17
2021 8 99
2021 9 20
2021 10 26
2021 11 39
2021 12 30

Now what I want is :

A B C D
2017 9 65 890
2017 10 72 890
2017 11 88 890
2017 12 97 890
2018 1 85 890
2018 2 67 890
2018 3 76 890
2018 4 51 890
2018 5 69 890
2018 6 97 890
2018 7 101 890
2018 8 22 890
2019 1 56 715
2019 2 34 715
2019 3 71 715
2019 4 122 715
2019 5 167 715
2019 6 34 715
2019 7 17 715
2019 8 99 715
2019 9 20 715
2019 10 26 715
2019 11 39 715
2019 12 30 715
2020 1 56 715
2020 2 34 715
2020 3 71 715
2020 4 122 715
2020 5 167 715
2020 6 34 715
2020 7 17 715
2020 8 99 715
2020 9 20 715
2020 10 26 715
2020 11 39 715
2020 12 30 715
2021 1 56 715
2021 2 34 715
2021 3 71 715
2021 4 122 715
2021 5 167 715
2021 6 34 715
2021 7 17 715
2021 8 99 715
2021 9 20 715
2021 10 26 715
2021 11 39 715
2021 12 30 715

Here 890 is the sum of all the values from 9,2017 through 8,2018 and 715 is the sum of all values from 1,2019 through 12,2019 and similarly 715 is the sum of all values from 1,2020 through 12,2020 and similarly 715 is the sum of all values from 1,2021 through 12,2021. For ease of calculation the numbers in column C have been taken the same i.e, (56,34,71,122,167,34,17,99,20,26,39,30) for each of 2019, 2020 and 2021. These numbers may vary for each of the years and subsequently their sums. That is we could have values like (67,87,99,100,76,11,23,44,56,78,87,5) for 2020 and (12,13,14,15,16,17,18,19,20,21,22,23) for 2021 for the months (1,2,3,4,5,6,7,8,,10,11,12) subsequently.

Now my efforts:

count_months_in_each_year = data.groupby('CALENDAR_YEAR').agg({'CALMONTH':'count'})

count_months_in_each_year.reset_index(inplace = True)

count_months_in_each_year.rename({'CALMONTH':'Count_of_Months'}, axis =1, inplace = True)

data = pd.merge(data, count_months_in_each_year, on = 'CALENDAR_YEAR', how = 'left', indicator = True )

data.drop(columns = ['_merge'], axis =1 , inplace = True)

The dataframe after this looks like this:

A picture of the dataset

Now how to get the sum of the values especially in case when I have to consider 9,2017 through 8,2018 although I have the count. Now based on this what logic can be driven to generalize the code in order to get the result.

I also tried this :

####Compute total number of records - number of records which have count of months < 12

number_ofless_than_12_records =  data.shape[0] - data[data['Count_of_Months']==12].shape[0]

#number_ofless_than_12_records = 144.
#Total records = 576

Can we make use of this somehow?

CodePudding user response:

I think what you are looking for is making groups of 12 rows and transform with the group sum:

df['D'] = df.groupby(df.index // 12)['C'].transform('sum')
       A   B    C    D
0   2017   9   65  890
1   2017  10   72  890
2   2017  11   88  890
3   2017  12   97  890
4   2018   1   85  890
5   2018   2   67  890
6   2018   3   76  890
7   2018   4   51  890
8   2018   5   69  890
9   2018   6   97  890
10  2018   7  101  890
11  2018   8   22  890
12  2019   1   56  715
13  2019   2   34  715
14  2019   3   71  715
15  2019   4  122  715
16  2019   5  167  715
17  2019   6   34  715
18  2019   7   17  715
19  2019   8   99  715
20  2019   9   20  715
21  2019  10   26  715
22  2019  11   39  715
23  2019  12   30  715
24  2020   1   56  715
25  2020   2   34  715
26  2020   3   71  715
27  2020   4  122  715
28  2020   5  167  715
29  2020   6   34  715
30  2020   7   17  715
31  2020   8   99  715
32  2020   9   20  715
33  2020  10   26  715
34  2020  11   39  715
35  2020  12   30  715
36  2021   1   56  715
37  2021   2   34  715
38  2021   3   71  715
39  2021   4  122  715
40  2021   5  167  715
41  2021   6   34  715
42  2021   7   17  715
43  2021   8   99  715
44  2021   9   20  715
45  2021  10   26  715
46  2021  11   39  715
47  2021  12   30  715

CodePudding user response:

You can use pandas rolling window function https://pandas.pydata.org/docs/user_guide/window.html

df['D'] = df['C'].rolling(window=12).sum()

This will calculate the sum of the current month and 11 rows back. But it will fill with NaN values in the beginning, until there are enough months to look back.

So we can shift up 11 rows to get the wanted result.

df['D'] = df['D'].shift(-11)

And if you want don't want any NaNs at the end, you can interpolate or pad it out.

df['D'] = df['D'].interpolate()
  • Related