I have a table that looks like
A | B | C |
---|---|---|
2017 | 9 | 65 |
2017 | 10 | 72 |
2017 | 11 | 88 |
2017 | 12 | 97 |
2018 | 1 | 85 |
2018 | 2 | 67 |
2018 | 3 | 76 |
2018 | 4 | 51 |
2018 | 5 | 69 |
2018 | 6 | 97 |
2018 | 7 | 101 |
2018 | 8 | 22 |
2019 | 1 | 56 |
2019 | 2 | 34 |
2019 | 3 | 71 |
2019 | 4 | 122 |
2019 | 5 | 167 |
2019 | 6 | 34 |
2019 | 7 | 17 |
2019 | 8 | 99 |
2019 | 9 | 20 |
2019 | 10 | 26 |
2019 | 11 | 39 |
2019 | 12 | 30 |
2020 | 1 | 56 |
2020 | 2 | 34 |
2020 | 3 | 71 |
2020 | 4 | 122 |
2020 | 5 | 167 |
2020 | 6 | 34 |
2020 | 7 | 17 |
2020 | 8 | 99 |
2020 | 9 | 20 |
2020 | 10 | 26 |
2020 | 11 | 39 |
2020 | 12 | 30 |
2021 | 1 | 56 |
2021 | 2 | 34 |
2021 | 3 | 71 |
2021 | 4 | 122 |
2021 | 5 | 167 |
2021 | 6 | 34 |
2021 | 7 | 17 |
2021 | 8 | 99 |
2021 | 9 | 20 |
2021 | 10 | 26 |
2021 | 11 | 39 |
2021 | 12 | 30 |
Now what I want is :
A | B | C | D |
---|---|---|---|
2017 | 9 | 65 | 890 |
2017 | 10 | 72 | 890 |
2017 | 11 | 88 | 890 |
2017 | 12 | 97 | 890 |
2018 | 1 | 85 | 890 |
2018 | 2 | 67 | 890 |
2018 | 3 | 76 | 890 |
2018 | 4 | 51 | 890 |
2018 | 5 | 69 | 890 |
2018 | 6 | 97 | 890 |
2018 | 7 | 101 | 890 |
2018 | 8 | 22 | 890 |
2019 | 1 | 56 | 715 |
2019 | 2 | 34 | 715 |
2019 | 3 | 71 | 715 |
2019 | 4 | 122 | 715 |
2019 | 5 | 167 | 715 |
2019 | 6 | 34 | 715 |
2019 | 7 | 17 | 715 |
2019 | 8 | 99 | 715 |
2019 | 9 | 20 | 715 |
2019 | 10 | 26 | 715 |
2019 | 11 | 39 | 715 |
2019 | 12 | 30 | 715 |
2020 | 1 | 56 | 715 |
2020 | 2 | 34 | 715 |
2020 | 3 | 71 | 715 |
2020 | 4 | 122 | 715 |
2020 | 5 | 167 | 715 |
2020 | 6 | 34 | 715 |
2020 | 7 | 17 | 715 |
2020 | 8 | 99 | 715 |
2020 | 9 | 20 | 715 |
2020 | 10 | 26 | 715 |
2020 | 11 | 39 | 715 |
2020 | 12 | 30 | 715 |
2021 | 1 | 56 | 715 |
2021 | 2 | 34 | 715 |
2021 | 3 | 71 | 715 |
2021 | 4 | 122 | 715 |
2021 | 5 | 167 | 715 |
2021 | 6 | 34 | 715 |
2021 | 7 | 17 | 715 |
2021 | 8 | 99 | 715 |
2021 | 9 | 20 | 715 |
2021 | 10 | 26 | 715 |
2021 | 11 | 39 | 715 |
2021 | 12 | 30 | 715 |
Here 890 is the sum of all the values from 9,2017 through 8,2018 and 715 is the sum of all values from 1,2019 through 12,2019 and similarly 715 is the sum of all values from 1,2020 through 12,2020 and similarly 715 is the sum of all values from 1,2021 through 12,2021. For ease of calculation the numbers in column C have been taken the same i.e, (56,34,71,122,167,34,17,99,20,26,39,30) for each of 2019, 2020 and 2021. These numbers may vary for each of the years and subsequently their sums. That is we could have values like (67,87,99,100,76,11,23,44,56,78,87,5) for 2020 and (12,13,14,15,16,17,18,19,20,21,22,23) for 2021 for the months (1,2,3,4,5,6,7,8,,10,11,12) subsequently.
Now my efforts:
count_months_in_each_year = data.groupby('CALENDAR_YEAR').agg({'CALMONTH':'count'})
count_months_in_each_year.reset_index(inplace = True)
count_months_in_each_year.rename({'CALMONTH':'Count_of_Months'}, axis =1, inplace = True)
data = pd.merge(data, count_months_in_each_year, on = 'CALENDAR_YEAR', how = 'left', indicator = True )
data.drop(columns = ['_merge'], axis =1 , inplace = True)
The dataframe after this looks like this:
Now how to get the sum of the values especially in case when I have to consider 9,2017 through 8,2018 although I have the count. Now based on this what logic can be driven to generalize the code in order to get the result.
I also tried this :
####Compute total number of records - number of records which have count of months < 12
number_ofless_than_12_records = data.shape[0] - data[data['Count_of_Months']==12].shape[0]
#number_ofless_than_12_records = 144.
#Total records = 576
Can we make use of this somehow?
CodePudding user response:
I think what you are looking for is making groups of 12 rows and transform with the group sum:
df['D'] = df.groupby(df.index // 12)['C'].transform('sum')
A B C D
0 2017 9 65 890
1 2017 10 72 890
2 2017 11 88 890
3 2017 12 97 890
4 2018 1 85 890
5 2018 2 67 890
6 2018 3 76 890
7 2018 4 51 890
8 2018 5 69 890
9 2018 6 97 890
10 2018 7 101 890
11 2018 8 22 890
12 2019 1 56 715
13 2019 2 34 715
14 2019 3 71 715
15 2019 4 122 715
16 2019 5 167 715
17 2019 6 34 715
18 2019 7 17 715
19 2019 8 99 715
20 2019 9 20 715
21 2019 10 26 715
22 2019 11 39 715
23 2019 12 30 715
24 2020 1 56 715
25 2020 2 34 715
26 2020 3 71 715
27 2020 4 122 715
28 2020 5 167 715
29 2020 6 34 715
30 2020 7 17 715
31 2020 8 99 715
32 2020 9 20 715
33 2020 10 26 715
34 2020 11 39 715
35 2020 12 30 715
36 2021 1 56 715
37 2021 2 34 715
38 2021 3 71 715
39 2021 4 122 715
40 2021 5 167 715
41 2021 6 34 715
42 2021 7 17 715
43 2021 8 99 715
44 2021 9 20 715
45 2021 10 26 715
46 2021 11 39 715
47 2021 12 30 715
CodePudding user response:
You can use pandas rolling window function https://pandas.pydata.org/docs/user_guide/window.html
df['D'] = df['C'].rolling(window=12).sum()
This will calculate the sum of the current month and 11 rows back. But it will fill with NaN values in the beginning, until there are enough months to look back.
So we can shift up 11 rows to get the wanted result.
df['D'] = df['D'].shift(-11)
And if you want don't want any NaNs at the end, you can interpolate or pad it out.
df['D'] = df['D'].interpolate()