This is my available df; it contains the years 2016 to 2020:
Year Month Bill
-----------------
2016 1 2
2016 2 5
2016 3 10
2016 4 2
2016 5 4
2016 6 9
2016 7 7
2016 8 8
2016 9 9
2016 10 5
2016 11 1
2016 12 3
.
.
.
2020 12 10
Now I want to create two new columns in this dataframe, levels and contribution. The levels column contains Q1, Q2, Q3, Q4, representing the four quarters of the year, and contribution contains the average of the Bill column over the three months of that quarter in the respective year.
For example, Q1 of 2016 will contain the average Bill of months 1, 2, 3 of 2016 in the contribution column,
and likewise Q3 of 2020 will contain the average Bill of months 7, 8, 9 of 2020. The expected dataframe is given below:
Year Month Bill levels contribution
------------------------------------
2016 1 2 Q1 5.66
2016 2 5 Q1 5.66
2016 3 10 Q1 5.66
2016 4 2 Q2 5
2016 5 4 Q2 5
2016 6 9 Q2 5
2016 7 7 Q3 8
2016 8 8 Q3 8
2016 9 9 Q3 8
2016 10 5 Q4 3
2016 11 1 Q4 3
2016 12 3 Q4 3
.
.
2020 10 2 Q4 6
2020 11 6 Q4 6
2020 12 10 Q4 6
This process is repeated for the 4 quarters of each year. I am not able to figure out how to do it, as this is something new to me.
CodePudding user response:
You can try:
import math
df['levels'] = 'Q' + df['Month'].div(3).apply(math.ceil).astype(str)
df['contribution'] = df.groupby(['Year', 'levels'])['Bill'].transform('mean')
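For reference, here is a minimal self-contained sketch of the same idea, using only the 2016 rows from the question as sample data. It swaps math.ceil for integer arithmetic to label the quarter, which is my own variation and not part of the answer above; the groupby/transform step is unchanged.

```python
import pandas as pd

# Sample data: the 2016 rows shown in the question
df = pd.DataFrame({
    "Year": [2016] * 12,
    "Month": list(range(1, 13)),
    "Bill": [2, 5, 10, 2, 4, 9, 7, 8, 9, 5, 1, 3],
})

# Months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4
df["levels"] = "Q" + ((df["Month"] - 1) // 3 + 1).astype(str)

# transform('mean') returns one value per original row, so each
# (Year, quarter) average is broadcast back onto its three months
df["contribution"] = df.groupby(["Year", "levels"])["Bill"].transform("mean")

print(df)
```

Grouping by both Year and levels matters once the frame holds more than one year; otherwise Q1 of 2016 would be averaged together with Q1 of every other year.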
CodePudding user response:
pandas actually has a dedicated data type for monthly and quarterly values called pandas.Period.
See this similar question:
In your case it would look like this:
from datetime import datetime
# First create dates for 1st of each month period
df['Dates'] = [datetime(row['Year'], row['Month'], 1) for i, row in df[['Year', 'Month']].iterrows()]
# Create monthly periods
df['Month Periods'] = df['Dates'].dt.to_period('M')
# Use the new monthly index
df = df.set_index('Month Periods')
# Group by quarters
df_qtrly = df['Bill'].resample('Q').mean()
df_qtrly.index.names = ['Quarters']
print(df_qtrly)
Result:
Quarters
2016Q1 5.666667
2016Q2 5.000000
2016Q3 8.000000
2016Q4 3.000000
Freq: Q-DEC, Name: Bill, dtype: float64
If you want to put these values back into the monthly dataframe you could do this:
df['Quarters'] = df['Dates'].dt.to_period('Q')
df['Contributions'] = df_qtrly.loc[df['Quarters']].values
               Year  Month  Bill      Dates Quarters  Contributions
Month Periods
2016-01        2016      1     2 2016-01-01   2016Q1       5.666667
2016-02        2016      2     5 2016-02-01   2016Q1       5.666667
2016-03        2016      3    10 2016-03-01   2016Q1       5.666667
2016-04        2016      4     2 2016-04-01   2016Q2       5.000000
2016-05        2016      5     4 2016-05-01   2016Q2       5.000000
2016-06        2016      6     9 2016-06-01   2016Q2       5.000000
2016-07        2016      7     7 2016-07-01   2016Q3       8.000000
2016-08        2016      8     8 2016-08-01   2016Q3       8.000000
2016-09        2016      9     9 2016-09-01   2016Q3       8.000000
2016-10        2016     10     5 2016-10-01   2016Q4       3.000000
2016-11        2016     11     1 2016-11-01   2016Q4       3.000000
2016-12        2016     12     3 2016-12-01   2016Q4       3.000000
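If you only need the quarterly average on every monthly row, a shorter variant is possible. The sketch below is my own assumption of how that could look, not the method above: it builds the quarterly periods with pd.to_datetime and then uses groupby/transform, so no resample, index change, or row loop is needed. The sample data is the 2016 rows plus the 2020 Q4 rows from the question.

```python
import pandas as pd

# Sample data: 2016 plus the last quarter of 2020, as shown in the question
df = pd.DataFrame({
    "Year": [2016] * 12 + [2020] * 3,
    "Month": list(range(1, 13)) + [10, 11, 12],
    "Bill": [2, 5, 10, 2, 4, 9, 7, 8, 9, 5, 1, 3, 2, 6, 10],
})

# to_datetime can assemble dates from year/month/day columns;
# day=1 stands in for the first of each month
dates = pd.to_datetime(
    df[["Year", "Month"]].rename(columns=str.lower).assign(day=1)
)

# Quarterly periods such as 2016Q1 already carry the year,
# so a single grouping key is enough
df["Quarters"] = dates.dt.to_period("Q")
df["Contributions"] = df.groupby("Quarters")["Bill"].transform("mean")

print(df)
```

Because the period includes the year, 2016Q4 and 2020Q4 are averaged separately without adding Year to the grouping keys.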