I have a dataframe like as below
df1 = pd.DataFrame({'s_id': [18,23],'SUB':['Phy','Phy'],'Rev_Q1':[80,100],'Rev_Q2':[60,50],'Rev_Q3':[20,12],'Rev_Q4':[10,12]})
I would like to do the below
a) Split the revenue from each quarter equally into 3 months (for each quarter). For ex: If you look at Rev_Q1
, the value is 80 for s_id = 18
. So, we divide that by 3 (because 3 months in a qtr). So, 80/3 = 26.667
b) Pivot the rows and store them as columns
I tried the below but this is clearly not elegant/efficient
df1['Jan'] = df1.loc[:,df1.columns.str.contains("Q1")]/3
df1['Feb'] = df1.loc[:,df1.columns.str.contains("Q1")]/3
df1['Mar'] = df1.loc[:,df1.columns.str.contains("Q1")]/3
df1['Apr'] = df1.loc[:,df1.columns.str.contains("Q2")]/3
df1['May'] = df1.loc[:,df1.columns.str.contains("Q2")]/3
df1['Jun'] = df1.loc[:,df1.columns.str.contains("Q2")]/3
df1['Jul'] = df1.loc[:,df1.columns.str.contains("Q3")]/3
df1['Aug'] = df1.loc[:,df1.columns.str.contains("Q3")]/3
df1['Sep'] = df1.loc[:,df1.columns.str.contains("Q3")]/3
df1['Oct'] = df1.loc[:,df1.columns.str.contains("Q4")]/3
df1['Nov'] = df1.loc[:,df1.columns.str.contains("Q4")]/3
df1['Dec'] = df1.loc[:,df1.columns.str.contains("Q4")]/3
pd.melt(df1, id_vars=['s_id','SUB'],value_vars=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'],var_name='month_name',value_name='rev')
Is there any other better way to write this with elegant piece of code?
I expect my output to be like as below (Divide each qtr value by 3 (bcoz 3 months) and assign that value to each month in the quarter and melt the dataframe)
Am showing a sample output for Qtr 1
. Similarly, I have to do for all quarters in the below format
CodePudding user response:
You can use a mapping dictionary and explode
:
quarters = {'Q1': ['Jan','Feb','Mar'], 'Q2': ['Apr','May','Jun'],
'Q3': ['Jul','Aug','Sep'], 'Q4': ['Oct', 'Nov', 'Dec']
}
out = (df1
.melt(['s_id', 'SUB'], value_name='rev', var_name='qtr')
.assign(rev=lambda d: d['rev'].div(3),
qtr=lambda d: d['qtr'].str[-2:].map(quarters)
)
.explode('qtr')
)
output:
s_id SUB qtr rev
0 18 Phy Jan 26.666667
0 18 Phy Feb 26.666667
0 18 Phy Mar 26.666667
1 23 Phy Jan 33.333333
1 23 Phy Feb 33.333333
1 23 Phy Mar 33.333333
2 18 Phy Apr 20.000000
2 18 Phy May 20.000000
2 18 Phy Jun 20.000000
3 23 Phy Apr 16.666667
3 23 Phy May 16.666667
3 23 Phy Jun 16.666667
4 18 Phy Jul 6.666667
4 18 Phy Aug 6.666667
4 18 Phy Sep 6.666667
5 23 Phy Jul 4.000000
5 23 Phy Aug 4.000000
5 23 Phy Sep 4.000000
6 18 Phy Oct 3.333333
6 18 Phy Nov 3.333333
6 18 Phy Dec 3.333333
7 23 Phy Oct 4.000000
7 23 Phy Nov 4.000000
7 23 Phy Dec 4.000000