I have a dataframe like the below:
dummy_dict_existing = {'Email':['[email protected]', '[email protected]'],
'Ticket_Category': ['Tier1', 'Tier2'],
'Quantity_Purchased': [5,2],
'Total_Price_Paid':[1345.45, 10295.88]}
Email Ticket_Category Quantity_Purchased Total_Price_Paid
0 [email protected] Tier1 5 1345.45
1 [email protected] Tier2 2 10295.88
What I'm trying to do is to create 2 new columns "Tier1_Quantity_Purchased" and "Tier2_Quantity_Purchased" based on the existing dataframe, and sum the total of "Total_Price_Paid" as below:
dummy_dict_desired = {'Email':['[email protected]'],
'Tier1_Quantity_Purchased': [5],
'Tier2_Quantity_Purchased':[2],
'Total_Price_Paid':[11641.33]}
Email Tier1_Quantity_Purchased Tier2_Quantity_Purchased Total_Price_Paid
0 [email protected] 5 2 11641.33
Any help would be greatly appreciated. I know there is an easy way to do this, just can't figure out how without writing some silly for loop!
CodePudding user response:
What you want to do is to pivot your table, and then add a column with aggregated data from the original table.
df = pd.DataFrame(dummy_dict_existing)
pivot_df = df.pivot(index='Email', columns='Ticket_Category', values='Quantity_Purchased')
pivot_df['total'] = df.groupby('Email')['Total_Price_Paid'].sum()
Tier1 | Tier2 | total | |
---|---|---|---|
[email protected] | 5 | 2 | 11641.33 |