Home > Blockchain >  Pandas create two new columns based on 2 existing columns
Pandas create two new columns based on 2 existing columns

Time:06-16

I have a dataframe like the below:

dummy_dict_existing  = {'Email':['[email protected]', '[email protected]'],
              'Ticket_Category': ['Tier1', 'Tier2'],
              'Quantity_Purchased': [5,2],
              'Total_Price_Paid':[1345.45, 10295.88]}
               Email Ticket_Category  Quantity_Purchased  Total_Price_Paid
0  [email protected]           Tier1                   5           1345.45
1  [email protected]           Tier2                   2          10295.88

What I'm trying to do is to create 2 new columns "Tier1_Quantity_Purchased" and "Tier2_Quantity_Purchased" based on the existing dataframe, and sum the total of "Total_Price_Paid" as below:

dummy_dict_desired  = {'Email':['[email protected]'],
              'Tier1_Quantity_Purchased': [5],
               'Tier2_Quantity_Purchased':[2],
              'Total_Price_Paid':[11641.33]}

               Email  Tier1_Quantity_Purchased  Tier2_Quantity_Purchased  Total_Price_Paid
0  [email protected]                         5                         2          11641.33

Any help would be greatly appreciated. I know there is an easy way to do this, just can't figure out how without writing some silly for loop!

CodePudding user response:

What you want to do is to pivot your table, and then add a column with aggregated data from the original table.

df = pd.DataFrame(dummy_dict_existing)
pivot_df = df.pivot(index='Email', columns='Ticket_Category', values='Quantity_Purchased')
pivot_df['total'] = df.groupby('Email')['Total_Price_Paid'].sum()
Email Tier1 Tier2 total
[email protected] 5 2 11641.33

For more details on pivoting, take a look at enter image description here

  • Related