Home > Mobile >  Create column for sum of total revenue as well monthly revenue. using group by
Create column for sum of total revenue as well monthly revenue. using group by

Time:04-04

I'm trying to create column with Monthly total as well column for total per VIN .Kindly Help to get the output dataframe..

Final_Data = pd.DataFrame(Final_Data.groupby(by='VIN'
                        as_index=False, ,'Month')['Dealers_Revenue'].sum())

current dataframe Final_Data:

VIN Revenue Category Month
v1 30 MKL 64
v1 50 GKL 64
v1 40 GKL 64
v1 30 UKL 63
v1 40 MKL 63
v2 30 MKL 63
v2 50 GKL 63
v2 40 GKL 62
v2 30 UKL 62
v2 40 MKL 61

Output I want in dataframe Final_Data:

enter image description here

CodePudding user response:

# create DataFrame
df = pd.DataFrame({
    'VIN':['v1', 'v1', 'v1', 'v1', 'v1', 'v2', 'v2', 'v2', 'v2', 'v2'],
    'Revenue':[30, 50, 40, 30, 40, 30, 50, 40, 30, 50], 
    'Category':['MKL', 'GKL','GKL', 'UKL', 'MKL', 'MKL', 'GKL', 'GKL', 'UKL', 'MKL'],
    'Month':[64, 64, 64, 63, 63, 63, 63, 62, 62, 61]
})

print(df)

  VIN  Revenue Category  Month
0  v1       30      MKL     64
1  v1       50      GKL     64
2  v1       40      GKL     64
3  v1       30      UKL     63
4  v1       40      MKL     63
5  v2       30      MKL     63
6  v2       50      GKL     63
7  v2       40      GKL     62
8  v2       30      UKL     62
9  v2       50      MKL     61
# revenue sum by VIN and Month 
df_group = df.groupby(['VIN','Month']).agg(list).reset_index()

print(df_group)

  VIN  Month       Revenue         Category
0  v1     63      [30, 40]       [UKL, MKL]
1  v1     64  [30, 50, 40]  [MKL, GKL, GKL]
2  v2     61          [50]            [MKL]
3  v2     62      [40, 30]       [GKL, UKL]
4  v2     63      [30, 50]       [MKL, GKL]

# calculate total revenue by VIN
df_tot = df_group.groupby(['VIN'])['Revenue'].sum().reset_index()

print(df_tot)

VIN               Revenue
0  v1  [30, 40, 30, 50, 40]
1  v2  [50, 40, 30, 30, 50]
# merge df_group with df_tot and rename columns
df_merge = pd.merge(df_group, df_tot, on='VIN').rename(columns={'Revenue_x': 'Total Revenue by Month', 'Revenue_y': 'Total Revenue by VIN'})

# sum lists
df_merge['Total Revenue by Month'] = df_merge['Total Revenue by Month'].apply(sum)
df_merge['Total Revenue by VIN'] = df_merge['Total Revenue by VIN'].apply(sum)

print(df_merge)

  VIN  Month  Total Revenue by Month         Category  Total Revenue by VIN
0  v1     63                      70       [UKL, MKL]                   190
1  v1     64                     120  [MKL, GKL, GKL]                   190
2  v2     61                      50            [MKL]                   200
3  v2     62                      70       [GKL, UKL]                   200
4  v2     63                      80       [MKL, GKL]                   200
  • Related