Average values on duplicate records


I have the following input table

Custid  transactions  price  revenue
12456   2300          21     48300
12456   5200          29     150800
78901   3000          30     90000
676512  5200          24     124800
676512  7400          18     133200

The Custid column contains duplicates. I need to remove the duplicates by averaging the transactions and price fields for each Custid that has more than one record, and then recalculate the revenue as (average transactions) * (average price). For example, Custid 12456 averages to 3750 transactions at a price of 25, so its new revenue is 3750 * 25 = 93750.

Output table

Custid  transactions  price  revenue
12456   3750          25     93750
78901   3000          30     90000
676512  6300          21     132300

Python code I tried:

import numpy as np
import pandas as pd

df1 = pd.read_csv("file1.csv")

# count how many rows each Custid has
df2 = df1['Custid'].value_counts()

# attempt to replace transactions/price with their mean where the count is > 1
df1['transactions'] = np.where(df2 > 1, df1['transactions'].mean(), df1['transactions'])
df1['price'] = np.where(df2 > 1, df1['price'].mean(), df1['price'])
df1['revenue'] = df1['transactions'] * df1['price']

CodePudding user response:

Group by Custid, take the mean of transactions and price, then recalculate the revenue:

# average the numeric columns per Custid and cast the means to int
df2 = df1.groupby('Custid').mean().astype(int).reset_index()
# recompute revenue from the averaged transactions and price
df2['revenue'] = df2['transactions'] * df2['price']
df2
   Custid  transactions  price  revenue
0   12456          3750     25    93750
1   78901          3000     30    90000
2  676512          6300     21   132300
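
Note that mean() above also averages the existing revenue column before it gets overwritten. A minimal alternative sketch, assuming pandas 0.25+ named aggregation and the df1 loaded above (the df3 name is just for illustration), averages only transactions and price:

import pandas as pd

# average only the two columns of interest, one result row per Custid
df3 = (df1.groupby('Custid', as_index=False)
          .agg(transactions=('transactions', 'mean'),
               price=('price', 'mean'))
          .astype({'transactions': int, 'price': int}))
# recompute revenue from the averaged values
df3['revenue'] = df3['transactions'] * df3['price']

This should produce the same output table while never touching the original revenue values during aggregation.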
