Average values on duplicate records-CodePudding

I have the following input table

Custid	transactions	price	revenue
12456	2300	21	48300
12456	5200	29	150800
78901	3000	30	90000
676512	5200	24	124800
676512	7400	18	133200

where the custid has duplicates . There is a need to remove the duplicates and find the average of the transactions field and the price field for the custid that has more than 1 record and recalculate the revenue as ((new avg)transactions *(newavg) price)

Output table

Custid	transactions	price	revenue
12456	3750	25	93750
78901	3000	30	90000
676512	6300	21	132300

Python code tried

import pandas as pd

df1 .pd.read_csv("file1.csv")

df2=df1[custid].value_counts()

df1[transactions]=np.where(df2[count]>1,df1[transactions].mean())
df1[price]=np.where(df2[count]>1,df1[price].mean())
df1[revenue]=df1[transactions]*df1[price]

CodePudding user response：

groupby on custid, and mean for the transaction and price then calculate the revenue

df2=df.groupby('Custid').mean().astype(int).reset_index()
df2['revenue'] = df2['transactions'] * df2['price']
df2

    Custid  transactions    price   revenue
0   12456           3750    25        93750
1   78901           3000    30        90000
2   676512          6300    21       132300