I have the following input table
Custid | transactions | price | revenue |
---|---|---|---|
12456 | 2300 | 21 | 48300 |
12456 | 5200 | 29 | 150800 |
78901 | 3000 | 30 | 90000 |
676512 | 5200 | 24 | 124800 |
676512 | 7400 | 18 | 133200 |
where the custid has duplicates . There is a need to remove the duplicates and find the average of the transactions field and the price field for the custid that has more than 1 record and recalculate the revenue as ((new avg)transactions *(newavg) price)
Output table
Custid | transactions | price | revenue |
---|---|---|---|
12456 | 3750 | 25 | 93750 |
78901 | 3000 | 30 | 90000 |
676512 | 6300 | 21 | 132300 |
Python code tried
import pandas as pd
df1 .pd.read_csv("file1.csv")
df2=df1[custid].value_counts()
df1[transactions]=np.where(df2[count]>1,df1[transactions].mean())
df1[price]=np.where(df2[count]>1,df1[price].mean())
df1[revenue]=df1[transactions]*df1[price]
CodePudding user response:
groupby on custid, and mean for the transaction and price then calculate the revenue
df2=df.groupby('Custid').mean().astype(int).reset_index()
df2['revenue'] = df2['transactions'] * df2['price']
df2
Custid transactions price revenue
0 12456 3750 25 93750
1 78901 3000 30 90000
2 676512 6300 21 132300