Is there a pandas function to do something like (sum(A)/count(distinct B)), where A, and B are columns in a dataframe. A has continuous values whereas B has categorical values. The dataframe may have other columns in addition.I don't want to group by, just get the weighted average.
Example: Average orders across users would be sum(orders)/count(distinct user_id) where orders and user ids would be columns in the dataframe.
Sample dataframe:
OrderQuantity CustomerID
0 10 1214
1 9 1214
2 8 1213
3 14 1213
4 3 1213
What I need is something like sum(df['OrderQuantity'])/len(df['CustomerID'].unique())
i.e 44/2 = 22
in this case but I was wondering if there was way to do it with pandas, so the result would be a dataframe with a column named OrderQuantity
and a single row holding the value.
CodePudding user response:
Your formula has two members, so the problem is solved in two steps with pd.Series in the end.
n = df.CustomerID.nunique()
df.drop(columns='CustomerID').sum() / n
You also want a table of one row with the same column labels - just transpose the output. This one-liner solves the problem:
pd.DataFrame(df.drop(columns='CustomerID').sum() / df.CustomerID.nunique()).T
CodePudding user response:
df['OrderQuantity'].sum()/df['CustomerID'].drop_duplicates().count()