Home > OS >  Pandas: Compute average of a column by count of another column, without grouping by
Pandas: Compute average of a column by count of another column, without grouping by

Time:07-08

Is there a pandas function to do something like (sum(A)/count(distinct B)), where A, and B are columns in a dataframe. A has continuous values whereas B has categorical values. The dataframe may have other columns in addition.I don't want to group by, just get the weighted average.

Example: Average orders across users would be sum(orders)/count(distinct user_id) where orders and user ids would be columns in the dataframe.

Sample dataframe:

        OrderQuantity CustomerID
0             10       1214
1              9       1214
2              8       1213
3             14       1213
4             3        1213

What I need is something like sum(df['OrderQuantity'])/len(df['CustomerID'].unique()) i.e 44/2 = 22 in this case but I was wondering if there was way to do it with pandas, so the result would be a dataframe with a column named OrderQuantity and a single row holding the value.

CodePudding user response:

Your formula has two members, so the problem is solved in two steps with pd.Series in the end.

n = df.CustomerID.nunique()
df.drop(columns='CustomerID').sum() / n

You also want a table of one row with the same column labels - just transpose the output. This one-liner solves the problem:

pd.DataFrame(df.drop(columns='CustomerID').sum() / df.CustomerID.nunique()).T

CodePudding user response:

df['OrderQuantity'].sum()/df['CustomerID'].drop_duplicates().count()

  • Related