I have an input df like:
category | name | cost | cost2 |
---|---|---|---|
Tier1 | Tina | 10 | 5 |
Tier1 | Jessy | 20 | 15 |
Tier1 | Tina | 30 | 10 |
Need output:
category | count distinct name | sum of cost | avg of cost2 |
---|---|---|---|
Tier1 | 2 | 60 | 10 |
It is similar to SQL: select category, count(distinct name), sum(cost), avg(cost2) group by category. I am looking for the equivalent easy way to do it in python, preferably in one statement, if not two. My actual use case has 30 columns so need a way I can simply parse the code.
CodePudding user response:
Try:
x = df.groupby("category", as_index=False).agg(
count_distinct_name=("name", "nunique"),
sum_of_cost=("cost", "sum"),
avg_of_cost2=("cost2", "mean"),
)
x.columns = x.columns.str.replace("_", " ")
print(x)
Prints:
category count distinct name sum of cost avg of cost2
0 Tier1 2 60 10.0