I have a dataframe like the following:
df = pd.DataFrame({'Category': [1, 1, 2, 3, 2, 1], 'Value': [20, 10, 5, 15, 20, 5]})
Category Value
0 1 20
1 1 10
2 2 5
3 3 15
4 2 20
5 1 5
I want to count how many items there are in each category and emit a metric for each one by calling a function. I am getting the count of elements per category as follows:
df_grouped_by_category = df.groupby("Category").count()
1 3
2 2
3 1
But I am having issues getting a function applied over each of these results so that I can publish the metrics. I have been trying the following:
df_grouped_by_category.apply(lambda x: self.emit_metric(x.category, x.count))
df_grouped_by_category.apply(lambda x: self.emit_metric(x["Category"], x["count"]))
def emit_metric(category, count) -> None:
    # Some code to emit metrics
But none of those methods recognize the column names. What am I doing wrong?
Thanks a lot for the help
CodePudding user response:
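Category becomes your index, and Value remains your column name.
You should also use apply with axis=1, since you're doing an operation on every row (instead of every column). Inside the function, each row's index label (here, the category) is available as the row's name.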
df_grouped_by_category.apply(lambda s: self.emit_metric(s.name, s.Value), axis=1)
If you want to, use as_index=False in your groupby operation to avoid Category becoming your index.
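As a minimal sketch of the row-wise approach, assuming a stand-in emit_metric that only records its arguments (the emitted dict is hypothetical and replaces whatever metrics client you actually use), the grouped frame can be walked like this. With axis=1, each row is passed as a Series whose name is the index label, and the count sits in the Value column:

```python
import pandas as pd

df = pd.DataFrame({'Category': [1, 1, 2, 3, 2, 1], 'Value': [20, 10, 5, 15, 20, 5]})

# Hypothetical sink standing in for the real metrics client.
emitted = {}

def emit_metric(category, count) -> None:
    # Stand-in: record the call instead of publishing a metric.
    emitted[int(category)] = int(count)

# Category becomes the index; the per-group counts land in the Value column.
counts = df.groupby("Category").count()

# axis=1 passes each row as a Series; row.name is that row's index label.
counts.apply(lambda row: emit_metric(row.name, row["Value"]), axis=1)

print(emitted)
```

Here apply is used only for its side effect, so its return value (a Series of None) can be ignored.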
CodePudding user response:
First, there is no 'count' column in your result unless you create one, for example with a named aggregation:
g = df.groupby("Category", as_index=False).agg(count=('Value', 'count'))
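And then use axis=1 to access the column names. Use bracket access for the count column, because the attribute count on a row is the built-in Series method rather than the column: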
g.apply(lambda x: emit_metric(x["Category"], x["count"]), axis=1)
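One detail worth checking for yourself: a column named count shares its name with the Series.count method, so attribute access on a row returns the method and only bracket access returns the stored value. The sketch below demonstrates that, and also shows a plain loop over groupby(...).size() as an alternative when the function is called purely for its side effect. The emit_metric body and the emitted dict are hypothetical stand-ins, not your real metrics code:

```python
import pandas as pd

df = pd.DataFrame({'Category': [1, 1, 2, 3, 2, 1], 'Value': [20, 10, 5, 15, 20, 5]})

g = df.groupby("Category", as_index=False).agg(count=("Value", "count"))
first_row = g.iloc[0]

# Attribute access resolves to the Series.count method, not the "count" column.
attr_is_method = callable(first_row.count)
# Bracket access returns the value stored in the "count" column.
count_value = int(first_row["count"])

# Hypothetical sink standing in for the real metrics client.
emitted = {}

def emit_metric(category, count) -> None:
    # Stand-in: record the call instead of publishing a metric.
    emitted[int(category)] = int(count)

# size() returns a Series indexed by Category, so no apply is needed.
for category, count in df.groupby("Category").size().items():
    emit_metric(category, count)

print(attr_is_method, count_value, emitted)
```

Note that size() counts all rows in each group, while count() skips missing values in the counted column; on this data the two agree.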