Iterate over index value pairs of pandas dataframe after groupby and count

Time:06-07

I have a dataframe like the following:

df = pd.DataFrame({'Category': [1, 1, 2, 3, 2, 1], 'Value': [20, 10, 5, 15, 20, 5]})

     Category     Value
0       1          20
1       1          10
2       2           5
3       3          15
4       2          20
5       1           5

I want to count how many items there are in each category and emit a metric for each one by calling a function. I am getting the count of elements per category as follows:

df_grouped_by_category = df.groupby("Category").count()

          Value
Category
1             3
2             2
3             1

But I am having issues getting a function applied over each of these results so that I can publish the metrics. I have been trying the following:

df_grouped_by_category.apply(lambda x: self.emit_metric(x.category, x.count))
df_grouped_by_category.apply(lambda x: self.emit_metric(x["Category"], x["count"]))

def emit_metric(self, category, count) -> None:
    # Some code to emit metrics
    ...

But none of those methods recognize the column names. What am I doing wrong?

Thanks a lot for the help

CodePudding user response:

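After the groupby, Category becomes the index and Value remains the only column (it now holds the counts), so there is no column named Category or count to look up.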

You should also use apply with axis=1 since you're doing an operation on every row (instead of column).

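df_grouped_by_category.apply(lambda s: self.emit_metric(s.name, s.Value), axis=1)

With axis=1 each row is passed as a Series whose name is the row's index label (the category); s.index would return the column labels instead.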

If you want to, use as_index=False in your groupby operation to avoid Category becoming your index.
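
The two variants described above can be sketched as follows. This is only an illustration: the emitted list and the module-level emit_metric are hypothetical stand-ins for the real metrics code, which is not shown in the question.

```python
import pandas as pd

df = pd.DataFrame({"Category": [1, 1, 2, 3, 2, 1], "Value": [20, 10, 5, 15, 20, 5]})

emitted = []  # hypothetical sink standing in for a real metrics client


def emit_metric(category, count) -> None:
    # Placeholder: record the pair instead of publishing it anywhere.
    emitted.append((int(category), int(count)))


# Default groupby: Category is the index, so the row label is s.name.
counts = df.groupby("Category").count()
counts.apply(lambda s: emit_metric(s.name, s["Value"]), axis=1)

# as_index=False keeps Category as a regular column next to the counts.
flat = df.groupby("Category", as_index=False).count()
flat.apply(lambda s: emit_metric(s["Category"], s["Value"]), axis=1)
```

Both calls hand the same (category, count) pairs to emit_metric; the only difference is whether the category is read from the row label or from a column.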

CodePudding user response:

First, there is no column called 'count' unless you create one with a named aggregation:

g = df.groupby("Category",as_index=False).agg(count=('Value', 'count'))

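Then use apply with axis=1 so that each row is passed to the function. Access the value as x["count"] rather than x.count, because x.count resolves to the Series.count method and not to the column:

g.apply(lambda x: emit_metric(x.Category, x["count"]), axis=1)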
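
If the goal is simply to walk over the (category, count) pairs, a plain loop over a Series may be easier to read than apply. The sketch below assumes a stand-alone emit_metric that just records its arguments; the published list is hypothetical and only there to make the example self-contained. It also shows why attribute access on a column named count is a trap.

```python
import pandas as pd

df = pd.DataFrame({"Category": [1, 1, 2, 3, 2, 1], "Value": [20, 10, 5, 15, 20, 5]})

published = []  # hypothetical stand-in for the real metrics backend


def emit_metric(category, count) -> None:
    # Placeholder: record the pair instead of publishing it anywhere.
    published.append((int(category), int(count)))


# Named aggregation gives a real "count" column, as in the answer above.
g = df.groupby("Category", as_index=False).agg(count=("Value", "count"))
first = g.iloc[0]
# first.count is the bound Series.count method; first["count"] is the value.
count_is_method = callable(first.count)
first_count = int(first["count"])

# Alternative: keep Category as the index and iterate over index/value pairs.
counts = df.groupby("Category")["Value"].count()
for category, count in counts.items():
    emit_metric(category, count)
```

Series.items() yields one (index label, value) pair per group, in sorted key order because groupby sorts its keys by default.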