Pandas - Count row with a specific value when grouped

Time:12-22

I have a use case that I can't work out how to handle cleanly in pandas.

order_id  asset_id
1         A
1         B
1         C
2         A
2         C
3         A
4         B
4         C
  1. In how many orders does asset A appear alone? Here: 1 (order 3).
  2. In how many orders does asset A appear together with other assets? Here: 2 (orders 1 and 2).

Any help would be appreciated. I can't figure out how to do it.

CodePudding user response:

To count orders by the assets they contain, first aggregate the asset_id values of each order_id group into a set, then compare each set with {'A'}:

s = df.groupby('order_id')['asset_id'].agg(set)
print(s)
order_id
1    {A, B, C}
2       {A, C}
3          {A}
4       {B, C}
Name: asset_id, dtype: object

# == matches orders whose asset set is exactly {'A'}
alone = (s == {'A'}).sum()
print(alone)
1

# > is the proper-superset test: A plus at least one other asset
with_others = (s > {'A'}).sum()
print(with_others)
2
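
The snippets above assume that df already exists and that pandas compares each stored set with {'A'} element by element. As a self-contained sketch, the version below rebuilds the sample data from the question and makes the per-order comparison explicit with Series.map. The name target is a hypothetical helper introduced only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "order_id": [1, 1, 1, 2, 2, 3, 4, 4],
    "asset_id": ["A", "B", "C", "A", "C", "A", "B", "C"],
})

target = {"A"}  # hypothetical name for the asset set of interest

# One set of assets per order.
s = df.groupby("order_id")["asset_id"].agg(set)

# Explicit per-order comparison with plain Python set operators:
#   ==  exact match (A alone)
#   >   proper superset (A plus at least one other asset)
alone = int(s.map(lambda assets: assets == target).sum())
with_others = int(s.map(lambda assets: assets > target).sum())

print(alone, with_others)  # prints: 1 2
```

Because > is a strict test, an order containing only A is counted as "alone" and never as "with others", so the two counts cannot overlap.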

CodePudding user response:

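Use groupby.agg with named aggregations and set comparisons. Each lambda returns one boolean per order, and sum then counts the True values: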

(df.groupby('order_id')['asset_id']
   .agg(alone=lambda x: set(x) == {'A'},   # exactly A
        others=lambda x: set(x) > {'A'})   # A plus at least one other asset
   .sum()
)

Output:

alone     1
others    2
dtype: int64
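
Both answers above build a Python set for every order. As an alternative that is not taken from either answer, the same counts can be sketched without sets by combining two per-order aggregates: whether the order contains A, and how many distinct assets it has. The names has_a and n_assets are hypothetical, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "order_id": [1, 1, 1, 2, 2, 3, 4, 4],
    "asset_id": ["A", "B", "C", "A", "C", "A", "B", "C"],
})

# Per order: does it contain asset A at all?
has_a = df["asset_id"].eq("A").groupby(df["order_id"]).any()

# Per order: number of distinct assets.
n_assets = df.groupby("order_id")["asset_id"].nunique()

# A alone: contains A and nothing else.
alone = int((has_a & n_assets.eq(1)).sum())

# A with others: contains A and at least one other asset.
with_others = int((has_a & n_assets.gt(1)).sum())

print(alone, with_others)  # prints: 1 2
```

This variant stays with built-in groupby aggregations rather than Python-level lambdas, which may matter on large frames, although that has not been measured here.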