Pandas group by two columns and count the second column value by each group

Time:07-07

I have a dataset of domains. Could someone tell me how to filter, with Pandas, the domains that have more than one extension?

I grouped the data with this code, but got the result below:

dfActive.groupby(['domain','ext'])['ext'].nunique()

Result:

domain         com     1
sample         com     1
mashhadmap     com     1
               net     1

Expected Result:

mashhadmap     2

CodePudding user response:

IIUC, if you need the count per domain (the first index level), aggregate with sum:

dfActive.groupby(['domain','ext'])['ext'].nunique().groupby(level=0).sum()
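As a rough illustration of what this returns, here is a small sketch. The rows in dfActive are made up to match the output shown in the question, and the names per_pair and per_domain are only for illustration.

```python
import pandas as pd

# Hypothetical sample data, shaped like the output in the question
dfActive = pd.DataFrame({
    "domain": ["domain", "sample", "mashhadmap", "mashhadmap"],
    "ext": ["com", "com", "com", "net"],
})

# One entry per (domain, ext) pair; each value is 1
per_pair = dfActive.groupby(["domain", "ext"])["ext"].nunique()

# Summing within the first index level counts the distinct extensions per domain
per_domain = per_pair.groupby(level=0).sum()
print(per_domain)
```

With this sample, mashhadmap gets 2 and the other domains get 1, so this step counts but does not filter yet.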

If you need to keep only the domains that are duplicated in the first level (i.e. that have more than one extension):

s = dfActive.groupby(['domain','ext'])['ext'].nunique()
s = s[s.index.get_level_values(0).duplicated(keep=False)]

# and then, if needed, aggregate with sum
out = s.groupby(level=0).sum()
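The filtering step can be checked end to end with the sketch below. The sample rows are again an assumption based on the question's output. The last two lines show a shorter alternative that is not part of the answer above: count unique extensions per domain directly, then keep the counts greater than 1 (counts and multi are illustrative names).

```python
import pandas as pd

# Hypothetical sample data, shaped like the output in the question
dfActive = pd.DataFrame({
    "domain": ["domain", "sample", "mashhadmap", "mashhadmap"],
    "ext": ["com", "com", "com", "net"],
})

s = dfActive.groupby(["domain", "ext"])["ext"].nunique()

# True for every row whose domain appears more than once in the first level
mask = s.index.get_level_values(0).duplicated(keep=False)
out = s[mask].groupby(level=0).sum()
print(out)

# Alternative sketch: unique extensions per domain, then filter
counts = dfActive.groupby("domain")["ext"].nunique()
multi = counts[counts > 1]
print(multi)
```

Both out and multi should contain only mashhadmap with a value of 2 for this sample.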