I have a dataset of domains. Could someone tell me how to filter the domains that have more than one extension with Pandas?
I grouped the data with this code, but I got this result:
dfActive.groupby(['domain','ext'])['ext'].nunique()
Result:
domain com 1
sample com 1
mashhadmap com 1
net 1
Expected Result:
mashhadmap 2
CodePudding user response:
IIUC, if you need the count per first level (domain), aggregate with sum:
dfActive.groupby(['domain','ext'])['ext'].nunique().groupby(level=0).sum()
If you need to keep only the values whose first level (domain) is duplicated:
s = dfActive.groupby(['domain','ext'])['ext'].nunique()
s = s[s.index.get_level_values(0).duplicated(keep=False)]
# and then, if needed, aggregate with sum
out = s.groupby(level=0).sum()
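For reference, here is a minimal runnable sketch of the filtering step. The sample rows are made up to resemble the output in the question (only the column names domain and ext come from the question). It also shows a shorter alternative, not part of the answer above: group by domain only, count distinct extensions, then keep counts greater than 1.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dfActive = pd.DataFrame({
    "domain": ["domain", "sample", "mashhadmap", "mashhadmap"],
    "ext": ["com", "com", "com", "net"],
})

# Approach from the answer: one row per (domain, ext) pair,
# keep pairs whose domain appears more than once, then sum per domain.
s = dfActive.groupby(["domain", "ext"])["ext"].nunique()
s = s[s.index.get_level_values(0).duplicated(keep=False)]
out = s.groupby(level=0).sum()

# Alternative: count distinct extensions per domain directly,
# then keep only the domains with more than one.
counts = dfActive.groupby("domain")["ext"].nunique()
multi = counts[counts > 1]

print(out)
print(multi)
```

On this sample both out and multi contain only mashhadmap. The alternative skips the two-level grouping, so it may be easier to read if you only need the filtered counts.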