This is a DataFrame with a 'customer' column that contains repeated values:
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'customer': ['a', 'b', 'c', 'b', 'b', 'b', 'd', 'e', 'e', 'f'], 'address': ['xx', 'yy', 'rr', 'yy', 'oo', 'ee', 'vv', 'zz', 'nn', 'cc']})
I want the rows whose 'customer' value occurs more than 3 times. I tried:
df.groupby(['customer']).count()>3
Result: I am getting boolean values per customer instead of the matching rows:
             id  address
customer
a         False    False
b          True     True
c         False    False
d         False    False
e         False    False
f         False    False
Expected result:
   id customer address
1   2        b      yy
CodePudding user response:
You can use GroupBy.filter() on the dataframe and then .drop_duplicates() on the "customer" column:
x = (
    df.groupby("customer")
    .filter(lambda x: len(x) > 3)
    .drop_duplicates("customer")
)
print(x)
Prints:
   id customer address
1   2        b      yy
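To see what each of the two steps contributes, here is a minimal self-contained sketch that assumes the sample data from the question. The variable names `kept` and `first_rows` are only illustrative and do not appear in the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "customer": ["a", "b", "c", "b", "b", "b", "d", "e", "e", "f"],
    "address": ["xx", "yy", "rr", "yy", "oo", "ee", "vv", "zz", "nn", "cc"],
})

# filter() keeps every row of each group for which the function returns True,
# so all rows of customers that occur more than 3 times survive.
kept = df.groupby("customer").filter(lambda g: len(g) > 3)

# drop_duplicates() then keeps only the first remaining row per customer.
first_rows = kept.drop_duplicates("customer")

print(kept)
print(first_rows)
```

Dropping the .drop_duplicates() step leaves all rows of the frequent customers, which is what the other answers return.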
CodePudding user response:
You can use groupby.transform and boolean indexing:
df[df.groupby('customer')['customer'].transform('count').gt(3)]
Output:
   id customer address
1   2        b      yy
3   4        b      yy
4   5        b      oo
5   6        b      ee
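The one-liner can be unpacked into its intermediate steps. This is a sketch on the question's sample data; the names `counts`, `mask` and `result` are assumptions introduced for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "customer": ["a", "b", "c", "b", "b", "b", "d", "e", "e", "f"],
    "address": ["xx", "yy", "rr", "yy", "oo", "ee", "vv", "zz", "nn", "cc"],
})

# transform("count") returns one value per original row (not per group):
# each row gets the size of the group it belongs to.
counts = df.groupby("customer")["customer"].transform("count")

# Boolean Series aligned with df's index; True where the group has more than 3 rows.
mask = counts.gt(3)

# Boolean indexing keeps only the rows where the mask is True.
result = df[mask]
print(result)
```

Because the mask shares the dataframe's index, no merge or lookup is needed. If only the first row per customer is wanted, as in the question's expected result, .drop_duplicates("customer") can be chained onto the result.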
CodePudding user response:
You can fix your code with isin:
s = df.groupby(['customer'])['id'].count()>3
out = df.loc[df['customer'].isin(s[s].index)]
Out[389]:
   id customer address
1   2        b      yy
3   4        b      yy
4   5        b      oo
5   6        b      ee
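As a possible variation on the same idea (not part of the answer above), the set of frequent customers could also be built with Series.value_counts() instead of groupby, then passed to isin. A minimal sketch on the question's sample data, with `vc` and `frequent` as illustrative names:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "customer": ["a", "b", "c", "b", "b", "b", "d", "e", "e", "f"],
    "address": ["xx", "yy", "rr", "yy", "oo", "ee", "vv", "zz", "nn", "cc"],
})

# Count how often each customer value occurs.
vc = df["customer"].value_counts()

# Keep only the customer labels that occur more than 3 times.
frequent = vc[vc > 3].index

# isin() builds a row-wise boolean mask from that set of labels.
out = df.loc[df["customer"].isin(frequent)]
print(out)
```

Both versions work the same way: first reduce to one count per customer, then map the qualifying labels back onto the original rows.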