I'm trying to speed up the code below.
I've tried using filter with a lambda, but the run time was almost the same.
I wasn't able to vectorize it, if that is even possible.
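import pandas as pd
df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 2, 1],
                   'B': ['foo', 'baa', 'foo', 'baa', 'foo', 'foo', 'foo']})
# Collect the values of A whose rows have fewer than 2 distinct values of B
remove_list = [a for a in set(df['A'].values) if len(df[df['A']==a]['B'].unique())<2]
# Keep only the rows whose A is not in that list
df[~df['A'].isin(remove_list)]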
CodePudding user response:
IIUC, try with groupby and nunique:
>>> df[df.groupby("A")["B"].transform('nunique').ge(2)]
   A    B
0  1  foo
1  1  baa
3  3  baa
4  3  foo
6  1  foo
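
To see why this works, here is a minimal sketch that splits the one-liner into steps, using the sample data from the question. transform('nunique') writes each group's count of distinct B values back onto every row of that group, so the comparison produces a boolean mask aligned with df. The names counts, mask and result are mine, for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 2, 1],
                   'B': ['foo', 'baa', 'foo', 'baa', 'foo', 'foo', 'foo']})

# Per-row count of distinct B values within that row's A group
counts = df.groupby('A')['B'].transform('nunique')

# True for rows whose A group has at least 2 distinct B values
mask = counts.ge(2)

# Boolean indexing keeps the original index and row order
result = df[mask]
print(result)
```

This replaces the Python-level loop over each distinct value of A with a single grouped pass, which is likely where any speedup comes from; it is worth timing on your real data.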