Looking for a fast way to filter a pandas DataFrame based on 2 columns

Time:03-03

I'm trying to speed up the code below. It keeps only the rows whose value in column A appears with at least two distinct values in column B.

I tried using filter with a lambda, but the run time was almost the same.

I wasn't able to vectorize it, if that is even possible.

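import pandas as pd

df = pd.DataFrame({'A':[1,1,2,3,3,2,1],
                   'B':['foo','baa','foo','baa','foo','foo','foo']})

# Collect the A values that appear with fewer than 2 distinct B values
remove_list = [a for a in set(df['A'].values) if len(df[df['A']==a]['B'].unique())<2]

# Keep only the rows whose A value is not in remove_list
df[~df['A'].isin(remove_list)]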

CodePudding user response:

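IIUC, you want to keep the rows whose A value has at least 2 distinct B values. Try groupby with transform('nunique'), which computes the distinct count per group and broadcasts it back to every row: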

>>> df[df.groupby("A")["B"].transform('nunique').ge(2)]
   A    B
0  1  foo
1  1  baa
3  3  baa
4  3  foo
6  1  foo
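To check that the groupby version really selects the same rows as the loop in the question, the two can be run side by side. This is only a minimal sketch on the sample data: the function names filter_loop and filter_groupby are made up for illustration and do not come from the question or the answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 2, 1],
                   'B': ['foo', 'baa', 'foo', 'baa', 'foo', 'foo', 'foo']})


def filter_loop(frame):
    # Approach from the question (hypothetical wrapper name):
    # one boolean scan of the frame per distinct A value.
    remove_list = [a for a in set(frame['A'].values)
                   if len(frame[frame['A'] == a]['B'].unique()) < 2]
    return frame[~frame['A'].isin(remove_list)]


def filter_groupby(frame):
    # Approach from the answer (hypothetical wrapper name):
    # count distinct B values per A group and broadcast the count to each row.
    distinct_b = frame.groupby('A')['B'].transform('nunique')
    return frame[distinct_b.ge(2)]


loop_result = filter_loop(df)
groupby_result = filter_groupby(df)

# Both approaches should keep exactly the same rows on this sample.
assert loop_result.equals(groupby_result)
print(groupby_result.index.tolist())  # [0, 1, 3, 4, 6]
```

One caveat if column B can contain missing values: unique() keeps NaN as a value, while nunique drops it by default, so the two versions could disagree for groups that contain NaN.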