Pandas filter a list of items present in a column

Time:01-17

I have a DataFrame like the one below:

import pandas as pd

df = pd.DataFrame({'text': ["Hi how", "I am fine", "Ila say Hi"],
                   'tokens': [['Hi', 'how'], ['I', 'am', 'fine'], ['Ila', 'say', 'Hi']],
                   'labels': [['A', 'B'], ['C', 'B', 'A'], ['D', 'B', 'A']]})

I would like to do the following:

a) Filter the df using both the tokens AND labels columns

b) Filter on the values Hi and Ila in the tokens column

c) Filter on the values A and D in the labels column

So I tried the following:

df[((df['tokens']==['Hi'])&(df['tokens']==['Ila']))&((df['labels']==['A'])&(df['labels']==['D']))]

However, this doesn't work. Since my columns hold lists, how do I filter on them regardless of whether a list has one item or several?

I expect the output to look like this:

text        tokens          labels
Ila say Hi  [Ila, say, Hi]  [D, B, A]

Answer:

You could try the following:

# Each apply() returns a boolean Series: True where that row's list contains the value
df.loc[
    df['tokens'].apply(lambda x: 'Hi' in x) &
    df['tokens'].apply(lambda x: 'Ila' in x) &
    df['labels'].apply(lambda x: 'A' in x) &
    df['labels'].apply(lambda x: 'D' in x)
]

Output

         text          tokens     labels
2  Ila say Hi  [Ila, say, Hi]  [D, B, A]
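The same membership idea can be wrapped so it works whether you are checking for one value or several. This is a minimal sketch, assuming a hypothetical helper named contains_all (it is not part of pandas): it turns the wanted values into a set and keeps rows whose list contains every one of them.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ["Hi how", "I am fine", "Ila say Hi"],
    'tokens': [['Hi', 'how'], ['I', 'am', 'fine'], ['Ila', 'say', 'Hi']],
    'labels': [['A', 'B'], ['C', 'B', 'A'], ['D', 'B', 'A']],
})


def contains_all(series, values):
    """Hypothetical helper: True for rows whose list holds every item in `values`."""
    wanted = set(values)
    # set.issubset accepts any iterable, so each row's list can be passed as-is
    return series.map(wanted.issubset)


# Works the same for a single value (['Hi']) or several (['Hi', 'Ila'])
result = df[contains_all(df['tokens'], ['Hi', 'Ila']) &
            contains_all(df['labels'], ['A', 'D'])]
print(result)
```

Because a set ignores order and duplicates, ['Hi', 'Ila'] and ['Ila', 'Hi'] produce the same mask; if repeated items had to be counted, a per-row count would be needed instead.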

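You could also cast the lists to strings and use str.contains. Note that this matches substrings of the string representation, so searching for 'I' would also match a row containing 'Ila':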

df.loc[
    df['tokens'].astype(str).str.contains('Hi') &
    df['tokens'].astype(str).str.contains('Ila') &
    df['labels'].astype(str).str.contains('A') &
    df['labels'].astype(str).str.contains('D') 
]
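To make the substring caveat concrete, here is a small sketch that compares the string-based check with an exact membership test on the tokens column. The search value 'I' is chosen only for illustration; including the quotes in the search string is one possible workaround, but it relies on how Python prints a list of strings.

```python
import pandas as pd

df = pd.DataFrame({
    'tokens': [['Hi', 'how'], ['I', 'am', 'fine'], ['Ila', 'say', 'Hi']],
})

# Substring search on the string form: "['Ila', 'say', 'Hi']" contains "I"
loose = df['tokens'].astype(str).str.contains('I')

# Exact membership test on each list
exact = df['tokens'].apply(lambda x: 'I' in x)

# Including the quotes narrows the match to whole elements, but depends on repr() format
quoted = df['tokens'].astype(str).str.contains("'I'", regex=False)

print(loose.tolist())   # [False, True, True]
print(exact.tolist())   # [False, True, False]
print(quoted.tolist())  # [False, True, False]
```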

Answer:

Could you simply use filter(lambda X: , df.columns) to get the columns you want, and then reindex the df?
