I have a dataframe like the one below:
import pandas as pd
df = pd.DataFrame({'text': ["Hi how","I am fine","Ila say Hi"],
'tokens':[['Hi','how'],['I','am','fine'],['Ila','say','Hi']],
'labels':[['A','B'],['C','B','A'],['D','B','A']]})
I would like to do the following:
a) Filter the df using the tokens AND labels columns
b) Filter based on the values Hi and Ila for the tokens column
c) Filter based on the values A and D for the labels column
So I tried the following:
df[((df['tokens']==['Hi'])&(df['tokens']==['Ila']))&((df['labels']==['A'])&(df['labels']==['D']))]
However, this doesn't work. Since my columns hold their values as lists, how do I filter them, whether a list has only one item or multiple items?
I expect my output to look like this:
text tokens labels
Ila say Hi [Ila, say, Hi] [D, B, A]
CodePudding user response:
You could try the following:
df.loc[
df['tokens'].apply(lambda x: 'Hi' in x) &
df['tokens'].apply(lambda x: 'Ila' in x) &
df['labels'].apply(lambda x: 'A' in x) &
df['labels'].apply(lambda x: 'D' in x)
]
Output
text tokens labels
2 Ila say Hi [Ila, say, Hi] [D, B, A]
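If the list of required values grows, the repeated apply calls above can be folded into a set-based check. The following is only a sketch of that idea; the helper name contains_all is hypothetical and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ["Hi how", "I am fine", "Ila say Hi"],
    'tokens': [['Hi', 'how'], ['I', 'am', 'fine'], ['Ila', 'say', 'Hi']],
    'labels': [['A', 'B'], ['C', 'B', 'A'], ['D', 'B', 'A']],
})

def contains_all(column, required):
    """Hypothetical helper: boolean mask that is True where the row's
    list holds every value in `required`."""
    required = set(required)
    # set.issubset accepts any iterable, so each row's list can be passed directly
    return column.apply(lambda items: required.issubset(items))

# Keep rows whose tokens include both 'Hi' and 'Ila'
# and whose labels include both 'A' and 'D'
mask = contains_all(df['tokens'], ['Hi', 'Ila']) & contains_all(df['labels'], ['A', 'D'])
result = df[mask]
print(result)
```

This keeps the exact-membership semantics of the `in` checks, so it behaves the same whether a row's list has one item or many.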
You could also cast to string and use:
df.loc[
df['tokens'].astype(str).str.contains('Hi') &
df['tokens'].astype(str).str.contains('Ila') &
df['labels'].astype(str).str.contains('A') &
df['labels'].astype(str).str.contains('D')
]
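One caveat with the string-cast approach: str.contains matches substrings of the stringified list, not whole list elements, so a short value such as 'A' can match longer labels. A small sketch of the difference, using a made-up frame named demo (not the data from the question):

```python
import pandas as pd

# Made-up data for illustration: the first row has 'AB' but no 'A' element
demo = pd.DataFrame({'labels': [['AB', 'D'], ['A', 'D']]})

# Substring match on the string form of each list, e.g. "['AB', 'D']"
by_string = demo['labels'].astype(str).str.contains('A')

# Exact element membership in each list
by_membership = demo['labels'].apply(lambda x: 'A' in x)

print(by_string.tolist())
print(by_membership.tolist())
```

If exact matches matter, the membership check is the safer of the two.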
CodePudding user response:
Could you simply use a filter(lambda X: , df.columns) to get the columns you want, then just reindex the df?