I've run into something odd: I need to drop the rows of my df that contain the '|' symbol, but when I filter with .loc and str.contains('|') the df stays the same, even though filtering by another character, e.g. 'a', works.
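import pandas as pd

df = pd.DataFrame({'A':['aaa', 'bbb | aaa', 'ccc'], 'B':['abababa', 'a | b', 'abab | abab']})
colA = 'A'
display(df.loc[df[colA].str.contains('|')])  # unexpected: every row comes back
display(df.loc[df['A'].str.contains('|')])   # same call with the column name inline, same result
display(df.loc[df['A'].str.contains('a')])   # works: only rows 0 and 1 come back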
Does anyone know how I can work around this?
CodePudding user response:
Escape the pipe so that the regex matches it as a literal character:
display(df.loc[df['A'].str.contains(r'\|', regex=True)])
display(df.loc[df['A'].str.contains(r'a', regex=True)])
           A      B
1  bbb | aaa  a | b

           A        B
0        aaa  abababa
1  bbb | aaa    a | b
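If the text to search for comes from a variable and might contain other regex metacharacters, escaping it by hand gets error-prone. A minimal sketch, assuming the same df as in the question, builds the pattern with the standard library's re.escape; the names needle, mask and matches are only illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({'A': ['aaa', 'bbb | aaa', 'ccc'],
                   'B': ['abababa', 'a | b', 'abab | abab']})

needle = '|'  # illustrative: the literal text we want to find
# re.escape backslash-escapes regex metacharacters, so '|' becomes r'\|'
mask = df['A'].str.contains(re.escape(needle), regex=True)
matches = df.loc[mask]
print(matches)
```

This keeps regex matching switched on, which is handy if the escaped literal is later combined with a larger pattern.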
CodePudding user response:
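According to the docs, the regex argument of str.contains defaults to True, so the pipe character is interpreted as the regex alternation operator, not a literal pipe. The pattern '|' then reads as "empty string or empty string", which matches every value, so no rows are filtered out.
Simply set regex=False to fix this: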
>>> print(df.loc[df['A'].str.contains('|', regex=False)])
           A      B
1  bbb | aaa  a | b
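Since the original goal was to drop the rows containing '|' rather than select them, the boolean mask can be inverted with ~. A minimal sketch, assuming the df from the question; the names has_pipe and cleaned are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['aaa', 'bbb | aaa', 'ccc'],
                   'B': ['abababa', 'a | b', 'abab | abab']})

# regex=False makes '|' a plain substring instead of a regex operator
has_pipe = df['A'].str.contains('|', regex=False)

# ~ inverts the mask, keeping only the rows that do NOT contain '|'
cleaned = df.loc[~has_pipe]
print(cleaned)
```

If the column can hold missing values, passing na=False to str.contains treats them as non-matches, so the mask stays purely boolean before it is inverted.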