df.col.str.contains('somestr') doesn't work

Time:07-01

I ran into something strange: I need to drop rows containing the '|' symbol from my df, but when I filter with .loc the df stays the same, even though filtering by another character, e.g. 'a', works.

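    import pandas as pd

    df = pd.DataFrame({'A':['aaa', 'bbb | aaa', 'ccc'], 'B':['abababa', 'a | b', 'abab | abab']})
    colA = 'A'
    # display() is available in Jupyter/IPython
    display(df.loc[df[colA].str.contains('|')])  # unexpected: returns every row
    display(df.loc[df['A'].str.contains('|')])   # same result: every row
    display(df.loc[df['A'].str.contains('a')])   # works: rows 0 and 1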


Does anyone know how I can work around this?

CodePudding user response:

Escape the pipe so that the regex treats it as a literal character:

display(df.loc[df['A'].str.contains(r'\|', regex=True)])
display(df.loc[df['A'].str.contains(r'a', regex=True)])

            A       B
1   bbb | aaa   a | b

            A       B
0   aaa       abababa
1   bbb | aaa   a | b
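
If the search text comes from a variable, escaping it by hand gets fragile. Here is a minimal sketch, assuming the same `df` as in the question, that lets the standard library's `re.escape` build the pattern (the names `needle`, `pattern`, `mask` and `matches` are only for illustration):

```python
import re
import pandas as pd

df = pd.DataFrame({'A': ['aaa', 'bbb | aaa', 'ccc'],
                   'B': ['abababa', 'a | b', 'abab | abab']})

needle = '|'                  # hypothetical name: the literal text to look for
pattern = re.escape(needle)   # escapes regex metacharacters, here giving r'\|'
mask = df['A'].str.contains(pattern, regex=True)
matches = df.loc[mask]        # rows whose 'A' contains a literal pipe
print(matches)
```

This should select only row 1, the same as the hand-escaped version.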

CodePudding user response:

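Looking at the docs, the regex argument defaults to True, so the pipe character is interpreted as the regex alternation operator, not a literal pipe. A bare '|' is an alternation between two empty patterns, and an empty pattern matches every string, which is why the df came back unchanged.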
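
This can be checked without pandas, using the standard `re` module directly. A small sketch (the sample strings are the values of column A; the variable names are only for illustration):

```python
import re

samples = ['aaa', 'bbb | aaa', 'ccc']

# '|' as a regex means "empty pattern OR empty pattern",
# and the empty pattern matches at the start of any string.
unescaped = [bool(re.search('|', s)) for s in samples]

# r'\|' matches only a literal pipe character.
escaped = [bool(re.search(r'\|', s)) for s in samples]

print(unescaped)  # [True, True, True]
print(escaped)    # [False, True, False]
```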

Simply set regex=False to fix this:

>>> print(df.loc[df['A'].str.contains('|', regex=False)])
           A      B
1  bbb | aaa  a | b
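
Since the original goal was to drop those rows rather than select them, the mask can be inverted with `~`. This is a sketch that assumes the `df` from the question and that only column A needs checking (`has_pipe` and `cleaned` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'A': ['aaa', 'bbb | aaa', 'ccc'],
                   'B': ['abababa', 'a | b', 'abab | abab']})

# Literal substring test: regex=False turns off pattern interpretation.
has_pipe = df['A'].str.contains('|', regex=False)

# ~ inverts the boolean mask, keeping rows without a pipe in 'A'.
cleaned = df.loc[~has_pipe]
print(cleaned)
```

If the column may contain missing values, passing `na=False` to `str.contains` keeps the mask purely boolean so that `~` still works.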