Pandas str.contains produces unexpected results

Time:12-29

I am trying to search a column in a pandas DataFrame (Python 3.8.8) to find the rows that contain different strings. Here is an example of the df column I'm searching.

print(df['fileName'])
0         data/0001_X+0Y-1-0.txt
1         data/0001_X+0Y-1-0.txt
2         data/0001_X+0Y-1-0.txt
3         data/0001_X+0Y-1-0.txt
4         data/0001_X+0Y-1-0.txt
                            ...                   
171721    data/2293_X-1Y-1-0.txt
171722    data/2293_X-1Y-1-0.txt
171723    data/2293_X-1Y-1-0.txt
171724    data/2293_X-1Y-1-0.txt
171725    data/2293_X-1Y-1-0.txt

Does anyone know why I am only able to return results for 1 out of 9 different strings I want to search for? I am certain that there aren't typos in my search strings. I've copy/pasted into my script and interactive python shell to be sure.

Returns df with correct number of rows: contain_values = df[df['fileName'].str.contains("X-1Y-1-0")]

Returns empty df: contain_values2 = df[df['fileName'].str.contains("X+0Y-1-0")]

CodePudding user response:

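You have to disable regex in str.contains, because it treats the pattern as a regular expression by default, and in a regex + is a quantifier meaning "one or more of the preceding character" rather than a literal plus sign: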

>>> df[df['fileName'].str.contains("X+0Y-1-0", regex=False)]

                 fileName
0  data/0001_X+0Y-1-0.txt
1  data/0001_X+0Y-1-0.txt
2  data/0001_X+0Y-1-0.txt
3  data/0001_X+0Y-1-0.txt
4  data/0001_X+0Y-1-0.txt

Or, as suggested by @YusufErtas, escape the + sign with a backslash so the regex matches it literally:

>>> df[df['fileName'].str.contains("X\\+0Y-1-0")]

                 fileName
0  data/0001_X+0Y-1-0.txt
1  data/0001_X+0Y-1-0.txt
2  data/0001_X+0Y-1-0.txt
3  data/0001_X+0Y-1-0.txt
4  data/0001_X+0Y-1-0.txt
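
For anyone who wants to reproduce this, here is a minimal sketch comparing the three behaviours side by side. It assumes the file names contain a literal + (as in X+0Y-1-0); the three-row DataFrame and the variable names (needle, as_regex, as_literal, as_escaped) are made up for illustration and are not the asker's real data. It also uses re.escape as a possible alternative to escaping by hand.

```python
import re

import pandas as pd

# Hypothetical miniature of the fileName column (assumes a literal "+" in the names).
df = pd.DataFrame(
    {
        "fileName": [
            "data/0001_X+0Y-1-0.txt",
            "data/0001_X+0Y-1-0.txt",
            "data/2293_X-1Y-1-0.txt",
        ]
    }
)

needle = "X+0Y-1-0"

# Default (regex=True): "X+" is read as "one or more X", so the literal "+"
# in the file names is never matched and the result is empty.
as_regex = df[df["fileName"].str.contains(needle)]

# regex=False: the pattern is treated as a plain substring.
as_literal = df[df["fileName"].str.contains(needle, regex=False)]

# re.escape: keep regex matching, but escape every metacharacter in the pattern.
as_escaped = df[df["fileName"].str.contains(re.escape(needle))]

print(len(as_regex), len(as_literal), len(as_escaped))  # 0 2 2
```

If the search strings are always meant literally, regex=False is probably the simpler choice; re.escape may be handy when a literal fragment has to be combined with a real regex.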