I am trying to search a column in a pandas dataframe (Python 3.8.8) to find the rows that contain different strings. Here is an example of the df column I'm searching.
print(df['fileName'])
0 data/0001_X+0Y-1-0.txt
1 data/0001_X+0Y-1-0.txt
2 data/0001_X+0Y-1-0.txt
3 data/0001_X+0Y-1-0.txt
4 data/0001_X+0Y-1-0.txt
...
171721 data/2293_X-1Y-1-0.txt
171722 data/2293_X-1Y-1-0.txt
171723 data/2293_X-1Y-1-0.txt
171724 data/2293_X-1Y-1-0.txt
171725 data/2293_X-1Y-1-0.txt
Does anyone know why I am only able to return results for 1 out of the 9 different strings I want to search for? I am certain there are no typos in my search strings; I've copy/pasted them into both my script and an interactive Python shell to be sure.
Returns df with correct number of rows:
contain_values = df[df['fileName'].str.contains("X-1Y-1-0")]
Returns empty df:
contain_values2 = df[df['fileName'].str.contains("X+0Y-1-0")]
Answer:
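You have to disable regex matching in str.contains, because + is a regex quantifier meaning "one or more of the preceding character". The pattern "X+0Y-1-0" therefore looks for one or more X followed by "0Y-1-0" and never matches a literal "+". ("-" has no special meaning outside a character class, which is why "X-1Y-1-0" works as-is.)
>>> df[df['fileName'].str.contains("X+0Y-1-0", regex=False)]
fileName
0 data/0001_X+0Y-1-0.txt
1 data/0001_X+0Y-1-0.txt
2 data/0001_X+0Y-1-0.txt
3 data/0001_X+0Y-1-0.txt
4 data/0001_X+0Y-1-0.txt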
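To see the difference outside pandas, here is a minimal sketch using Python's re module directly. The plain "in" check stands in for what str.contains does with regex=False (a literal substring test); the "XX0Y" filename is made up purely for illustration.

```python
import re

filename = "data/0001_X+0Y-1-0.txt"
term = "X+0Y-1-0"

# Interpreted as a regex, "X+" means "one or more X", so the pattern
# wants something like "X0Y-1-0" and the literal "+" is never matched.
regex_hit = re.search(term, filename)
print(regex_hit)  # None

# A plain substring test treats "+" as an ordinary character,
# which is essentially what str.contains(..., regex=False) does.
literal_hit = term in filename
print(literal_hit)  # True

# The same regex does match a (made-up) name that has no "+" at all.
made_up_hit = re.search(term, "data/0001_XX0Y-1-0.txt") is not None
print(made_up_hit)  # True

# "-" is literal outside [...], so this term works with or without regex.
plain_term_hit = re.search("X-1Y-1-0", "data/2293_X-1Y-1-0.txt") is not None
print(plain_term_hit)  # True
```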
Or, as suggested by @YusufErtas, escape the + sign with a backslash (written "\\" inside a normal Python string, or equivalently as the raw string r"X\+0Y-1-0"):
>>> df[df['fileName'].str.contains("X\\+0Y-1-0")]
fileName
0 data/0001_X+0Y-1-0.txt
1 data/0001_X+0Y-1-0.txt
2 data/0001_X+0Y-1-0.txt
3 data/0001_X+0Y-1-0.txt
4 data/0001_X+0Y-1-0.txt
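Since the question involves nine search strings, one option is to escape every term with re.escape and join them into a single regex alternation. This is a sketch only: build_pattern is a hypothetical helper name, and the terms and filenames below are made up, since the question shows just two of the nine strings.

```python
import re

def build_pattern(terms):
    """Join literal search terms into one regex alternation.

    re.escape() backslash-escapes regex metacharacters such as "+",
    so each term is matched literally inside the combined pattern.
    """
    return "|".join(re.escape(t) for t in terms)

# Hypothetical search terms; the question does not list all nine.
terms = ["X-1Y-1-0", "X+0Y-1-0", "X+1Y+1-0"]
pattern = build_pattern(terms)

# Hypothetical filenames in the same style as the question's column.
filenames = [
    "data/0001_X+0Y-1-0.txt",
    "data/2293_X-1Y-1-0.txt",
    "data/0500_X+1Y+1-0.txt",
    "data/0600_X0Y0-0.txt",
]

# Keep the names that contain any of the terms.
matches = [f for f in filenames if re.search(pattern, f)]
print(matches)
```

The same pattern can be handed to pandas with the default regex=True, e.g. df[df['fileName'].str.contains(pattern)]. If the terms are needed separately, looping over them with str.contains(term, regex=False) avoids regex entirely.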