I am writing a Python function that takes user input and displays all the matching values from a dataframe, but I get no results when the input contains a single quote (apostrophe): '
The dataframe contains values such as Mo'Nique, Thaddeus O'Sullivan and Nancy O'Dell, which I can't match by typing the corresponding name.
I have tried to escape the single quote with .replace("'", "\'")
but that didn't work.
Thanks for your help.
NOTE: I search the values twice. First I look for a match; if none is found, I normalize the values and search again before printing "name not found".
import pandas as pd

# df is assumed to be an already-loaded DataFrame with a 'NameColumn' column.
def get_name():
    request_name = input("Type a name: ")
    request_name = request_name.lower().title().strip()
    search = False
    for value in df['NameColumn']:
        if request_name in value:
            search = True
    if not search:
        # No match: drop accents and other non-ASCII characters, then search again.
        df['NameColumn'] = (
            df['NameColumn'].str.normalize('NFKD').str.encode(
                'ascii', errors='ignore').str.decode('utf-8'))
        for value in df['NameColumn']:
            if request_name in value:
                search = True
    if search:
        name_data = df.loc[(df['NameColumn'].str.contains(request_name))]
        print(name_data)
    else:
        print("name not found")
CodePudding user response:
The single quotes in your dataframe are possibly non-ASCII characters. If that is the case, you can use the Unidecode
package to convert Unicode characters to their closest ASCII equivalent. You can try this:
from unidecode import unidecode

request_name = "O'Dell"
for value in df['NameColumn']:
    value = unidecode(value)
    if request_name in value:
        print(True)
    else:
        print(False)
Then you get:
False
True
False
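If installing an extra package is not an option, a rough standard-library alternative is to map the typographic apostrophes onto the ASCII one before comparing. This is only a sketch: it assumes the culprit is U+2019 or a similar look-alike, and the helper name `normalize_apostrophes` and the sample names are made up for illustration.

```python
# Hypothetical helper: map common apostrophe look-alikes to the ASCII "'".
APOSTROPHES = str.maketrans({
    "\u2019": "'",  # right single quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u02bc": "'",  # modifier letter apostrophe
})

def normalize_apostrophes(text):
    """Return text with typographic apostrophes replaced by "'"."""
    return text.translate(APOSTROPHES)

# Sample data (assumption): the first name uses U+2019 instead of "'".
names = ["Nancy O\u2019Dell", "Mo'Nique"]
for name in names:
    # ascii() escapes non-ASCII characters, which makes the culprit visible.
    print(ascii(name), "O'Dell" in normalize_apostrophes(name))
```

Printing a value with `ascii()` is also a quick way to check whether the dataframe really contains a non-ASCII apostrophe in the first place.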
CodePudding user response:
Actually, I've found the problem, which was the .title() method. It treats the apostrophe as a word boundary, so the letter after it is capitalized as if it started a new word. So:
request_name = input('Type a name: ')  # america's
request_name = request_name.lower().title().strip()
print(request_name)  # America'S
America'S
of course doesn't match America's,
which is what my dataframe contains.
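A quick check without the dataframe shows the behaviour. The sample strings are made up for illustration; note that .title() happens to give the expected casing for names like O'Dell, and only breaks on possessives such as America's.

```python
# Sample inputs (assumption), all typed in lowercase.
samples = ["america's", "mo'nique", "nancy o'dell"]

# str.title() uppercases every letter that follows a non-letter,
# and an apostrophe counts as a non-letter.
results = [text.title() for text in samples]
for text, titled in zip(samples, results):
    print(f"{text!r} -> {titled!r}")
# "america's"    -> "America'S"
# "mo'nique"     -> "Mo'Nique"
# "nancy o'dell" -> "Nancy O'Dell"
```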
SOLUTION 1: I apply the .lower() method to both my input and a copy of my dataframe column. That way the two can be compared exactly, regardless of case.
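A minimal sketch of Solution 1, assuming a column called NameColumn as in the question. The sample rows and the function name `find_names` are placeholders, not the real data or code.

```python
import pandas as pd

# Sample data standing in for the real dataframe (assumption).
df = pd.DataFrame(
    {"NameColumn": ["America's Sweethearts", "Mo'Nique", "Nancy O'Dell"]}
)

def find_names(frame, query):
    """Return the rows whose NameColumn contains query, ignoring case."""
    needle = query.strip().lower()
    # Lowercase a temporary copy of the column; the dataframe itself is untouched.
    haystack = frame["NameColumn"].str.lower()
    # regex=False matches the input literally instead of as a pattern.
    return frame[haystack.str.contains(needle, regex=False)]

print(find_names(df, "america's"))
print(find_names(df, "O'DELL"))
```

Because only a lowercased copy is compared, the rows that are printed keep their original capitalization.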
SOLUTION 2: I found this page, which suggests a regex-based title-case function to work around the problem. https://www.pythontutorial.net/python-string-methods/python-titlecase/#:~:text=The title() method converts,the remaining characters in lowercase.
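For reference, a sketch of the regex approach, using the pattern given in the Python documentation for str.title(); I am assuming the linked page suggests something along the same lines. It fixes possessives, but it lowercases the letter after the apostrophe, so names like O'Dell still would not match and Solution 1 is probably the safer choice for this data.

```python
import re

def titlecase(text):
    """Capitalize each word, treating an apostrophe as part of the word."""
    return re.sub(
        r"[A-Za-z]+('[A-Za-z]+)?",
        lambda match: match.group(0).capitalize(),
        text,
    )

print(titlecase("america's"))     # America's
print(titlecase("nancy o'dell"))  # Nancy O'dell (not O'Dell)
```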