Python: how do I match values in the dataframe from an input containing single quotes (apostrophe)?

Time:04-17

I am writing a Python function that takes an input and displays all the matching values from a dataframe, but I get no results when the input contains a single quote (apostrophe): '
The dataframe contains values such as Mo'Nique, Thaddeus O'Sullivan and Nancy O'Dell, which I can't match by typing the corresponding name.

I tried escaping the single quote with .replace("'", "\'"), but that didn't work.

Thanks for your help.

NOTE: I search the values twice. First I look for a match; if none is found, I normalize the values and search again before printing "name not found".

import pandas as pd

def get_name():
    request_name = input("Type a name: ")
    request_name = request_name.lower().title().strip()
    search = False
    for value in df['NameColumn']:
        if request_name in value:
            search = True

    if not search:
        df['NameColumn'] = (
            df['NameColumn'].str.normalize('NFKD').str.encode(
                'ascii', errors='ignore').str.decode('utf-8'))
        for value in df['NameColumn']:
            if request_name in value:
                search = True

    if search:
        name_data = df.loc[(df['NameColumn'].str.contains(request_name))]
        print(name_data)

    else:
        print("name not found")

CodePudding user response:

The single quotes in your df are possibly non-ASCII characters. If that is the case, you can use the Unidecode package to convert Unicode characters to their closest ASCII equivalents. You can try this:

from unidecode import unidecode

request_name = 'O\'Dell'

for value in df['NameColumn']:
    value = unidecode(value)
    if request_name in value:
        print(True)
    else:
        print(False)

Then you get :

False
True
False
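
If installing a third-party package is not an option, the same idea can be sketched with the standard library alone. This is only an illustration: the helper name normalize_apostrophes and the list of characters it maps are assumptions, not something taken from the question.

```python
# Map common typographic apostrophe look-alikes to the ASCII apostrophe.
# The set of characters below is an assumption; extend it as needed.
APOSTROPHE_MAP = str.maketrans({
    "\u2019": "'",  # right single quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u02bc": "'",  # modifier letter apostrophe
})


def normalize_apostrophes(text):
    """Return text with typographic apostrophes replaced by ASCII "'"."""
    return text.translate(APOSTROPHE_MAP)


# Hypothetical sample values; the first two use U+2019 instead of "'".
names = ["Mo\u2019Nique", "Nancy O\u2019Dell", "Thaddeus O'Sullivan"]
cleaned = [normalize_apostrophes(name) for name in names]

print("O'Dell" in cleaned[1])
```

On a dataframe, the helper could be applied with df['NameColumn'].map(normalize_apostrophes). Note that the NFKD plus encode('ascii', errors='ignore') step in the question would drop a character such as U+2019 rather than convert it, so "O’Dell" would become "ODell".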

CodePudding user response:

Actually, I've found the problem: it was the .title() method. .title() treats the apostrophe as a word boundary, so it capitalizes the letter that follows it. So:

request_name = input('Type a name: ') #america's
request_name = request_name.lower().title().strip()
print(request_name) #America'S

America'S of course doesn't match America's which is in my dataframe.

SOLUTION 1: I apply .lower() to both a copy of my dataframe column and my input, so that the two can be compared exactly.
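
A minimal sketch of Solution 1, assuming the column is called NameColumn as in the question. The sample data and the function name find_names are made up for illustration. Passing regex=False keeps characters typed by the user from being interpreted as a regular expression.

```python
import pandas as pd

# Hypothetical sample data using the names mentioned in the question.
df = pd.DataFrame(
    {"NameColumn": ["Mo'Nique", "Thaddeus O'Sullivan", "Nancy O'Dell"]}
)


def find_names(frame, request_name):
    """Return rows whose NameColumn contains request_name, ignoring case."""
    needle = request_name.strip().lower()
    # Lowercase a temporary copy of the column; the frame itself is untouched.
    lowered = frame["NameColumn"].str.lower()
    mask = lowered.str.contains(needle, regex=False)
    return frame[mask]


matches = find_names(df, "  o'dell ")
print(matches)
```

Because only a lowered copy is compared, the rows that come back keep their original capitalization.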

SOLUTION 2: I found this page, which suggests a regex to work around the problem: https://www.pythontutorial.net/python-string-methods/python-titlecase/#:~:text=The title() method converts,the remaining characters in lowercase.
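
The linked page is not reproduced here, so the following is only one possible form of such a regex, along the lines of the workaround shown in the Python documentation for str.title(). It treats a run of letters, optionally followed by an apostrophe and more letters, as a single word.

```python
import re


def titlecase(text):
    """Capitalize each word, keeping an apostrophe inside a word intact."""
    return re.sub(
        r"[A-Za-z]+('[A-Za-z]+)?",
        lambda match: match.group(0).capitalize(),
        text,
    )


print("america's".title())        # America'S
print(titlecase("america's"))     # America's
print(titlecase("nancy o'dell"))  # Nancy O'dell
```

As the last line shows, this fixes America's but yields O'dell rather than O'Dell, so for names like the ones in the question the case-insensitive comparison from Solution 1 is probably the more robust choice.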
