Filter Dataframe that contain specific characters by user (Python)


I'm trying to find the names that contain letters entered by the user. In this case, I want the rows whose Name column contains both 'a' and 'i', but I get an error:

import pandas as pd
data = {'Name': ['Aerial', 'Tom', 'Amie', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['pennsylvania', 'newyork', 'newjersey', 'delaware'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)
df["Name"] = df["Name"].str.lower()
letters_in = input('Words in Name Column that contain these letters: \n ').split()
new_output = df.loc[df['Name'].str.contains(letters_in, case=False)]

Code run:

Words in Name Column that contain these letters: 

>? a e
TypeError: unhashable type: 'list'

Ideal Output (as dataframe):


CodePudding user response:

First, to address the error message: Series.str.contains() expects a string as its first argument, not a list, and input(...).split() returns a list.

That string is a character sequence or regular expression that contains() tries to match in each value. I believe this is different from what you are attempting, which is to find the rows whose Name contains all of the input letters.
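To illustrate the difference, here is a minimal sketch (the `names` Series and variable names are made up for the illustration). Joining the letters with `|` produces a valid pattern for contains(), but it matches names containing any of the letters rather than all of them:

```python
import re
import pandas as pd

names = pd.Series(['aerial', 'tom', 'amie', 'anuj'])
letters_in = ['a', 'i']

# 'a|i' is a regular expression meaning "contains a OR i".
# re.escape guards against input that has special regex characters.
any_pattern = '|'.join(re.escape(letter) for letter in letters_in)

# 'anuj' is kept as well, because it contains 'a' even though it has no 'i'.
any_match = names[names.str.contains(any_pattern, case=False)]
print(any_match.tolist())
```

So a single alternation pattern is not enough when every letter must be present.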

To require every letter, you can check them one by one, for example:

import pandas as pd
data = {'Name': ['Aerial', 'Tom', 'Amie', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['pennsylvania', 'newyork', 'newjersey', 'delaware'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)
df["Name"] = df["Name"].str.lower()
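# letters_in = input('Words in Name Column that contain these letters: \n ').lower().split()
letters_in = ['a', 'i']  # hard-coded here in place of the user input
# Keep the rows whose Name contains every requested letter
new_output = df[df['Name'].apply(lambda name: all(letter in name for letter in letters_in))]
print(new_output)

Output:
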
     Name  Age       Address Qualification
0  aerial   27  pennsylvania           Msc
2    amie   22     newjersey           MCA
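As a possible alternative to the row-wise apply, the same filter can be sketched with one str.contains() call per letter, combining the boolean results with `&`. This is only one way to do it; the names `mask` and `result` are chosen for the illustration, and `regex=False` is used so each letter is treated as a literal substring:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Aerial', 'Tom', 'Amie', 'Anuj'],
                   'Age': [27, 24, 22, 32]})
letters_in = ['a', 'i']

# Start with every row selected, then narrow the mask one letter at a time.
mask = pd.Series(True, index=df.index)
for letter in letters_in:
    # case=False makes the check case-insensitive, so Name need not be lowercased.
    mask &= df['Name'].str.contains(letter, case=False, regex=False)

result = df[mask]
print(result)
```

With the sample data this keeps the rows for Aerial and Amie, the same rows as the apply-based version, and it leaves the original capitalization of Name untouched.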