Filter a DataFrame for names that contain specific characters entered by the user (Python)

Time:03-24

I'm trying to find the names that contain the letters entered by the user. In this case, I want the names in the Name column that contain both 'a' and 'i', but I'm getting an error:

import pandas as pd

data = {'Name': ['Aerial', 'Tom', 'Amie', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['pennsylvania', 'newyork', 'newjersey', 'delaware'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)
df["Name"] = df["Name"].str.lower()
print(df)
letters_in = input('Words in Name Column that contain these letters: \n ').split()
new_output = df.loc[df['Name'].str.contains(letters_in, case=False)]

Code run:

Words in Name Column that contain these letters: 

>? a e
ERROR: 
TypeError: unhashable type: 'list'

Ideal Output (as dataframe):

Aerial
Amie

CodePudding user response:

First, to address your error message, the contains() method expects a string as its first argument, not a list.

That string is a character sequence or regular expression that the method tries to match against each value. I believe this differs from what you are attempting, which is to find the rows whose Name contains all of the input letters.
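To illustrate the difference, here is a minimal sketch of what contains() does with a single pattern. The Series `names` and the variables `has_i` and `has_any` are made up for this illustration. Joining the letters with `|` gives a pattern that matches names containing any of the letters, which is not the same as containing all of them.

```python
import re
import pandas as pd

# Hypothetical stand-in for the lowercased Name column.
names = pd.Series(['aerial', 'tom', 'amie', 'anuj'])

# A single string is treated as a regular expression by default.
has_i = names.str.contains('i')

# Alternation matches names containing ANY of the letters, not all of them.
letters_in = ['a', 'i']
pattern = '|'.join(re.escape(letter) for letter in letters_in)  # 'a|i'
has_any = names.str.contains(pattern)

print(has_i.tolist())    # [True, False, True, False]
print(has_any.tolist())  # [True, False, True, True]
```

Note that 'anuj' matches the alternation even though it has no 'i', so neither call on its own expresses "contains every input letter", which is the goal here.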

To do this, you can use the following approach, for example:

import pandas as pd

data = {'Name': ['Aerial', 'Tom', 'Amie', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['pennsylvania', 'newyork', 'newjersey', 'delaware'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)
# Lowercase the names so the membership test below ignores case.
df["Name"] = df["Name"].str.lower()
#letters_in = input('Words in Name Column that contain these letters: \n ').split()
letters_in = ['a', 'i']  # hard-coded stand-in for the user input above
# Keep only the rows whose Name contains every letter in letters_in.
new_output = df[df["Name"].apply(lambda name: all(letter in name for letter in letters_in))]
print(new_output)

Output:

     Name  Age       Address Qualification
0  aerial   27  pennsylvania           Msc
2    amie   22     newjersey           MCA
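As an alternative, here is a hedged sketch that builds one boolean mask per letter with str.contains() and combines the masks with a logical AND. It assumes the same sample data; the names `masks` and `contains_all` are introduced only for this example. Passing `regex=False` treats each letter literally, and `case=False` makes the match case-insensitive, so the Name column does not need to be lowercased first.

```python
from functools import reduce
import operator

import pandas as pd

data = {'Name': ['Aerial', 'Tom', 'Amie', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['pennsylvania', 'newyork', 'newjersey', 'delaware'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)

letters_in = ['a', 'i']  # stand-in for input(...).split()

# One boolean mask per letter: True where Name contains that letter.
masks = [df['Name'].str.contains(letter, case=False, regex=False)
         for letter in letters_in]

# AND the masks together; the all-True start value also covers an empty letter list.
contains_all = reduce(operator.and_, masks, pd.Series(True, index=df.index))

new_output = df[contains_all]
print(new_output['Name'].tolist())  # ['Aerial', 'Amie']
```

Because the original casing is preserved, this version returns Aerial and Amie as written in the question's ideal output.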