Home > Blockchain >  pandas dataframe: how to select rows where one column-value is like 'values in a list'
pandas dataframe: how to select rows where one column-value is like 'values in a list'

Time:11-16

I have a requirement where I need to select rows from a dataframe where one column-value is like values in a list. The requirement is for a large dataframe with millions of rows and need to search for rows where column-value is like values of a list of thousands of values.

Below is a sample data.

NAME,AGE
Amar,80
Rameshwar,60
Farzand,90
Naren,60
Sheikh,45
Ramesh,55
Narendra,85
Rakesh,86
Ram,85
Kajol,80
Naresh,86
Badri,85
Ramendra,80

My code is like below. But problem is that I'm using a for loop, hence with increased number of values in the list-of-values (variable names_like in my code) I need to search, the number of loop and concat operation increases and it makes the code runs very slow. I can't use the isin() option as isin is for exact match and for me it is not an exact match, it a like condition for me. Looking for a better more performance efficient way of getting the required result.

My Code:-

import pandas as pd

infile = "input.csv"

df = pd.read_csv(infile)
print(f"df=\n{df}")
names_like = ['Ram', 'Nar']
df_res = pd.DataFrame(columns=df.columns)
for name in names_like:
    df1 = df[df['NAME'].str.contains(name, na=False)]
    df_res = pd.concat([df_res,df1], axis=0)

print(f"df_res=\n{df_res}")

My Output:-

df_res=
         NAME AGE
1   Rameshwar  60
5      Ramesh  55
8         Ram  85
12   Ramendra  80
3       Naren  60
6    Narendra  85
10     Naresh  86

Looking for a better more performance efficient way of getting the required result.

CodePudding user response:

You can pass all names in joined list by | for regex or, loop is not necessary:

df_res = df[df['NAME'].str.contains('|'.join(names_like), na=False)]

CodePudding user response:

Use this hope you will find a great way. df_res = df[df['NAME'].str.contains('|'.join(names_like), na=False)]

  • Related