I need to select rows from a dataframe where one column's value matches (as a substring, not an exact match) any value in a list. The dataframe has millions of rows, and the list of values to search for contains thousands of entries.
Below is a sample data.
NAME,AGE
Amar,80
Rameshwar,60
Farzand,90
Naren,60
Sheikh,45
Ramesh,55
Narendra,85
Rakesh,86
Ram,85
Kajol,80
Naresh,86
Badri,85
Ramendra,80
My code is below. The problem is that I'm using a for loop, so as the number of values in the list (variable names_like in my code) grows, the number of loop iterations and concat operations grows with it, and the code runs very slowly. I can't use isin(), because isin() is for exact matches and what I need is a "like" (substring) condition. Looking for a better, more performance-efficient way of getting the required result.
My Code:-
import pandas as pd
infile = "input.csv"
df = pd.read_csv(infile)
print(f"df=\n{df}")
names_like = ['Ram', 'Nar']
df_res = pd.DataFrame(columns=df.columns)
for name in names_like:
    df1 = df[df['NAME'].str.contains(name, na=False)]
    df_res = pd.concat([df_res, df1], axis=0)
print(f"df_res=\n{df_res}")
My Output:-
df_res=
NAME AGE
1 Rameshwar 60
5 Ramesh 55
8 Ram 85
12 Ramendra 80
3 Naren 60
6 Narendra 85
10 Naresh 86
CodePudding user response:
You can join all the names with | (the regex "or" operator) into a single pattern and pass that to str.contains, so the loop is not necessary:
df_res = df[df['NAME'].str.contains('|'.join(names_like), na=False)]
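One caveat worth noting: str.contains treats the joined string as a regular expression, so if any of the search values can contain regex metacharacters (such as +, ., or ?), they should be escaped first with re.escape. A minimal sketch (the sample names here, including the hypothetical "Ram+Shyam", are for illustration only):

```python
import re
import pandas as pd

df = pd.DataFrame({
    "NAME": ["Amar", "Rameshwar", "Naren", "Ram+Shyam"],
    "AGE": [80, 60, 60, 85],
})

names_like = ["Ram", "Nar", "Ram+"]  # "Ram+" contains a regex metacharacter

# Escape each value so "+" is matched literally, then join with "|" (regex OR)
pattern = "|".join(re.escape(n) for n in names_like)
df_res = df[df["NAME"].str.contains(pattern, na=False)]
print(df_res["NAME"].tolist())  # → ['Rameshwar', 'Naren', 'Ram+Shyam']
```

Building one combined pattern also means pandas scans the column only once, instead of once per search value as in the original loop.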