Better way to extract rows with specific string in Python

Time:11-30

I have a dataset of more than 10,000 rows and 6 columns (one of the columns is "Name"). I want to extract all rows with specific names.

For example, to extract the rows for two names I used this code:

import pandas as pd

df = pd.read_csv('Sample.csv') 

df = df[df.Name.str.contains("name_1|name_3")]

df.to_csv("Name_list.csv")

The problem is that I have hundreds of names to extract data for, and with the code above I would have to type (or copy and paste) every name into the pattern, which is time-consuming.

Is there a better way to achieve my objective?

Thank you in advance!

CodePudding user response:

If you want to continue using a regex contains() approach, then you may form an alternation from some input Python list, e.g.

names = ['name_1', 'name_3']  # add more names here if desired
regex = r'(?:' + '|'.join(names) + r')'  # e.g. (?:name_1|name_3)
df = df[df.Name.str.contains(regex)]
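
One thing to watch with this approach: if any name contains regex metacharacters such as . or +, the joined pattern will not match them literally. Below is a minimal sketch, assuming the names should be treated as literal text. The helper build_name_pattern and the sample names are hypothetical and not part of the original answer.

```python
import re


def build_name_pattern(names):
    """Join names into one regex alternation, escaping regex metacharacters."""
    # re.escape keeps characters such as '.', '+' or '(' literal
    escaped = [re.escape(name) for name in names]
    return "(?:" + "|".join(escaped) + ")"


names = ["name_1", "name_3", "a.b"]  # hypothetical sample names
pattern = build_name_pattern(names)
```

The resulting pattern can then be passed to df.Name.str.contains(pattern, na=False), where na=False treats missing names as non-matches.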

CodePudding user response:

You can load the names from a CSV file (here name.csv, with a column called name) and build the pattern from that column:

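import pandas as pd

namelist = pd.read_csv('name.csv')  # one name per row, in a column called 'name'
df = pd.read_csv('Sample.csv')
# join all names into a single alternation pattern and keep the matching rows
df = df[df.Name.str.contains('|'.join(namelist['name'].tolist()))]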
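
Note that str.contains does substring matching, so name_1 also selects name_10. If the names are meant to match exactly, isin may be a simpler option than a regex. Below is a sketch under that assumption, with small in-memory frames standing in for Sample.csv and name.csv (the sample values are made up).

```python
import pandas as pd

# Hypothetical in-memory stand-ins for Sample.csv and name.csv
df = pd.DataFrame(
    {"Name": ["name_1", "name_10", "name_2", "name_3"], "Value": [1, 2, 3, 4]}
)
namelist = pd.DataFrame({"name": ["name_1", "name_3"]})

# isin keeps only rows whose Name exactly equals one of the listed names
selected = df[df["Name"].isin(namelist["name"])]
```

Here name_10 is left out, whereas a contains-based pattern built from the same list would have kept it.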