I have a dataset of more than 10,000 rows and 6 columns (one of the columns is "Name"). I want to extract all rows with specific names.
For example, to extract the rows for two names I used this code:
import pandas as pd
df = pd.read_csv('Sample.csv')
df = df[df.Name.str.contains("name_1|name_3")]
df.to_csv("Name_list.csv")
But the problem is that I have hundreds of names for which I want to extract the data, and with the above code I would have to type (or copy/paste) every name into the pattern, which is time-consuming.
Is there a better way to achieve my objective?
Thank you in advance!
CodePudding user response:
If you want to keep using the regex-based str.contains() approach, you can build the alternation from a Python list, e.g.
names = ['name_1', 'name_3'] # add more names here if desired
regex = r'(?:' + '|'.join(names) + r')'  # join the names with | inside a non-capturing group
df = df[df.Name.str.contains(regex)]
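One caveat with the alternation above: if any name contains a regex metacharacter (such as `.` or `+`), the pattern may match more than intended. Here is a minimal sketch that escapes each name with `re.escape` before joining. The small DataFrame and the names in it are made up for illustration and stand in for Sample.csv.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for Sample.csv
df = pd.DataFrame({
    "Name": ["name_1", "name_2", "name_3", "name.4", "nameX4"],
    "Value": [10, 20, 30, 40, 50],
})

# Illustrative names; the "." in "name.4" is a regex metacharacter
names = ["name_1", "name.4"]

# Escape each name so it is matched literally, then build the alternation
regex = "(?:" + "|".join(re.escape(n) for n in names) + ")"

# na=False treats missing names as non-matches instead of raising an error
filtered = df[df["Name"].str.contains(regex, na=False)]
print(filtered)
```

Without the escaping, the pattern `name.4` would also match `nameX4`, because an unescaped `.` matches any character.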
CodePudding user response:
You can load the names from a CSV file (here name.csv, with a column called name) and join them into the pattern:
namelist = pd.read_csv('name.csv')
df = pd.read_csv('Sample.csv')
df = df[df.Name.str.contains('|'.join(namelist['name'].tolist()))]
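Note that str.contains() does substring matching, so a name such as name_1 would also select rows for name_10. If exact matches are what you need, a different technique, `Series.isin`, may fit better. Below is a rough sketch of that, assuming the same layout as above. The in-memory CSV contents are invented for illustration and stand in for Sample.csv and name.csv.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for Sample.csv and name.csv
sample_csv = io.StringIO("Name,Value\nname_1,10\nname_10,20\nname_3,30\nname_4,40\n")
name_csv = io.StringIO("name\nname_1\nname_3\n")

df = pd.read_csv(sample_csv)
namelist = pd.read_csv(name_csv)

# isin keeps only rows whose Name equals one of the listed names exactly
exact = df[df["Name"].isin(namelist["name"])]

# For comparison: substring matching also picks up "name_10"
substring = df[df["Name"].str.contains("|".join(namelist["name"].tolist()))]

print(exact)
print(substring)
```

With hundreds of names, isin also avoids building one very long regex, though whether that matters for about 10,000 rows is something you would have to measure.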