I have a big dataset. It's about news reading. I'm trying to clean it. I created a checklist of cities that I want to keep (the set has all the cities). How can I drop the rows based on that checklist? For example, I have a checklist (as a list) that contains all the french cities. How can I drop other cities?
To picture the data frame (I have 1.5m rows btw):
City Age
0 Paris 25-34
1 Lyon 45-54
2 Kiev 35-44
3 Berlin 25-34
4 New York 25-34
5 Paris 65
6 Toulouse 35-44
7 Nice 55-64
8 Hannover 45-54
9 Lille 35-44
10 Edinburgh 65
11 Moscow 25-34
CodePudding user response:
You can do this using pandas.Dataframe.isin
. This will return boolean values checking whether each element is inside the list x
. You can then use the boolean values and take out the subset of the df
with rows that return True
by doing df[df['City'].isin(x)]
. Following is my solution:
import pandas as pd
x = ['Paris' , 'Marseille']
df = pd.DataFrame(data={'City':['Paris', 'London', 'New York', 'Marseille'],
'Age':[1, 2, 3, 4]})
print(df)
df = df[df['City'].isin(x)]
print(df)
Output:
>>> City Age
0 Paris 1
1 London 2
2 New York 3
3 Marseille 4
City Age
0 Paris 1
3 Marseille 4