I am trying to remove rows based on multiple list and a related column. The condition is that when the location
string doesn't fall into the list of country
, remove the rows. My current approach is to use np.select
and I wonder if there's a way to make it more succinct or easier to re-use.
Europe = ["London", "Paris", "Berlin"]
Amercia = ["Boston", "New York", "Florida"]
Asia = ["Hong Kong", "Singapore", "Tokyo"]
data = {
"country1": ["Europe", "Asia", "Asia", "Europe", "Europe", "Amercia", "Amercia", "Asia", "Asia", "Europe"],
'location':["London", "London", "Hong Kong", "Hong Kong", "New York", "New York", "Singapore", "Singapore", "London", "London"]}
df = pd.DataFrame(data)
country location
0 Europe London
1 Asia London
2 Asia Hong Kong
3 Europe Hong Kong
4 Europe New York
5 Amercia New York
6 Amercia Singapore
7 Asia Singapore
8 Asia London
9 Europe London
Current approach
conditions = [
(df['location'].str.contains('|'.join(Europe))),
(df['location'].str.contains('|'.join(Amercia))),
(df['location'].str.contains('|'.join(Asia)))]
choices = ["Europe", "Amercia", "Asia"]
df['country2'] = np.select(conditions, choices, default=0)
df2 = df.query("country1 == country2").reset_index(drop = True)
Expected output:
country location
0 Europe London
1 Asia Hong Kong
2 Amercia New York
3 Asia Singapore
4 Europe London
CodePudding user response:
If possible mapping location
by values from list convertd to flatten dictionaries and comapred with original column use:
d = {"Europe":Europe, "Amercia":Amercia, "Asia":Asia}
d1 = {x:k for k, v in d.items() for x in v}
print (d1)
{'London': 'Europe', 'Paris': 'Europe', 'Berlin': 'Europe',
'Boston': 'Amercia', 'New York': 'Amercia', 'Florida': 'Amercia',
'Hong Kong': 'Asia', 'Singapore': 'Asia', 'Tokyo': 'Asia'}
df1 = df[df['location'].map(d1).eq(df['country1'])].reset_index(drop = True)
print (df1)
country1 location
0 Europe London
1 Asia Hong Kong
2 Amercia New York
3 Asia Singapore
4 Europe London
If possible substring in original column is possible test first matched value by dictionary:
df1 = df[df['location'].str.extract(f'({"|".join(d1)})', expand=False).map(d1).eq(df['country1'])]