I have a df with a column of city names and an integer column of crash counts. The city column contains the correct spelling of each city but also misspellings of it. I created a list of each city's correct spelling and am trying to filter out the misspelled rows, but the code I've tried so far isn't filtering them out. My searches on how to do this led me to isin.
Here is part of my city list:
city_list = [('Aberdeen', 'Ahoskie', 'Alamance', 'Albermarle',...
Here are some attempts:
df = df[~df['city'].isin(city_list)]
df = df[df['city'].apply(lambda x: tuple([y.lower() for y in x])).isin(city_list)]
df = df[np.isin(df['city'], city_list)]
So I want to filter out 'Aberdee' and keep 'Aberdeen' as an example:
         city  crash_1
915   ABERDEE        1
97   ABERDEEN      587
916   ABSHERS        1
917      ACME        1
Much obliged for any help.
CodePudding user response:
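You just need to take the case of the city names into account (upper, lower, capitalized). The df holds upper-case names while the list is capitalized. Also, assuming city_list is a list holding a single tuple as shown, index into it so that isin compares strings with strings:
df = df[df.city.str.capitalize().isin(city_list[0])]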
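One caveat with capitalize, in case the list also holds multi-word names: only the first letter of the whole string is upper-cased, so such names would not match a title-cased list. A small sketch with plain Python strings (pandas documents .str.capitalize() and .str.title() as equivalent to these str methods); 'NEW BERN' is a hypothetical example, not taken from the question:

```python
# Hypothetical multi-word city name, used only for illustration.
raw = "NEW BERN"

# capitalize() upper-cases the first character and lower-cases the rest.
capitalized = raw.capitalize()

# title() upper-cases the first character of every word.
titled = raw.title()

print(capitalized)  # New bern
print(titled)       # New Bern
```

If every name in the list is a single word, capitalize and title give the same result and either works.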
CodePudding user response:
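The only thing you need to deal with is that city_list is a list containing a tuple of cities, so isin compares each city string against the whole tuple and never matches. If you are not keen on keeping the tuple, you can unpack it while upper-casing the strings to match the df: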
newlist = [i.upper() for i in city_list[0]]
df[df['city'].isin(newlist)]
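Putting it together, here is a minimal self-contained sketch of the same idea. It assumes the df looks like the four sample rows in the question and uses only the four names visible in the (truncated) list; the names valid, mask and filtered are made up for illustration. It also flattens every tuple in the list rather than only the first one, in case the real list holds more than one:

```python
import pandas as pd

# Sample rows copied from the question, including the original index values.
df = pd.DataFrame(
    {
        "city": ["ABERDEE", "ABERDEEN", "ABSHERS", "ACME"],
        "crash_1": [1, 587, 1, 1],
    },
    index=[915, 97, 916, 917],
)

# Only the names visible in the question; the real list is longer.
city_list = [("Aberdeen", "Ahoskie", "Alamance", "Albermarle")]

# Flatten the list of tuples and upper-case each name once.
valid = {name.upper() for group in city_list for name in group}

# Upper-case the column the same way, then keep rows whose city is in the set.
mask = df["city"].str.upper().isin(valid)
filtered = df[mask]

print(filtered)
```

With the sample rows only the ABERDEEN row is kept. ABERDEE, ABSHERS and ACME are dropped because they are not in the list shown, so check that the full list really contains every city you want to keep.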