I have a pandas dataframe in which I have the column "Bio Location", I would like to filter it so that I only have the locations of my list in which there are names of cities. I have made the following code which works except that I have a problem.
For example, if the location is "Paris France" and I have Paris in my list then it will return the result. However, if I had "France Paris", it would not return "Paris". Do you have a solution? Maybe use regex? Thank u a lot!!!
df = pd.read_csv(path_to_file, encoding='utf-8', sep=',')
cities = [Paris, Bruxelles, Madrid]
values = df[df['Bio Location'].isin(citiesfr)]
values.to_csv(r'results.csv', index = False)
CodePudding user response:
Well, hello there o/
What you want here is .str.contains()
:
1. The DF I used to test:
df = {
'col1':['Paris France','France Paris Test','France Paris','Madrid Spain','Spain Madrid Test','Spain Madrid'] #so tested with 1x at start, 1x in the middle and 1x at the end of a str
}
df = pd.DataFrame(df)
df
Result:
index | col1 |
---|---|
0 | Paris France |
1 | France Paris Test |
2 | France Paris |
3 | Madrid Spain |
4 | Spain Madrid Test |
5 | Spain Madrid |
2. Then applying the code below:
Updated following comment #so tested with 1x at start, 1x in the middle and 1x at the end of a str
reg = ('Paris|Madrid')
df = df[df.col1.str.contains(reg)]
df
Result:
index | col1 |
---|---|
0 | Paris France |
1 | France Paris Test |
2 | France Paris |
3 | Madrid Spain |
4 | Spain Madrid Test |
5 | Spain Madrid |
Please accept ✅ this answer if it solved your problem, it motivates me :)
Otherwise mention me (using @) in comment while telling me what's wrong ;)