I want to extract the location that appears in the desc column of obras, using a list of cities that I have.
I have the following dataframe:
obras = pd.DataFrame([['1','Agua de Buenos Aires'],['2', 'Sistenas de carreteras Jujuy'],['3','Reasentamiento en Entre Ríos'], ['4','Rutas en Córdoba']],
columns = ['id', 'desc'])
And the list:
list = ['Buenos Aires', 'Jujuy', 'Corrientes', 'Entre Ríos']
I tried this:
for s in obras["desc"]:
    if any(xs in s for xs in list):
        obras['Localidad'] = s
The expected result would be:
id | desc | localidad |
---|---|---|
1 | Agua de Buenos Aires | Buenos Aires |
2 | Sistenas de carreteras Jujuy | Jujuy |
3 | Reasentamiento en Entre Ríos | Entre Ríos |
4 | Rutas en Córdoba | NaN |
But the result I get is:
id | desc | localidad |
---|---|---|
1 | Agua de Buenos Aires | Reasentamiento en Entre Ríos |
2 | Sistenas de carreteras Jujuy | Reasentamiento en Entre Ríos |
3 | Reasentamiento en Entre Ríos | Reasentamiento en Entre Ríos |
4 | Rutas en Córdoba | Reasentamiento en Entre Ríos |
How can I solve this problem?
Thanks!
CodePudding user response:
You can check whether a list item exists in the string using apply:
import pandas as pd
obras = pd.DataFrame([['1','Agua de Buenos Aires'],['2', 'Sistenas de carreteras Jujuy'],['3','Reasentamiento en Entre Ríos'], ['4','Rutas en Córdoba']],columns = ['id', 'desc'])
list_ = ['Buenos Aires', 'Jujuy', 'Corrientes', 'Entre Ríos']
obras['localidad'] = obras['desc'].apply(lambda x: next(iter([i for i in list_ if i in x]), None))
Note that, given the desired output, this only returns the first match if a description contains more than one list item.
  | id | desc | localidad |
---|---|---|---|
0 | 1 | Agua de Buenos Aires | Buenos Aires |
1 | 2 | Sistenas de carreteras Jujuy | Jujuy |
2 | 3 | Reasentamiento en Entre Ríos | Entre Ríos |
3 | 4 | Rutas en Córdoba | None |
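As an alternative to apply, the same lookup can be sketched with pandas' vectorized string methods. This is not part of the answer above, only a possible variant: it assumes the names in list_ should be matched literally, so each one is passed through re.escape before being joined into a single pattern. The variable name pattern is introduced here for illustration.

```python
import re
import pandas as pd

obras = pd.DataFrame(
    [['1', 'Agua de Buenos Aires'],
     ['2', 'Sistenas de carreteras Jujuy'],
     ['3', 'Reasentamiento en Entre Ríos'],
     ['4', 'Rutas en Córdoba']],
    columns=['id', 'desc'],
)
list_ = ['Buenos Aires', 'Jujuy', 'Corrientes', 'Entre Ríos']

# One capture group with every name as an alternative;
# re.escape keeps regex metacharacters in a name from being interpreted.
pattern = '(' + '|'.join(re.escape(name) for name in list_) + ')'

# expand=False returns a Series; rows without any match become NaN.
obras['localidad'] = obras['desc'].str.extract(pattern, expand=False)
```

Unlike the apply version, which returns the first match in list order, this returns the match that occurs earliest in the string, and it leaves NaN rather than None for rows without a match.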
PS: don't use list as a variable name, since it shadows Python's built-in list type.
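A small sketch of what goes wrong when the name is reused. The function name shadowed_call is hypothetical and only serves the illustration:

```python
def shadowed_call():
    # Rebinding the name hides the built-in for the rest of this scope.
    list = ['Buenos Aires', 'Jujuy']
    # The name now refers to a list instance, so calling it fails.
    return list('abc')

try:
    shadowed_call()
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

The call raises a TypeError because the name now points to a list object rather than the built-in type, which is why the answer uses list_ instead.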