I have this data set
obras = pd.DataFrame(
[['1','Entre Ríos'],['2', 'Entre Rios'],['3','entre Ríos'], ['4','Entre ríos']],
columns = ['id', 'prov']
)
In order to normalize the names of "Prov" I call an API
API_BASE_URL = "https://apis.datos.gob.ar/georef/api/"
def get_similar_bulk(endpoint, nombres):
# realiza consulta a la API
data = {
endpoint: [
{"nombre": nombre, "max": 1} for nombre in nombres
]}
url = API_BASE_URL endpoint
results = requests.post(
url, json=data, headers={"Content-Type": "application/json"}
).json()
# convierte a una lista de "resultado más probable" o "vacío" cuando no hay
parsed_results = [
single_result[endpoint][0] if single_result[endpoint] else {}
for single_result in results["resultados"]
]
return parsed_results
obras['prov_norm'] = get_similar_bulk("provincias", obras['prov'])
As a result i get
id | prov | prov_norm |
---|---|---|
1 | Entre Ríos | {'centroide': {'lat': -32.0588735436448, 'lon': -59.2014475514635}, 'id': '30', 'nombre': 'Entre Ríos'} |
2 | Entre Rios | {'centroide': {'lat': -32.0588735436448, 'lon': -59.2014475514635}, 'id': '30', 'nombre': 'Entre Ríos'} |
3 | entre Ríos | {'centroide': {'lat': -32.0588735436448, 'lon': -59.2014475514635}, 'id': '30', 'nombre': 'Entre Ríos'} |
4 | Entre ríos | {'centroide': {'lat': -32.0588735436448, 'lon': -59.2014475514635}, 'id': '30', 'nombre': 'Entre Ríos'} |
The expected result would be:
id | prov | lat | lon | id | nombre |
---|---|---|---|---|---|
1 | Entre Ríos | -32.0588735436448 | -59.2014475514635 | 30 | Entre Ríos |
2 | Entre Rios | -32.0588735436448 | -59.2014475514635 | 30 | Entre Ríos |
3 | entre Ríos | -32.0588735436448 | -59.2014475514635 | 30 | Entre Ríos |
4 | Entre ríos | -32.0588735436448 | -59.2014475514635 | 30 | Entre Ríos |
How I can solve this problem?
thanks!!!
CodePudding user response:
if you want sperate column then you must put data in sperate column like this:
in get_similar_bulk function:
# convierte a una lista de "resultado más probable" o "vacío" cuando no hay
parsed_results = [
single_result[endpoint][0] if single_result[endpoint] else {}
for single_result in results["resultados"]
]
parsed_results = [{'lat':x['centroide']['lat'],'lon': x['centroide']['lon'],'id_obras': x['id'],'nombre':x['nombre']} for x in parsed_results ]
parsed_results=[[x[0] for x in parsed_results],[x[1] for x in parsed_results],[x[2] for x in parsed_results],[x[3] for x in parsed_results]]
return parsed_results
obras['lat'],obras['lon'],obras['id'],obras['nombre'] =
get_similar_bulk("provincias", obras['prov'])
but if we change the data after your code :
obras['lat'] = obras['prov_norm'].apply(lambda x: x['centroide']['lat'])
obras['lon'] = obras['prov_norm'].apply(lambda x: x['centroide']['lon'])
obras['id_obras'] = obras['prov_norm'].apply(lambda x: x['id'])
obras['nombre'] = obras['prov_norm'].apply(lambda x: x['nombre'])
obras = obras.drop('prov_norm', 1)
But I don't understand why your data for all the rows are the same. Is this a part of your problem too? Or is it just a sample code?
CodePudding user response:
If I'm not mistaken you could do the following with your current parsed_results
:
obras_prov_norm = pd.DataFrame(
{**d.pop('centroide', {}), **d}
for d in get_similar_bulk('provincias', obras['prov'])
)
obras = pd.concat([obras, obras_prov_norm], axis='columns')
CodePudding user response:
You can also try the pandas 'json_normalize' like below:
import pandas as pd
results = get_similar_bulk("provincias", obras['prov'])
df_results = pd.json_normalize(results)
print(df_results)