Home > other >  Is there a way to unnest a json´column in a dataframe?
Is there a way to unnest a json´column in a dataframe?

Time:11-30

I have this data set

obras = pd.DataFrame(
    [['1','Entre Ríos'],['2', 'Entre Rios'],['3','entre Ríos'], ['4','Entre ríos']],
    columns = ['id', 'prov']
)

In order to normalize the names of "Prov" I call an API

API_BASE_URL = "https://apis.datos.gob.ar/georef/api/"

def get_similar_bulk(endpoint, nombres):

    # realiza consulta a la API
    data = {
        endpoint: [
            {"nombre": nombre, "max": 1} for nombre in nombres
    ]}
    url = API_BASE_URL   endpoint
    results = requests.post(
        url, json=data, headers={"Content-Type": "application/json"}
    ).json()

    # convierte a una lista de "resultado más probable" o "vacío" cuando no hay
    parsed_results = [
        single_result[endpoint][0] if single_result[endpoint] else {}
        for single_result in results["resultados"]
    ]

    return parsed_results

obras['prov_norm'] = get_similar_bulk("provincias", obras['prov'])

As a result i get

id prov prov_norm
1 Entre Ríos {'centroide': {'lat': -32.0588735436448, 'lon': -59.2014475514635}, 'id': '30', 'nombre': 'Entre Ríos'}
2 Entre Rios {'centroide': {'lat': -32.0588735436448, 'lon': -59.2014475514635}, 'id': '30', 'nombre': 'Entre Ríos'}
3 entre Ríos {'centroide': {'lat': -32.0588735436448, 'lon': -59.2014475514635}, 'id': '30', 'nombre': 'Entre Ríos'}
4 Entre ríos {'centroide': {'lat': -32.0588735436448, 'lon': -59.2014475514635}, 'id': '30', 'nombre': 'Entre Ríos'}

The expected result would be:

id prov lat lon id nombre
1 Entre Ríos -32.0588735436448 -59.2014475514635 30 Entre Ríos
2 Entre Rios -32.0588735436448 -59.2014475514635 30 Entre Ríos
3 entre Ríos -32.0588735436448 -59.2014475514635 30 Entre Ríos
4 Entre ríos -32.0588735436448 -59.2014475514635 30 Entre Ríos

How I can solve this problem?

thanks!!!

CodePudding user response:

if you want sperate column then you must put data in sperate column like this:

in get_similar_bulk function:

# convierte a una lista de "resultado más probable" o "vacío" cuando no hay
  parsed_results = [
      single_result[endpoint][0] if single_result[endpoint] else {}
      for single_result in results["resultados"]
  ]

  parsed_results = [{'lat':x['centroide']['lat'],'lon': x['centroide']['lon'],'id_obras': x['id'],'nombre':x['nombre']} for x in parsed_results ]
  parsed_results=[[x[0] for x in parsed_results],[x[1] for x in parsed_results],[x[2] for x in parsed_results],[x[3] for x in parsed_results]]

  return parsed_results
obras['lat'],obras['lon'],obras['id'],obras['nombre'] = 
 get_similar_bulk("provincias", obras['prov'])

but if we change the data after your code :

obras['lat'] = obras['prov_norm'].apply(lambda x:  x['centroide']['lat'])
obras['lon'] = obras['prov_norm'].apply(lambda x:  x['centroide']['lon'])
obras['id_obras'] = obras['prov_norm'].apply(lambda x:  x['id'])
obras['nombre'] = obras['prov_norm'].apply(lambda x:  x['nombre'])


obras = obras.drop('prov_norm', 1)

But I don't understand why your data for all the rows are the same. Is this a part of your problem too? Or is it just a sample code?

CodePudding user response:

If I'm not mistaken you could do the following with your current parsed_results:

obras_prov_norm = pd.DataFrame(
    {**d.pop('centroide', {}), **d}
    for d in get_similar_bulk('provincias', obras['prov'])
)
obras = pd.concat([obras, obras_prov_norm], axis='columns')

CodePudding user response:

You can also try the pandas 'json_normalize' like below:

import pandas as pd

results = get_similar_bulk("provincias", obras['prov'])

df_results = pd.json_normalize(results)

print(df_results)
  • Related