I have a large json dataset of the form:
r_json =
[[{'gasto': 3.47},
{'interacciones': 2.0},
{'fecha': 'Tue, 15 Oct 2019 00:00:00 GMT'},
{'moneda': 'USD'},
{'id_campania': '6666070697'},
{'id_conjunto_anuncios': '76484749901'},
{'id_anuncio': '389396738140'},
{'visitas_al_50': 862.6176},
{'visitas_al_75': 651.1752},
{'visitas_al100': 578.9394},
{'visitas_al100': 'YouTube'},
{'country': 'PE'},
{'brand': 'PILSEN CALLAO'},
{'objetivo': 'Video Views'},
{'objetivo': 'Mon, 14 Oct 2019 00:00:00 GMT'},
{'formato': 'Video'},
{'nombre_campania': 'LITRAZO'},
{'impresiones': 2106.0}],
[{'gasto': 4.49},
{'interacciones': 3.0},
{'fecha': 'Fri, 15 Nov 2019 00:00:00 GMT'},
{'moneda': 'USD'},
{'id_campania': '213904'},
{'id_conjunto_anuncios': '1905221'},
{'id_anuncio': '4460317'},
{'visitas_al_50': 791.0},
{'visitas_al_75': 732.0}, (...)
I want to convert it to a pandas dataframe. I have tried:
df = pd.DataFrame (r_json, columns = ['gasto', 'interacciones', 'fecha', 'moneda', 'id_campania',
'id_conjunto_anuncios', 'id_anuncio', 'visitas_al_50', 'visitas_al_75',
'visitas_al100', 'visitas_al100_2', 'country', 'brand', 'objetivo', 'objetivo_2',
'formato', 'nombre_campania', 'impresiones'])
But I am getting a dataframe that seems to contain dictionaries:
gasto interacciones fecha moneda id_campania id_conjunto_anuncios id_anuncio visitas_al_50 visitas_al_75 visitas_al100 visitas_al100_2 country brand objetivo objetivo_2 formato nombre_campania impresiones
0 {'gasto': 3.47} {'interacciones': 2.0} {'fecha': 'Tue, 15 Oct 2019 00:00:00 GMT'} {'moneda': 'USD'} {'id_campania': '6666070697'} {'id_conjunto_anuncios': '76484749901'} {'id_anuncio': '389396738140'} {'visitas_al_50': 862.6176} {'visitas_al_75': 651.1752} {'visitas_al100': 578.9394} {'visitas_al100': 'YouTube'} {'country': 'PE'} {'brand': 'PILSEN CALLAO'} {'objetivo': 'Video Views'} {'objetivo': 'Mon, 14 Oct 2019 00:00:00 GMT'} {'formato': 'Video'} {'nombre_campania': 'LITRAZO'} {'impresiones': 2106.0}
1 {'gasto': 4.49} {'interacciones': 3.0} {'fecha': 'Fri, 15 Nov 2019 00:00:00 GMT'} {'moneda': 'USD'} {'id_campania': '213904'} {'id_conjunto_anuncios': '1905221'} {'id_anuncio': '4460317'} {'visitas_al_50': 791.0} {'visitas_al_75': 732.0} {'visitas_al100': 699.0} {'visitas_al100': 'Oath'} {'country': 'PE'} {'brand': 'PILSEN CALLAO'} {'objetivo': 'Video Views'} {'objetivo': 'Wed, 13 Nov 2019 00:00:00 GMT'} {'formato': 'Video'} {'nombre_campania': 'LITRAZO'} {'impresiones': 832.0}
I have tried many things but haven't succeeded. Could anyone give me some guidance? Thanks in advance.
CodePudding user response:
You need to manipulate your JSON a bit before calling the DataFrame constructor:
tmp = [
dict(kv for dict_ in item for kv in dict_.items())
for item in r_json
]
df = pd.DataFrame(tmp)
Result:
gasto interacciones fecha moneda id_campania id_conjunto_anuncios id_anuncio visitas_al_50 visitas_al_75 visitas_al100 country brand objetivo formato nombre_campania impresiones
0 3.47 2.0 Tue, 15 Oct 2019 00:00:00 GMT USD 6666070697 76484749901 389396738140 862.6176 651.1752 YouTube PE PILSEN CALLAO Mon, 14 Oct 2019 00:00:00 GMT Video LITRAZO 2106.0
1 4.49 3.0 Fri, 15 Nov 2019 00:00:00 GMT USD 213904 1905221 4460317 791.0000 732.0000 NaN NaN NaN NaN NaN NaN NaN
How it works
If you pass a list of lists (i.e. 2D list) to the DataFrame constructor, it assumes that the elements are stored in row-by-column manner:
r_json = [
[
{'gasto': 3.47}, # first row, first column = {'gasto': 3.47}
{'interacciones': 2.0} # first row, second column = {'interacciones': 2.0}
],
[
{'gasto': 4.49}, # second row, first column = {'gasto': 4.49}
{'interacciones': 3.0} # second row, second column = {'interacciones': 3.0}
]
]
df = pd.DataFrame(r_json)
If you pass in a list of dictionaries instead, it assumes each item represents a row and each key-value pair represents a column:
tmp = [
{
'gasto': 3.47, # first row, column gasto = 3.47
'interacciones': 2.0 # first row, column interacciones = 2.0
},
{
'gasto': 4.49,
'interacciones': 3.0
}
]
df = pd.DataFrame(tmp)
So we need to convert the original JSON to the second form by merging the key-value pairs:
dict1 = dict([('gasto', 3.47), ('interacciones', 2.0)])
dict2 = dict([('gasto', 4.49), ('interacciones', 3.0)])
tmp = [dict1, dict2]
The list comprehension is just to apply that transformation on the whole r_json
list.