I have a file with several dictionaries and I want to turn it into a pandas dataframe, but I can't. When I try to get the first value to be in a column, like "Fuel station 1" and everything else is in a second column all together.
{'Fuel station 1': {'00404850000317': {'01-01-2019': {'DataDaVenda': '2019-01-01 19:04:22',
'Descrição': 'GASOLINA COMUM',
'Preço': 4.289544},
'01-01-2020': {'DataDaVenda': '2020-01-01 19:18:09',
'Descrição': 'GASOLINA C COMUM (b:2)',
'Preço': 4.49},
'01-01-2022': {'DataDaVenda': '2021-12-31 19:24:20',
'Descrição': 'GASOLINA C COMUM (b:1)',
'Preço': 6.49}}},
{'Fuel station 2': {'00404850000317': {'01-01-2019': {'DataDaVenda': '2019-01-01 19:04:22',
'Descrição': 'GASOLINA COMUM',
'Preço': 4.289544},
'01-01-2021': {'DataDaVenda': '2021-01-01 18:48:55',
'Descrição': 'GASOLINA C COMUM (b:1)',
'Preço': 4.59},
'01-01-2022': {'DataDaVenda': '2021-12-31 19:24:20',
'Descrição': 'GASOLINA C COMUM (b:1)',
'Preço': 6.49}}}
Desired output format:
CodePudding user response:
Well your data is nested on multiple levels. So first of all you are going to have to transform it into a format that pandas can handle. One way would be a records format (list of dicts), where each of the keys which belong to multiple entries are their own fields:
import pandas
# slightly fixed your brackets
data = {
'Fuel station 1': {'00404850000317': {
'01-01-2019': {'DataDaVenda': '2019-01-01 19:04:22', 'Descrição': 'GASOLINA COMUM', 'Preço': 4.289544},
'01-01-2020': {'DataDaVenda': '2020-01-01 19:18:09', 'Descrição': 'GASOLINA C COMUM (b:2)', 'Preço': 4.49},
'01-01-2022': {'DataDaVenda': '2021-12-31 19:24:20', 'Descrição': 'GASOLINA C COMUM (b:1)', 'Preço': 6.49}}},
'Fuel station 2': {'00404850000317': {
'01-01-2019': {'DataDaVenda': '2019-01-01 19:04:22', 'Descrição': 'GASOLINA COMUM', 'Preço': 4.289544},
'01-01-2021': {'DataDaVenda': '2021-01-01 18:48:55', 'Descrição': 'GASOLINA C COMUM (b:1)', 'Preço': 4.59},
'01-01-2022': {'DataDaVenda': '2021-12-31 19:24:20', 'Descrição': 'GASOLINA C COMUM (b:1)', 'Preço': 6.49}}}}
To flat list of dicts:
reformatted_data = []
for fuel_st, v in data.items():
for id_, v_ in v.items():
for date, v__ in v_.items():
reformatted_data.append({"fuel station": fuel_st, "id": id_, "date": date})
for k___, v___ in v__.items():
reformatted_data[-1][k___] = v___
df = pandas.DataFrame.from_records(reformatted_data)
Which returns:
print(df)
> fuel station id date DataDaVenda Descrição Preço
0 Fuel station 1 00404850000317 01-01-2019 2019-01-01 19:04:22 GASOLINA COMUM 4.289544
1 Fuel station 1 00404850000317 01-01-2020 2020-01-01 19:18:09 GASOLINA C COMUM (b:2) 4.490000
2 Fuel station 1 00404850000317 01-01-2022 2021-12-31 19:24:20 GASOLINA C COMUM (b:1) 6.490000
3 Fuel station 2 00404850000317 01-01-2019 2019-01-01 19:04:22 GASOLINA COMUM 4.289544
4 Fuel station 2 00404850000317 01-01-2021 2021-01-01 18:48:55 GASOLINA C COMUM (b:1) 4.590000
5 Fuel station 2 00404850000317 01-01-2022 2021-12-31 19:24:20 GASOLINA C COMUM (b:1) 6.490000