I'm requestin a data from a API and then trying to normalize this JSON file, it has this structure
[{'la_id': '33',
'store': '1405fdsa6001209',
'sell': '110aa346',
'products': [{'codigo': '176690', 'lacre': '15980fd2293', 'valor': '49.90'},
{'codigo': 'sd4907', 'lacre': '1598a12385', 'valor': '19.90'},
{'codigo': 'aa4907', 'lacre': '1598a2384', 'valor': '19.90'},
{'codigo': '1fd307', 'lacre': '1598a20401', 'valor': '169.90'}],
'payment': {'paymentid': '10a836',
'value': '259.6000',
'number': '4',
'finalid': '4',
'finalname': 'Cartao de credito',
'docs': '849763',
'flag': None}}
'pagamentos': [{'pagamento_id': '107795',
'valor': '854.9900',
'numero_parcelas': '10',
'finalizador_id': '4',
'finalizador_nome': 'Cartao de credito',
'documento': '500003',
'bandeira': 'MASTERCARD'}]
When I apply the JsonNormalize, in order to transform this into a dataframe, I'm getting this:
id | store | sell | product | payments |
---|---|---|---|---|
33 | 1405fdsa6001209 | 110aa346 | [{'codigo': '176690', 'lacre': '15980fd2293', 'valor': '49.90'}, {'codigo': 'sd4907', 'lacre': '1598a12385', 'valor': '19.90'}, {'codigo': 'aa4907', 'lacre': '1598a2384', 'valor': '19.90'}, {'codigo': '1fd307', 'lacre': '1598a20401', 'valor': '169.90'}] | {'payment_id': '10aa95','valor': '84.9900','number': '10','finalid': '4','finalizador_nome': 'Cartao de credito','docs': '500003','flag': 'MASTERCARD'} |
As you can see, the last collumn is not getting the values properly, it has a dictionary inside it. How Can I fix this?
CodePudding user response:
Try:
lst = [
{
"la_id": "33",
"store": "1405fdsa6001209",
"sell": "110aa346",
"products": [
{"codigo": "176690", "lacre": "15980fd2293", "valor": "49.90"},
{"codigo": "sd4907", "lacre": "1598a12385", "valor": "19.90"},
{"codigo": "aa4907", "lacre": "1598a2384", "valor": "19.90"},
{"codigo": "1fd307", "lacre": "1598a20401", "valor": "169.90"},
],
"payment": {
"paymentid": "10a836",
"value": "259.6000",
"number": "4",
"finalid": "4",
"finalname": "Cartao de credito",
"docs": "849763",
"flag": None,
},
}
]
df = pd.json_normalize(lst).explode("products")
df = pd.concat([df, df.pop("products").apply(pd.Series)], axis=1)
print(df)
Prints:
la_id store sell payment.paymentid payment.value payment.number payment.finalid payment.finalname payment.docs payment.flag codigo lacre valor
0 33 1405fdsa6001209 110aa346 10a836 259.6000 4 4 Cartao de credito 849763 None 176690 15980fd2293 49.90
0 33 1405fdsa6001209 110aa346 10a836 259.6000 4 4 Cartao de credito 849763 None sd4907 1598a12385 19.90
0 33 1405fdsa6001209 110aa346 10a836 259.6000 4 4 Cartao de credito 849763 None aa4907 1598a2384 19.90
0 33 1405fdsa6001209 110aa346 10a836 259.6000 4 4 Cartao de credito 849763 None 1fd307 1598a20401 169.90
EDIT: With updated input:
df = pd.concat([df, df.pop("payments").apply(pd.Series)], axis=1)
df = df.explode("product")
df = pd.concat([df, df.pop("product").apply(pd.Series)], axis=1)
print(df)
Prints:
id store sell payment_id valor number finalid finalizador_nome docs flag codigo lacre valor
0 33 1405fdsa6001209 110aa346 10aa95 84.9900 10 4 Cartao de credito 500003 MASTERCARD 176690 15980fd2293 49.90
0 33 1405fdsa6001209 110aa346 10aa95 84.9900 10 4 Cartao de credito 500003 MASTERCARD sd4907 1598a12385 19.90
0 33 1405fdsa6001209 110aa346 10aa95 84.9900 10 4 Cartao de credito 500003 MASTERCARD aa4907 1598a2384 19.90
0 33 1405fdsa6001209 110aa346 10aa95 84.9900 10 4 Cartao de credito 500003 MASTERCARD 1fd307 1598a20401 169.90
CodePudding user response:
You can use pd.json_normalize()
for each of:
Extract the main fields
Extract the
products
detailsExtract the
pagamentos
details
Then, use .merge()
to merge the 3 resultant dataframes using common key la_id
, as follows:
j_lst = [{'la_id': '33',
'store': '1405fdsa6001209',
'sell': '110aa346',
'products': [{'codigo': '176690', 'lacre': '15980fd2293', 'valor': '49.90'},
{'codigo': 'sd4907', 'lacre': '1598a12385', 'valor': '19.90'},
{'codigo': 'aa4907', 'lacre': '1598a2384', 'valor': '19.90'},
{'codigo': '1fd307', 'lacre': '1598a20401', 'valor': '169.90'}],
'payment': {'paymentid': '10a836',
'value': '259.6000',
'number': '4',
'finalid': '4',
'finalname': 'Cartao de credito',
'docs': '849763',
'flag': None},
'pagamentos': [{'pagamento_id': '107795',
'valor': '854.9900',
'numero_parcelas': '10',
'finalizador_id': '4',
'finalizador_nome': 'Cartao de credito',
'documento': '500003',
'bandeira': 'MASTERCARD'}]}]
df_main = pd.json_normalize(j_lst)
df_products = pd.json_normalize(j_lst, record_path=['products'], record_prefix='products.', meta=['la_id'])
df_pagamentos = pd.json_normalize(j_lst, record_path=['pagamentos'], record_prefix='pagamentos.', meta=['la_id'])
df_out = (df_main.merge(df_products, on='la_id')
.merge(df_pagamentos, on='la_id')
.drop(['products', 'pagamentos'], axis=1)
)
Result:
print(df_out)
la_id store sell payment.paymentid payment.value payment.number payment.finalid payment.finalname payment.docs payment.flag products.codigo products.lacre products.valor pagamentos.pagamento_id pagamentos.valor pagamentos.numero_parcelas pagamentos.finalizador_id pagamentos.finalizador_nome pagamentos.documento pagamentos.bandeira
0 33 1405fdsa6001209 110aa346 10a836 259.6000 4 4 Cartao de credito 849763 None 176690 15980fd2293 49.90 107795 854.9900 10 4 Cartao de credito 500003 MASTERCARD
1 33 1405fdsa6001209 110aa346 10a836 259.6000 4 4 Cartao de credito 849763 None sd4907 1598a12385 19.90 107795 854.9900 10 4 Cartao de credito 500003 MASTERCARD
2 33 1405fdsa6001209 110aa346 10a836 259.6000 4 4 Cartao de credito 849763 None aa4907 1598a2384 19.90 107795 854.9900 10 4 Cartao de credito 500003 MASTERCARD
3 33 1405fdsa6001209 110aa346 10a836 259.6000 4 4 Cartao de credito 849763 None 1fd307 1598a20401 169.90 107795 854.9900 10 4 Cartao de credito 500003 MASTERCARD