Home > Blockchain >  Pandas: How do I normalize a JSON file with multiple nested lists of JSON?
Pandas: How do I normalize a JSON file with multiple nested lists of JSON?

Time:10-02

I'm requesting a data from a API and then trying to normalize this JSON file, it has this structure

[{'la_id': '33',
  'store': '1405fdsa6001209',
  'sell': '110aa346',
  'products': [{'codigo': '176690', 'lacre': '15980fd2293', 'valor': '49.90'},
   {'codigo': 'sd4907', 'lacre': '1598a12385', 'valor': '19.90'},
   {'codigo': 'aa4907', 'lacre': '1598a2384', 'valor': '19.90'},
   {'codigo': '1fd307', 'lacre': '1598a20401', 'valor': '169.90'}],
  'payment': {'paymentid': '10a836',
   'value': '259.6000',
   'number': '4',
   'finalid': '4',
   'finalname': 'Cartao de credito',
   'docs': '849763',
   'flag': None}}
   'pagamentos': [{'pagamento_id': '107795',
   'valor': '854.9900',
   'numero_parcelas': '10',
   'finalizador_id': '4',
   'finalizador_nome': 'Cartao de credito',
   'documento': '500003',
   'bandeira': 'MASTERCARD'}]

When I apply the JsonNormalize, in order to transform this into a dataframe, I'm getting this:

id store sell products pagamentos
33 1405fdsa6001209 110aa346 [{'codigo': '176690', 'lacre': '15980fd2293', 'valor': '49.90'}, {'codigo': 'sd4907', 'lacre': '1598a12385', 'valor': '19.90'}, {'codigo': 'aa4907', 'lacre': '1598a2384', 'valor': '19.90'}, {'codigo': '1fd307', 'lacre': '1598a20401', 'valor': '169.90'}] [{'pagamento_id': '10aa95','valor': '84.9900','numero_parcelas': '10','finalizador_id': '4','finalizador_nome': 'Cartao de credito','docs': '500003','bandeira': 'MASTERCARD'}]

As you can see, the last 2 columns are not getting the values properly, they have dictionary inside a list. How can I fix this?

CodePudding user response:

Try:

lst = [
    {
        "la_id": "33",
        "store": "1405fdsa6001209",
        "sell": "110aa346",
        "products": [
            {"codigo": "176690", "lacre": "15980fd2293", "valor": "49.90"},
            {"codigo": "sd4907", "lacre": "1598a12385", "valor": "19.90"},
            {"codigo": "aa4907", "lacre": "1598a2384", "valor": "19.90"},
            {"codigo": "1fd307", "lacre": "1598a20401", "valor": "169.90"},
        ],
        "payment": {
            "paymentid": "10a836",
            "value": "259.6000",
            "number": "4",
            "finalid": "4",
            "finalname": "Cartao de credito",
            "docs": "849763",
            "flag": None,
        },
    }
]

df = pd.json_normalize(lst).explode("products")
df = pd.concat([df, df.pop("products").apply(pd.Series)], axis=1)
print(df)

Prints:

  la_id            store      sell payment.paymentid payment.value payment.number payment.finalid  payment.finalname payment.docs payment.flag  codigo        lacre   valor
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None  176690  15980fd2293   49.90
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None  sd4907   1598a12385   19.90
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None  aa4907    1598a2384   19.90
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None  1fd307   1598a20401  169.90

EDIT: With updated input:

df = pd.concat([df, df.pop("payments").apply(pd.Series)], axis=1)
df = df.explode("product")
df = pd.concat([df, df.pop("product").apply(pd.Series)], axis=1)
print(df)

Prints:

   id            store      sell payment_id    valor number finalid   finalizador_nome    docs        flag  codigo        lacre   valor
0  33  1405fdsa6001209  110aa346     10aa95  84.9900     10       4  Cartao de credito  500003  MASTERCARD  176690  15980fd2293   49.90
0  33  1405fdsa6001209  110aa346     10aa95  84.9900     10       4  Cartao de credito  500003  MASTERCARD  sd4907   1598a12385   19.90
0  33  1405fdsa6001209  110aa346     10aa95  84.9900     10       4  Cartao de credito  500003  MASTERCARD  aa4907    1598a2384   19.90
0  33  1405fdsa6001209  110aa346     10aa95  84.9900     10       4  Cartao de credito  500003  MASTERCARD  1fd307   1598a20401  169.90

CodePudding user response:

You can use pd.json_normalize() for each of:

  1. Extract the main fields (including key la_id)

  2. Extract the products details key la_id

  3. Extract the pagamentos details key la_id

Then, use .merge() to merge the 3 resultant dataframes using common key la_id, as follows:

j_lst = [{'la_id': '33',
          'store': '1405fdsa6001209',
          'sell': '110aa346',
          'products': [{'codigo': '176690', 'lacre': '15980fd2293', 'valor': '49.90'},
                       {'codigo': 'sd4907', 'lacre': '1598a12385', 'valor': '19.90'},
                       {'codigo': 'aa4907', 'lacre': '1598a2384', 'valor': '19.90'},
                       {'codigo': '1fd307', 'lacre': '1598a20401', 'valor': '169.90'}],
          'payment': {'paymentid': '10a836',
                      'value': '259.6000',
                      'number': '4',
                      'finalid': '4',
                      'finalname': 'Cartao de credito',
                      'docs': '849763',
                      'flag': None},
          'pagamentos': [{'pagamento_id': '107795',
                          'valor': '854.9900',
                          'numero_parcelas': '10',
                          'finalizador_id': '4',
                          'finalizador_nome': 'Cartao de credito',
                          'documento': '500003',
                          'bandeira': 'MASTERCARD'}]}]


df_main = pd.json_normalize(j_lst)

df_products = pd.json_normalize(j_lst, record_path=['products'], record_prefix='products.', meta=['la_id'])

df_pagamentos = pd.json_normalize(j_lst, record_path=['pagamentos'], record_prefix='pagamentos.', meta=['la_id'])

df_out = (df_main.merge(df_products, on='la_id')
                 .merge(df_pagamentos, on='la_id')
                 .drop(['products', 'pagamentos'], axis=1)
         )

Result:

print(df_out)

  la_id            store      sell payment.paymentid payment.value payment.number payment.finalid  payment.finalname payment.docs payment.flag products.codigo products.lacre products.valor pagamentos.pagamento_id pagamentos.valor pagamentos.numero_parcelas pagamentos.finalizador_id pagamentos.finalizador_nome pagamentos.documento pagamentos.bandeira
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None          176690    15980fd2293          49.90                  107795         854.9900                         10                         4           Cartao de credito               500003          MASTERCARD
1    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None          sd4907     1598a12385          19.90                  107795         854.9900                         10                         4           Cartao de credito               500003          MASTERCARD
2    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None          aa4907      1598a2384          19.90                  107795         854.9900                         10                         4           Cartao de credito               500003          MASTERCARD
3    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None          1fd307     1598a20401         169.90                  107795         854.9900                         10                         4           Cartao de credito               500003          MASTERCARD
  • Related