Home > Software design >  Put another fields with the nested JSON structure to pandas dataframe python
Put another fields with the nested JSON structure to pandas dataframe python

Time:08-12

I am trying to put other fields from the json after normalize the nested part. I had followed some examples like I need but I don't understand why the error : KeyError: 'phone Number'

In others words I need put another fields from the json in order to complet the neested fields and get a better structure in my panda dataframe

This is my code :

import json
import pandas as pd

data ={
  "consumer": {
    "phoneNumber": "3156578877",
    "channelId": "83",
    "appId": "APP_DAVIPLATA",
    "moduleId": "MA_PSE_VNZ",
    "sessionId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "appVersion": "16.1.1",
    "soVersion": "Android 11",
    "agentInfo": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36",
    "ipDevice": "163.111.221.230"
  },
  "transactionHeader": {
    "transactionType": "LOG",
    "transactionId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "transactionDate": "2018-07-03T17:54:36.762-05:00"
  },
  "transactionDetail": {
    "logType": "ANALITICA_OPERACIONAL",
    "MediaTarjetId": "PSE_VNZ_1",
    "Consumer": {
      "phoneNumber": "3156578877",
      "channelId": "83",
      "appId": "APP_DAVIPLATA",
      "moduleId": "MA_PSE_VNZ",
      "sessionId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "appVersion": "16.1.1",
      "soVersion": "Android 11",
      "agentInfo": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36",
      "ipDevice": "163.111.221.230"
    },
    "Transaction": {
      "transactionType": "LOG",
      "transactionId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "transactionDate": "2018-07-03T17:54:36.762-05:00"
    },
    "Client": {
      "identificationType": "CC",
      "identificationNumber": "1027868487",
      "documentExpeditionDate": "2009-05-10",
      "documentExpeditionPlace": "Bogota"
    },
    "Product": {
      "productCode": "DVP_CO",
      "productNumber": "3158765639"
    },
    "Messages": {
      "OperationalAnalytics": [
        {
          "nameField": "fecha_transaccion",
          "valueField": "2022071518:35:50",
          "valueFormat": "YYYYMMDDHH:MM:SS"
        },
        {
          "nameField": "nombre_transaccion",
          "valueField": "DEBITO PAGO",
          "valueFormat": "String"
        },
        {
          "nameField": "valor",
          "valueField": "5300",
          "valueFormat": "Number"
        },
        {
          "nameField": "referencia_destino",
          "valueField": "3156547865",
          "valueFormat": "String"
        }
      ]
    }
  }
}

pd.json_normalize(data, record_path = ['transactionDetail','Messages','OperationalAnalytics'],meta = [['consumer','phoneNumber'], 'transactionHeader'])

KeyError: 'phoneNumber'

In this code some like I need it worked

import pandas as pd
data = [
    {
        "company": "Google",
        "tagline": "Dont be evil",
        "management": {"CEO": "Sundar Pichai"},
        "department": [
            {"name": "Gmail", "revenue (bn)": 123},
            {"name": "GCP", "revenue (bn)": 400},
            {"name": "Google drive", "revenue (bn)": 600},
        ],
    },
    {
        "company": "Microsoft",
        "tagline": "Be What's Next",
        "management": {"CEO": "Satya Nadella"},
        "department": [
            {"name": "Onedrive", "revenue (bn)": 13},
            {"name": "Azure", "revenue (bn)": 300},
            {"name": "Microsoft 365", "revenue (bn)": 300},
        ],
    },
  
]
result = pd.json_normalize(
    data, "department", ["company", "tagline", ["management", "CEO"]]
)
result

Thanks for your help

CodePudding user response:

The json in your working example is a list of dictionaries while the one you are trying to transform is a dictionary.

You can access the dictionary elements directly with data[key] and convert them to dataframes that you will concatenate... One tricky part of your example are the keys in 'OperationalAnalytics'. I guess you would want the nameField values to be column names so that you get one line per transaction. This can be achieved by setting the nameField column to index and transposing:

pd.concat([pd.DataFrame([data["consumer"]])["phoneNumber"],
            pd.DataFrame([data["transactionHeader"]]),
            pd.DataFrame(data['transactionDetail']['Messages']['OperationalAnalytics']
                        ).set_index('nameField')[['valueField']].T.reset_index(drop=True)
          ], axis=1)

Output:

  phoneNumber transactionType                         transactionId                transactionDate fecha_transaccion nombre_transaccion valor referencia_destino
0  3156578877             LOG  3fa85f64-5717-4562-b3fc-2c963f66afa6  2018-07-03T17:54:36.762-05:00  2022071518:35:50        DEBITO PAGO  5300         3156547865

CodePudding user response:

Imagine some thing like this :

Output:

phoneNumber channelId appId moduleId sessionId appVersion soVersion agentInfo ipDevice nameField valueField 0 3156578877 83 APP_DAVIPLATA MA_PSE_VNZ 3fa85f64-5717-4562-b3fc-2c963f66afa6 16.1.1 Android 11 Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... 163.111.221.230 referencia_destino 3156547865 1 3156578877 83 APP_DAVIPLATA MA_PSE_VNZ 3fa85f64-5717-4562-b3fc-2c963f66afa6 16.1.1 Android 11 Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... 163.111.221.230 valor 5300 2 3156578877 83 APP_DAVIPLATA MA_PSE_VNZ 3fa85f64-5717-4562-b3fc-2c963f66afa6 16.1.1 Android 11 Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... 163.111.221.230 nombre_transaccion DEBITO PAGO 3 3156578877 83 APP_DAVIPLATA MA_PSE_VNZ 3fa85f64-5717-4562-b3fc-2c963f66afa6 16.1.1 Android 11 Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... 163.111.221.230 fecha_transaccion 2022071518:35:50

  • Related