Home > Enterprise >  How to access really nested JSON dicts?
How to access really nested JSON dicts?

Time:09-09

After obtaining the JSON data through data scraping a web, I'm trying to trasnform it as part of an ETL. But the problem is that the JSON is really nested. My objective is getting data in pandas table/df format:

pandas dataframe example

I already used json_normalize and read documentation but really couldn't make it work since, after creating the json_normalize, the dataframe looks like this:

json viewer

I think it will be better posting 2 more pics of a json viewer and the actual json, in case someone can help me out with this topic since it's driving me crazy:

json_normalize

{
     "group":3,
      "acc":"Yum.8039",
      "profession":"Mechanist",
      "isPoV":false,
      "isCommander":false,
      "l1Set":["Rifle"],
      "l2Set":[],
      "a1Set":[],
      "a2Set":[],
      "colTarget":"rgb(208,156,89)",
      "colCleave":"rgb(158,119,68)",
      "colTotal":"rgb(109,83,48)",
      "isFake":false,
      "notInSquad":false,
      "uniqueID":52,
      "name":"F L O R I � N",
      "tough":0,
      "condi":0,
      "conc":0,
      "heal":0,
      "icon":"https://wiki.guildwars2.com/images/thumb/8/8a/Mechanist_tango_icon_200px.png/20px-Mechanist_tango_icon_200px.png",
      "health":-1,
      "minions":[
          {
              "id":79,
              "name":"Jade Mech CJ-1"
          },
          {
              "id":122,"name":
              "Mark I Golem"}
      ],
      "details":{
          "dmgDistributions":[
              {
                  "contributedDamage":2881217,
                  "contributedBreakbarDamage":0.0,
                  "contributedShieldDamage":0,
                  "totalDamage":5174108,
                  "totalBreakbarDamage":300.0,
                  "totalCasting":163564,
                  "distribution":[
                      [
                          false,59562,58168,4291,10176,7,7,7,3,0,0.0,0.0,0,58168,7,0,0,0.0
                      ],
                      [
                          false,6003,738013,0,4372,164,240,235,157,0,1.663,1.193,0,733712,254,98611,0,0.0
                      ],
                      [
                          false,68121,397981,0,5747,0,96,96,64,0,0.0,0.0,0,397981,102,0,0,0.0
                      ],
                      [
                          true,736,107737,6,1622,0,176,0,0,0,0.0,0.0,0,0,196,0,0,0.0
                      ],
                      [
                          false,29889,229776,0,8299,0,37,37,23,0,0.0,0.0,0,229776,37,0,0,0.0
                      ],
                      [
                          false,6004,66664,3660,7118,12,12,12,8,0,0.0,0.0,0,66664,12,5780,0,0.0
                      ],
                      [
                          false,6154,109152,6475,14379,22,11,11,4,0,0.0,0.0,0,109152,11,6224,0,0.0
                      ]
                  ]
              }
          ]
      } 
} 

My objective is creating a dataframe as showed on the first pic containing players group, name, acc and dps from phase 1, 2 and 3 that can be found inside the full JSON (I just uploaded part of it as it is quite huge). The actual phase dmg is founded for all players in:

phase1_dps = data['phases'][1]['dpsStats'][0:10] 
phase2_dps = data['phases'][6]['dpsStats'][0:10] 
phase3_dps = data['phases'][12]['dpsStats'][0:10] 

But, this same dmg can be founded in ['details'][dmgDistributions][0]:

> "details":{
         "dmgDistributions":[
             {
                 "contributedDamage":2881217,
                 "contributedBreakbarDamage":0.0,
                 "contributedShieldDamage":0,
                 "totalDamage":5174108,
                 "totalBreakbarDamage":300.0,
                 "totalCasting":163564

The part I cannot access is the dmg, since in the second pic that column appear as detail.dmgDistribution

To summarize, the dataframe should be dsiplaying a frame like the one on the first pic but instead of tough, condi, conc and heal, it should show the dps done per phase on each player, as a rough sketch, dps should be like this:

sketch

Any tip or information would be great, I thought on just trying to get the info through JSON queries in case there isn't any other options. So, thanks in advance to anyone helping out!

CodePudding user response:

It is still not quite clear to me how the data you showed us lead to an output you want, but since you asked for some hints, here is how you might come forward with your goal.

Load the data from json:

with open('your_json.json') as f:
    data = json.load(f)

Then use pd.json_normalize to get this the columns ['id', 'name', 'group', 'acc', 'details.dmgDistributions']. The first two column are from record_path, the rest from meta.

out = pd.json_normalize(data, record_path=['minions'], meta = ['group', 'acc', ['details', 'dmgDistributions']])
out

Output:

    id            name group       acc  \
0   79  Jade Mech CJ-1     3  Yum.8039   
1  122    Mark I Golem     3  Yum.8039   

                            details.dmgDistributions  
0  {'contributedDamage': 2881217, 'contributedBre...  
1  {'contributedDamage': 2881217, 'contributedBre...  

Now you change the dicts in details.dmgDistributions to own columns and join it back to the current df out:

out.join(out.pop('details.dmgDistributions').apply(pd.Series))

Output:

    id            name group       acc  contributedDamage  contributedBreakbarDamage  contributedShieldDamage  totalDamage  totalBreakbarDamage  totalCasting  distribution
0   79  Jade Mech CJ-1     3  Yum.8039            2881217                        0.0                        0      5174108                300.0        163564  [[False, 59562, 58168, 4291...],...]
1  122    Mark I Golem     3  Yum.8039            2881217                        0.0                        0      5174108                300.0        163564  [[False, 59562, 58168, 4291...],...]

Don't know if you need to split all the row values in distribution but if you need to, have a look at pd.explode()

  • Related