After obtaining the JSON data through data scraping a web, I'm trying to trasnform it as part of an ETL. But the problem is that the JSON is really nested. My objective is getting data in pandas table/df format:
I already used json_normalize and read documentation but really couldn't make it work since, after creating the json_normalize, the dataframe looks like this:
I think it will be better posting 2 more pics of a json viewer and the actual json, in case someone can help me out with this topic since it's driving me crazy:
{
"group":3,
"acc":"Yum.8039",
"profession":"Mechanist",
"isPoV":false,
"isCommander":false,
"l1Set":["Rifle"],
"l2Set":[],
"a1Set":[],
"a2Set":[],
"colTarget":"rgb(208,156,89)",
"colCleave":"rgb(158,119,68)",
"colTotal":"rgb(109,83,48)",
"isFake":false,
"notInSquad":false,
"uniqueID":52,
"name":"F L O R I � N",
"tough":0,
"condi":0,
"conc":0,
"heal":0,
"icon":"https://wiki.guildwars2.com/images/thumb/8/8a/Mechanist_tango_icon_200px.png/20px-Mechanist_tango_icon_200px.png",
"health":-1,
"minions":[
{
"id":79,
"name":"Jade Mech CJ-1"
},
{
"id":122,"name":
"Mark I Golem"}
],
"details":{
"dmgDistributions":[
{
"contributedDamage":2881217,
"contributedBreakbarDamage":0.0,
"contributedShieldDamage":0,
"totalDamage":5174108,
"totalBreakbarDamage":300.0,
"totalCasting":163564,
"distribution":[
[
false,59562,58168,4291,10176,7,7,7,3,0,0.0,0.0,0,58168,7,0,0,0.0
],
[
false,6003,738013,0,4372,164,240,235,157,0,1.663,1.193,0,733712,254,98611,0,0.0
],
[
false,68121,397981,0,5747,0,96,96,64,0,0.0,0.0,0,397981,102,0,0,0.0
],
[
true,736,107737,6,1622,0,176,0,0,0,0.0,0.0,0,0,196,0,0,0.0
],
[
false,29889,229776,0,8299,0,37,37,23,0,0.0,0.0,0,229776,37,0,0,0.0
],
[
false,6004,66664,3660,7118,12,12,12,8,0,0.0,0.0,0,66664,12,5780,0,0.0
],
[
false,6154,109152,6475,14379,22,11,11,4,0,0.0,0.0,0,109152,11,6224,0,0.0
]
]
}
]
}
}
My objective is creating a dataframe as showed on the first pic containing players group, name, acc and dps from phase 1, 2 and 3 that can be found inside the full JSON (I just uploaded part of it as it is quite huge). The actual phase dmg is founded for all players in:
phase1_dps = data['phases'][1]['dpsStats'][0:10]
phase2_dps = data['phases'][6]['dpsStats'][0:10]
phase3_dps = data['phases'][12]['dpsStats'][0:10]
But, this same dmg can be founded in ['details'][dmgDistributions][0]:
> "details":{
"dmgDistributions":[
{
"contributedDamage":2881217,
"contributedBreakbarDamage":0.0,
"contributedShieldDamage":0,
"totalDamage":5174108,
"totalBreakbarDamage":300.0,
"totalCasting":163564
The part I cannot access is the dmg, since in the second pic that column appear as detail.dmgDistribution
To summarize, the dataframe should be dsiplaying a frame like the one on the first pic but instead of tough, condi, conc and heal, it should show the dps done per phase on each player, as a rough sketch, dps should be like this:
Any tip or information would be great, I thought on just trying to get the info through JSON queries in case there isn't any other options. So, thanks in advance to anyone helping out!
CodePudding user response:
It is still not quite clear to me how the data you showed us lead to an output you want, but since you asked for some hints, here is how you might come forward with your goal.
Load the data from json:
with open('your_json.json') as f:
data = json.load(f)
Then use pd.json_normalize
to get this the columns ['id', 'name', 'group', 'acc', 'details.dmgDistributions']
. The first two column are from record_path, the rest from meta.
out = pd.json_normalize(data, record_path=['minions'], meta = ['group', 'acc', ['details', 'dmgDistributions']])
out
Output:
id name group acc \
0 79 Jade Mech CJ-1 3 Yum.8039
1 122 Mark I Golem 3 Yum.8039
details.dmgDistributions
0 {'contributedDamage': 2881217, 'contributedBre...
1 {'contributedDamage': 2881217, 'contributedBre...
Now you change the dicts in details.dmgDistributions
to own columns and join it back to the current df out
:
out.join(out.pop('details.dmgDistributions').apply(pd.Series))
Output:
id name group acc contributedDamage contributedBreakbarDamage contributedShieldDamage totalDamage totalBreakbarDamage totalCasting distribution
0 79 Jade Mech CJ-1 3 Yum.8039 2881217 0.0 0 5174108 300.0 163564 [[False, 59562, 58168, 4291...],...]
1 122 Mark I Golem 3 Yum.8039 2881217 0.0 0 5174108 300.0 163564 [[False, 59562, 58168, 4291...],...]
Don't know if you need to split all the row values in distribution
but if you need to, have a look at pd.explode()