I have this JSON:
{'57333310':
{'3179138607':
{'clicks': 0,
'spent': 0,
'cpc': 0,
'qualityFactor': 1.5},
'0':
{'clicks': 0,
'spent': 0,
'cpc': 0,
'qualityFactor': 1.5}},
'57335920':
{'4001223651':
{'clicks': 0,
'spent': 0,
'cpc': 0,
'qualityFactor': 1.5},
'896887695':
{'clicks': 0,
'spent': 0,
'cpc': 0,
'qualityFactor': 1.5},
'0':
{'clicks': 0,
'spent': 0,
'cpc': 0,
'qualityFactor': 1.5}}}
And none of the methods I tested (excluding nested loops iterating over dictionaries) returned a table like this:
key_1 key_2 clicks spent cpc qualityFactor
0 57333310 3179138607 0 0.0 0.00 1.5
1 57333310 0 0 0.0 0.00 1.5
2 57335920 4001223651 0 0.0 0.00 1.5
3 57335920 896887695 0 0.0 0.00 1.5
4 57335920 0 0 0.0 0.00 1.5
I have tried this code:
# d is JSON dictionary
df = pd.DataFrame()
for key_1 in d.keys():
for key_2 in d[key_1].keys():
df = df.append(pd.DataFrame.from_dict(d[key_1][key_2], orient = 'index').T.join(
pd.DataFrame({'key_1':[key_1], 'key_2':[key_2]})), ignore_index = True)
df
returns
clicks spent cpc qualityFactor key_1 key_2
0 0.0 0.0 0.0 1.5 57333310 3179138607
1 0.0 0.0 0.0 1.5 57333310 0
2 0.0 0.0 0.0 1.5 57335920 4001223651
3 0.0 0.0 0.0 1.5 57335920 896887695
4 0.0 0.0 0.0 1.5 57335920 0
and this:
for key_1 in d.keys():
for key_2 in d[key_1].keys():
d[key_1][key_2].update({'key_1': key_1})
d[key_1][key_2].update({'key_2': key_2})
r = []
for l in list([list(x.values()) for x in list(d.values())]):
r = l
df = pd.DataFrame(r)[['key_1','key_2','clicks','spent','cpc','qualityFactor']]
df
returns
key_1 key_2 clicks spent cpc qualityFactor
0 57333310 3179138607 0 0 0 1.5
1 57333310 0 0 0 0 1.5
2 57335920 4001223651 0 0 0 1.5
3 57335920 896887695 0 0 0 1.5
4 57335920 0 0 0 0 1.5
Both ways work, but 1-st the code should not have nested loops, 2-nd each level of nesting will have to be processed with a new nested loop, which makes the code not universal for any nesting, because a new cycle needs to be completed additionally with hands.
The task is to get such a data frame without resorting to the use of cycles (at least nested ones). Does anyone have the solution?
CodePudding user response:
Use list comprehension with convert keys to helper dict, last pass to DataFrame
constructor is fastest solution, especially if many nested dicts:
L = [{**{'key_1':k1, 'key_2':k2}, **v2} for k1, v1 in d.items() for k2, v2 in v1.items()]
df = pd.DataFrame(L)
print (df)
key_1 key_2 clicks spent cpc qualityFactor
0 57333310 3179138607 0 0 0 1.5
1 57333310 0 0 0 0 1.5
2 57335920 4001223651 0 0 0 1.5
3 57335920 896887695 0 0 0 1.5
4 57335920 0 0 0 0 1.5