Nested JSON with unique keys to flatten DataFrame with first 2 levels keys in columns-CodePudding

I have this JSON:

{'57333310': 
   {'3179138607': 
       {'clicks': 0,
        'spent': 0,
        'cpc': 0,
        'qualityFactor': 1.5},
    '0': 
       {'clicks': 0,
        'spent': 0,
        'cpc': 0,
        'qualityFactor': 1.5}},
 '57335920': 
    {'4001223651': 
      {'clicks': 0,
       'spent': 0,
       'cpc': 0,
       'qualityFactor': 1.5},
  '896887695':
      {'clicks': 0,
       'spent': 0,
       'cpc': 0,
       'qualityFactor': 1.5},
  '0':
      {'clicks': 0,
       'spent': 0,
       'cpc': 0,
       'qualityFactor': 1.5}}}

And none of the methods I tested (excluding nested loops iterating over dictionaries) returned a table like this:

        key_1       key_2       clicks  spent   cpc     qualityFactor
0       57333310    3179138607  0       0.0     0.00    1.5
1       57333310    0           0       0.0     0.00    1.5
2       57335920    4001223651  0       0.0     0.00    1.5
3       57335920    896887695   0       0.0     0.00    1.5
4       57335920    0           0       0.0     0.00    1.5

I have tried this code:

# d is JSON dictionary

df = pd.DataFrame()
for key_1 in d.keys():
    for key_2 in d[key_1].keys():
        df = df.append(pd.DataFrame.from_dict(d[key_1][key_2], orient = 'index').T.join(
            pd.DataFrame({'key_1':[key_1], 'key_2':[key_2]})), ignore_index = True)
df

returns

    clicks  spent   cpc     qualityFactor   key_1   key_2
0   0.0     0.0     0.0     1.5             57333310        3179138607
1   0.0     0.0     0.0     1.5             57333310        0
2   0.0     0.0     0.0     1.5             57335920        4001223651
3   0.0     0.0     0.0     1.5             57335920        896887695
4   0.0     0.0     0.0     1.5             57335920        0

and this:

for key_1 in d.keys():
    for key_2 in d[key_1].keys():
        d[key_1][key_2].update({'key_1': key_1})
        d[key_1][key_2].update({'key_2': key_2})

r = []
for l in list([list(x.values()) for x in list(d.values())]):
    r  = l
df = pd.DataFrame(r)[['key_1','key_2','clicks','spent','cpc','qualityFactor']]
df

returns

    key_1       key_2       clicks  spent   cpc qualityFactor
0   57333310    3179138607  0       0       0   1.5
1   57333310    0           0       0       0   1.5
2   57335920    4001223651  0       0       0   1.5
3   57335920    896887695   0       0       0   1.5
4   57335920    0           0       0       0   1.5

Both ways work, but 1-st the code should not have nested loops, 2-nd each level of nesting will have to be processed with a new nested loop, which makes the code not universal for any nesting, because a new cycle needs to be completed additionally with hands.

The task is to get such a data frame without resorting to the use of cycles (at least nested ones). Does anyone have the solution?

CodePudding user response：

Use list comprehension with convert keys to helper dict, last pass to DataFrame constructor is fastest solution, especially if many nested dicts:

L = [{**{'key_1':k1, 'key_2':k2}, **v2} for k1, v1 in d.items() for k2, v2 in v1.items()]
df = pd.DataFrame(L)
print (df)

      key_1       key_2  clicks  spent  cpc  qualityFactor
0  57333310  3179138607       0      0    0            1.5
1  57333310           0       0      0    0            1.5
2  57335920  4001223651       0      0    0            1.5
3  57335920   896887695       0      0    0            1.5
4  57335920           0       0      0    0            1.5