Home > Blockchain >  Nested JSON with unique keys to flatten DataFrame with first 2 levels keys in columns
Nested JSON with unique keys to flatten DataFrame with first 2 levels keys in columns

Time:11-25

I have this JSON:

{'57333310': 
   {'3179138607': 
       {'clicks': 0,
        'spent': 0,
        'cpc': 0,
        'qualityFactor': 1.5},
    '0': 
       {'clicks': 0,
        'spent': 0,
        'cpc': 0,
        'qualityFactor': 1.5}},
 '57335920': 
    {'4001223651': 
      {'clicks': 0,
       'spent': 0,
       'cpc': 0,
       'qualityFactor': 1.5},
  '896887695':
      {'clicks': 0,
       'spent': 0,
       'cpc': 0,
       'qualityFactor': 1.5},
  '0':
      {'clicks': 0,
       'spent': 0,
       'cpc': 0,
       'qualityFactor': 1.5}}}

And none of the methods I tested (excluding nested loops iterating over dictionaries) returned a table like this:

        key_1       key_2       clicks  spent   cpc     qualityFactor
0       57333310    3179138607  0       0.0     0.00    1.5
1       57333310    0           0       0.0     0.00    1.5
2       57335920    4001223651  0       0.0     0.00    1.5
3       57335920    896887695   0       0.0     0.00    1.5
4       57335920    0           0       0.0     0.00    1.5

I have tried this code:

# d is JSON dictionary

df = pd.DataFrame()
for key_1 in d.keys():
    for key_2 in d[key_1].keys():
        df = df.append(pd.DataFrame.from_dict(d[key_1][key_2], orient = 'index').T.join(
            pd.DataFrame({'key_1':[key_1], 'key_2':[key_2]})), ignore_index = True)
df

returns

    clicks  spent   cpc     qualityFactor   key_1   key_2
0   0.0     0.0     0.0     1.5             57333310        3179138607
1   0.0     0.0     0.0     1.5             57333310        0
2   0.0     0.0     0.0     1.5             57335920        4001223651
3   0.0     0.0     0.0     1.5             57335920        896887695
4   0.0     0.0     0.0     1.5             57335920        0

and this:

for key_1 in d.keys():
    for key_2 in d[key_1].keys():
        d[key_1][key_2].update({'key_1': key_1})
        d[key_1][key_2].update({'key_2': key_2})

r = []
for l in list([list(x.values()) for x in list(d.values())]):
    r  = l
df = pd.DataFrame(r)[['key_1','key_2','clicks','spent','cpc','qualityFactor']]
df

returns

    key_1       key_2       clicks  spent   cpc qualityFactor
0   57333310    3179138607  0       0       0   1.5
1   57333310    0           0       0       0   1.5
2   57335920    4001223651  0       0       0   1.5
3   57335920    896887695   0       0       0   1.5
4   57335920    0           0       0       0   1.5

Both ways work, but 1-st the code should not have nested loops, 2-nd each level of nesting will have to be processed with a new nested loop, which makes the code not universal for any nesting, because a new cycle needs to be completed additionally with hands.

The task is to get such a data frame without resorting to the use of cycles (at least nested ones). Does anyone have the solution?

CodePudding user response:

Use list comprehension with convert keys to helper dict, last pass to DataFrame constructor is fastest solution, especially if many nested dicts:

L = [{**{'key_1':k1, 'key_2':k2}, **v2} for k1, v1 in d.items() for k2, v2 in v1.items()]
df = pd.DataFrame(L)
print (df)

      key_1       key_2  clicks  spent  cpc  qualityFactor
0  57333310  3179138607       0      0    0            1.5
1  57333310           0       0      0    0            1.5
2  57335920  4001223651       0      0    0            1.5
3  57335920   896887695       0      0    0            1.5
4  57335920           0       0      0    0            1.5
    
  • Related