Home > Net >  Convert multi nested Json to Pandas DataFrame
Convert multi nested Json to Pandas DataFrame

Time:02-14

{
"ABC": {
"A": {
"Date": "01/01/2021",
"Value": "0.09"
},
"B": {
"Date": "01/01/2021",
"Value": "0.001"
}
},
"XYZ": {
"A": {
"Date": "01/01/2021",
"Value": "0.006"
},
"B": {
"Date": "01/01/2021",
"Value": "0.000"
}
}
}

Current output after applying pd.json_normalize(x,max_level=1)

enter image description here

Expected Output : Need to Convert this to pandas DataFrame

enter image description here

If any one can help or give some advice on working with this data that would be great!

CodePudding user response:

Use the following while the js is your input dict:

s = pd.DataFrame(js)
ss = s.apply(lambda x: [pd.Series(y)['Value'] for y in x])
ss['Date'] = s['ABC'].apply(pd.Series)['Date']

result:

enter image description here

CodePudding user response:

One of possible options is custom processing of your x object, creating a list of rows:

lst = []
for k1, v1 in x.items():
    row = {}
    row['key'] = k1
    for k2, v2 in v1.items():
        dd = v2["Date"]
        vv = float(v2["Value"])
        row['Date'] = dd
        row[k2] = vv
    lst.append(row)

Note that the above code also converts Value to float type.

I assumed that all dates in each first-level object are the same, so in the second level loop Date is overwritten, but I assume that this does no harm.

Then you can create the output DataFrame as follows:

df = pd.DataFrame(lst)
df.set_index('key', inplace=True)
df.index.name = None

The result is:

           Date      A      B
ABC  01/01/2021  0.090  0.001
XYZ  01/01/2021  0.006  0.000

Although it is possible to read x using json_normalize into a temporary DataFrame, the sequence of operations to convert it to your desired shape would be complicated.

This is why I came up with the above, in my opinion conceptually simpler solution.

  • Related