Home > front end >  extract key and values from nested jsons and put in a DataFrame
extract key and values from nested jsons and put in a DataFrame

Time:06-07

I have a json object that looks like this:

{
  "2022-06-05":{
    "revenue":"287.05",
    "returns_amount":"0.00",
    "date":"2022-06-05"
  },
  "2022-06-06":{
    "revenue":"229.76",
    "returns_amount":"0.00",
    "date":"2022-06-06"
  },
  "2022-06-07":{
    "revenue":"0.00",
    "returns_amount":"0.00",
    "date":"2022-06-07"
  }
}

Is it possible to extract all dates (key) along with their respective "revenue" values and put them into a dataframe that looks likethis:

date              revenue
2022-06-05        287.05
2022-06-06        229.76
2022-06-07        0.00

CodePudding user response:

You could use read_json to read the json string and then remove the index date values and returns_amount column. Assuming the json is in jstr:

pd.read_json(jstr, orient='index').reset_index(drop=True).drop('returns_amount',axis=1)

Output:

   revenue       date
0   287.05 2022-06-05
1   229.76 2022-06-06
2     0.00 2022-06-07

Alternatively, convert the json to a dictionary and then filter it to make the dataframe:

dic = json.loads(jstr)
pd.DataFrame([ { 'date' : d['date'], 'revenue': d['revenue'] } for d in dic.values() ])

Output:

         date revenue
0  2022-06-05  287.05
1  2022-06-06  229.76
2  2022-06-07    0.00

CodePudding user response:

How about something like:

import pandas as pd
import json

data = []   # list to hold data
file_name = "file.json" # JSON file

with open(file_name) as f:
    doc = json.load(f)
    for key in doc:
        data.append(doc[key])   # just need object, not outer date "key"

    df = pd.DataFrame(data=data, columns=["date", "revenue"])
    print(df)
  • Related