Home > Net >  Creating a pandas DataFrame from json object and appending a column to it
Creating a pandas DataFrame from json object and appending a column to it

Time:01-04

From an API, I get information about who has filled a particular form and when they have done it as a json object. I get the below data from 2 forms formid = ["61438732", "48247759dc"]. The json object is stored in the r_json_object variable.

r_json_online = {
'results': [
{'submittedAt': 1669963478503,
   'values': [{'email': '[email protected]'}]},
{'submittedAt': 1669963259737,
   'values': [{'email': '[email protected]'}]},
{'submittedAt': 1669963165956,
   'values': [{'email': '[email protected]'}]}
]
}

I have used the json_normalize function to de-nest the json object and insert the values into a DataFrame called form_submissions. This is the code I have used

import pandas as pd
from pandas import json_normalize
    
submissions = []

formid = ["61438732", "48247759dc"]

for i in range(0,len(formid)):
    
    submissions.extend(r_json_online["results"])

    form_submissions = pd.DataFrame()

    for j in submissions:
        form_submissions = form_submissions.append(json_normalize(j["values"]))
        form_submissions = form_submissions.append({'createdOn': j["submittedAt"]}, ignore_index=True)
        form_submissions = form_submissions.append({'formid': formid[i]}, ignore_index=True)
                 
form_submissions['createdOn'] = form_submissions['createdOn'].fillna(method = 'bfill')
form_submissions['formid'] = form_submissions['formid'].fillna(method = 'bfill')
form_submissions = form_submissions.dropna(subset= 'email')

Code explanation:

  1. I have created an empty list called submissions

  2. For each value in the formid list, I'm running the for loop.

  3. In the for loop:

    a. I have added data to the submissions list
    b. Created an empty DataFrame, normalized the json object and appended the values to the DataFrame from each element in the submissions list

Expected Output:

  1. I wanted the first 3 rows to have formid = '61438732'
  2. The next 3 rows should have the formid = '48247759dc'

enter image description here

Actual Output:

  1. The formid is the same for all the rows

enter image description here

CodePudding user response:

The problem is you are using this line " form_submissions = pd.DataFrame()" in the loop which reset your dataframe each time

CodePudding user response:

This can easily attained by converting into two dataframes and doing cartesian product/cross merge on between both.

formids = ["61438732", "48247759dc"]
form_submissions_df=json_normalize(r_json_online['results'], record_path=['values'], meta=['submittedAt'])

# converting form_ids list to dataframe
form_ids_df = pd.DataFrame (formids, columns = ['form_id'])

# cross merge for cartesian product result 
form_submissions_df.merge(form_ids_df, how="cross")

Results

  • Related