From an API, I get information about who has filled a particular form and when they have done it as a json object. I get the below data from 2 forms formid = ["61438732", "48247759dc"]
. The json object is stored in the r_json_object
variable.
r_json_online = {
'results': [
{'submittedAt': 1669963478503,
'values': [{'email': '[email protected]'}]},
{'submittedAt': 1669963259737,
'values': [{'email': '[email protected]'}]},
{'submittedAt': 1669963165956,
'values': [{'email': '[email protected]'}]}
]
}
I have used the json_normalize
function to de-nest the json object and insert the values into a DataFrame
called form_submissions
. This is the code I have used
import pandas as pd
from pandas import json_normalize
submissions = []
formid = ["61438732", "48247759dc"]
for i in range(0,len(formid)):
submissions.extend(r_json_online["results"])
form_submissions = pd.DataFrame()
for j in submissions:
form_submissions = form_submissions.append(json_normalize(j["values"]))
form_submissions = form_submissions.append({'createdOn': j["submittedAt"]}, ignore_index=True)
form_submissions = form_submissions.append({'formid': formid[i]}, ignore_index=True)
form_submissions['createdOn'] = form_submissions['createdOn'].fillna(method = 'bfill')
form_submissions['formid'] = form_submissions['formid'].fillna(method = 'bfill')
form_submissions = form_submissions.dropna(subset= 'email')
Code explanation:
I have created an empty list called
submissions
For each value in the formid list, I'm running the
for
loop.In the
for
loop:a. I have added data to the submissions list
b. Created an empty DataFrame, normalized the json object and appended the values to the DataFrame from each element in thesubmissions
list
Expected Output:
- I wanted the first 3 rows to have
formid = '61438732'
- The next 3 rows should have the
formid = '48247759dc'
Actual Output:
- The formid is the same for all the rows
CodePudding user response:
The problem is you are using this line " form_submissions = pd.DataFrame()" in the loop which reset your dataframe each time
CodePudding user response:
This can easily attained by converting into two dataframes and doing cartesian product/cross merge on between both.
formids = ["61438732", "48247759dc"]
form_submissions_df=json_normalize(r_json_online['results'], record_path=['values'], meta=['submittedAt'])
# converting form_ids list to dataframe
form_ids_df = pd.DataFrame (formids, columns = ['form_id'])
# cross merge for cartesian product result
form_submissions_df.merge(form_ids_df, how="cross")