I have the following json file -
{
"quiz": {
"sport": { "q1": {
"question": "Which one is correct team name in NBA?",
"options": [
"New York Bulls",
"Los Angeles Kings",
"Golden State Warriros",
"Huston Rocket"
],
"answer": "Huston Rocket"
}
},
"maths": {
"q1": {
"question": "5 7 = ?",
"options": [
"10",
"11",
"12",
"13"
],
"answer": "12",
"test_dict":{"a":1,"b":2,"dddd":{"1":1,"2":2}}
},
"q2": {
"question": "12 - 8 = ?",
"options": [
"1",
"2",
"3",
"4"
],
"answer": "4"
}
}
},
"summary": "good example",
"viewer rating": 6
}
I want to convert it to a DataFrame. Something like this -
quiz q1 q2 question options answer test_dict summary viewer rating
sport q1 NaN Which one.. [list] Huston.. NaN good example 6
maths q1 NaN 5 7 = ? [list] 12 {"a":1.. good example 6
maths NaN q2 12 - 8 = ? [list] 4 NaN good example 6
I tried doing it using
file1 = open("json2.json")
data = json.load(file1)
df = pd.json_normalize(data, record_path=['quiz'])
But I am getting the following error -
TypeError: {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}, 'summary': 'good example', 'viewer rating': 6} has non list value {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}} for path quiz. Must be list or null.
The issue is, it is not a list but a dictionary itself. So, I also tried doing this -
pd.json_normalize(data, max_level=2)
But, I am not getting the expected output. I am just getting one row. Can anybody give me some pointers?
CodePudding user response:
You can use a list comprehension:
import pandas as pd
d = {'quiz': {'sport': {'q1': {'question': 'Which one is correct team name in NBA?', 'options': ['New York Bulls', 'Los Angeles Kings', 'Golden State Warriros', 'Huston Rocket'], 'answer': 'Huston Rocket'}}, 'maths': {'q1': {'question': '5 7 = ?', 'options': ['10', '11', '12', '13'], 'answer': '12', 'test_dict': {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}}}, 'q2': {'question': '12 - 8 = ?', 'options': ['1', '2', '3', '4'], 'answer': '4'}}}, 'summary': 'good example', 'viewer rating': 6}
r = [{'quiz':a, q:q, **v, 'summary':d['summary'], 'viewer rating':d['viewer rating']}
for a, b in d['quiz'].items() for q, v in b.items()]
df = pd.DataFrame(r)
Output:
quiz q1 question ... viewer rating test_dict q2
0 sport q1 Which one is correct team name in NBA? ... 6 NaN NaN
1 maths q1 5 7 = ? ... 6 {'a': 1, 'b': 2, 'dddd': {'1': 1, '2': 2}} NaN
2 maths NaN 12 - 8 = ? ... 6 NaN q2