I have a list in which some elements are empty lists and others are lists of dicts (nested JSON). The data looks like this:
[[],
[],
[{'id': 32,
'globalId': 'a73dec29-9431-4806-a4f7-0667872746ce',
'parentGlobalId': 'ad21cef5-cfa7-4e52-ab8f-8b5da30020af',
'name': 'IMG_9774.jpeg',
'contentType': 'image/jpeg',
'size': 157893,
'keywords': '',
'exifInfo': None},
{'id': 33,
'globalId': '0455db91-946e-4fae-8aab-0a4729219527',
'parentGlobalId': 'ad21cef5-cfa7-4e52-ab8f-8b5da30020af',
'name': 'IMG_9766.jpeg',
'contentType': 'image/jpeg',
'size': 160480,
'keywords': '',
'exifInfo': None},
{'id': 34,
'globalId': '4c036305-a1c5-4689-8640-1dc79aaf0358',
'parentGlobalId': 'ad21cef5-cfa7-4e52-ab8f-8b5da30020af',
'name': 'IMG_3870.jpeg',
'contentType': 'image/jpeg',
'size': 757939,
'keywords': '',
'exifInfo': None},
{'id': 35,
'globalId': '1868ac95-1830-45fb-8f15-975ef0e14338',
'parentGlobalId': 'ad21cef5-cfa7-4e52-ab8f-8b5da30020af',
'name': 'IMG_2357.jpeg',
'contentType': 'image/jpeg',
'size': 4500893,
'keywords': '',
'exifInfo': None}],
[]]
Using a simple json_normalize():
test = pd.json_normalize(attach)
test
I get the following result:
0 1 2 3 4
0 None None None None None
1 None None None None None
2 None None None None None
3 None None None None None
4 None None None None None
... ... ... ... ... ...
83 None None None None None
84 None None None None None
85 None None None None None
86 {'id': 32, 'globalId': 'a73dec29-9431-4806-a4f... {'id': 33, 'globalId': '0455db91-946e-4fae-8aa... {'id': 34, 'globalId': '4c036305-a1c5-4689-864... {'id': 35, 'globalId': '1868ac95-1830-45fb-8f1... None
87 None None None None None
Ideally, I would get a DataFrame with each key in the JSON objects as a column name, something like:
id globalId parentGlobalId name contentType size keywords exifInfo
None None None None None None None None
None None None None None None None None
32 a73dec29-9431-4806-a4f7-0667872746ce ad21cef5-cfa7-4e52-ab8f-8b5da30020af IMG_9774.jpeg image/jpeg 157893 None None
33 0455db91-946e-4fae-8aab-0a4729219527 ad21cef5-cfa7-4e52-ab8f-8b5da30020af IMG_9766.jpeg image/jpeg 160480 None None
34 4c036305-a1c5-4689-8640-1dc79aaf0358 ad21cef5-cfa7-4e52-ab8f-8b5da30020af IMG_3870.jpeg image/jpeg 757939 None None
None None None None None None None None
I've experimented a bunch with the parameters in the json_normalize() method with no luck.
Answer:
If lst is your list from the question, you can do:
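import pandas as pd
# Flatten the sublists; an empty sublist contributes one empty dict, i.e. an all-NaN row
df = pd.DataFrame([d for sub in lst for d in (sub or [{}])])
print(df)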
Prints:
id globalId parentGlobalId name contentType size keywords exifInfo
0 NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN
2 32.0 a73dec29-9431-4806-a4f7-0667872746ce ad21cef5-cfa7-4e52-ab8f-8b5da30020af IMG_9774.jpeg image/jpeg 157893.0 NaN
3 33.0 0455db91-946e-4fae-8aab-0a4729219527 ad21cef5-cfa7-4e52-ab8f-8b5da30020af IMG_9766.jpeg image/jpeg 160480.0 NaN
4 34.0 4c036305-a1c5-4689-8640-1dc79aaf0358 ad21cef5-cfa7-4e52-ab8f-8b5da30020af IMG_3870.jpeg image/jpeg 757939.0 NaN
5 35.0 1868ac95-1830-45fb-8f15-975ef0e14338 ad21cef5-cfa7-4e52-ab8f-8b5da30020af IMG_2357.jpeg image/jpeg 4500893.0 NaN
6 NaN NaN NaN NaN NaN NaN NaN NaN
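Note that the placeholder rows turn id and size into floats (32.0, 157893.0), because NaN cannot be stored in a plain integer column. As a minimal sketch, assuming a shortened stand-in for the list from the question (the sample below keeps only a few of the original keys), you could cast those columns to pandas' nullable Int64 dtype, or skip the empty sublists entirely if the placeholder rows are not needed:

```python
import pandas as pd

# Shortened, hypothetical stand-in for the list from the question.
lst = [
    [],
    [
        {"id": 32, "name": "IMG_9774.jpeg", "size": 157893, "exifInfo": None},
        {"id": 33, "name": "IMG_9766.jpeg", "size": 160480, "exifInfo": None},
    ],
    [],
]

# Same flattening as in the answer: an empty sublist becomes one all-NaN row.
df = pd.DataFrame([d for sub in lst for d in (sub or [{}])])

# NaN forces id/size to float64; the nullable Int64 dtype keeps them integral
# and shows the missing entries as <NA>.
df = df.astype({"id": "Int64", "size": "Int64"})

# Variant without placeholder rows: empty sublists simply contribute nothing.
df_no_gaps = pd.DataFrame([d for sub in lst for d in sub])

print(df)
print(df_no_gaps)
```

Which variant fits depends on whether the position of the empty entries matters for later joins back to the original list.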