JSON in List to Pandas Data Frame

Time:11-01

I have a list in which some elements are empty lists and others are lists of nested JSON objects (dicts). The data looks like this:

 [[],
 [],
 [{'id': 32,
   'globalId': 'a73dec29-9431-4806-a4f7-0667872746ce',
   'parentGlobalId': 'ad21cef5-cfa7-4e52-ab8f-8b5da30020af',
   'name': 'IMG_9774.jpeg',
   'contentType': 'image/jpeg',
   'size': 157893,
   'keywords': '',
   'exifInfo': None},
  {'id': 33,
   'globalId': '0455db91-946e-4fae-8aab-0a4729219527',
   'parentGlobalId': 'ad21cef5-cfa7-4e52-ab8f-8b5da30020af',
   'name': 'IMG_9766.jpeg',
   'contentType': 'image/jpeg',
   'size': 160480,
   'keywords': '',
   'exifInfo': None},
  {'id': 34,
   'globalId': '4c036305-a1c5-4689-8640-1dc79aaf0358',
   'parentGlobalId': 'ad21cef5-cfa7-4e52-ab8f-8b5da30020af',
   'name': 'IMG_3870.jpeg',
   'contentType': 'image/jpeg',
   'size': 757939,
   'keywords': '',
   'exifInfo': None},
  {'id': 35,
   'globalId': '1868ac95-1830-45fb-8f15-975ef0e14338',
   'parentGlobalId': 'ad21cef5-cfa7-4e52-ab8f-8b5da30020af',
   'name': 'IMG_2357.jpeg',
   'contentType': 'image/jpeg',
   'size': 4500893,
   'keywords': '',
   'exifInfo': None}],
 []]

Using a simple json_normalize() call:

import pandas as pd

test = pd.json_normalize(attach)  # attach is the list shown above
test

I get the following result:

    0   1   2   3   4
0   None    None    None    None    None
1   None    None    None    None    None
2   None    None    None    None    None
3   None    None    None    None    None
4   None    None    None    None    None
... ... ... ... ... ...
83  None    None    None    None    None
84  None    None    None    None    None
85  None    None    None    None    None
86  {'id': 32, 'globalId': 'a73dec29-9431-4806-a4f...   {'id': 33, 'globalId': '0455db91-946e-4fae-8aa...   {'id': 34, 'globalId': '4c036305-a1c5-4689-864...   {'id': 35, 'globalId': '1868ac95-1830-45fb-8f1...   None
87  None    None    None    None    None

Ideally, I would get a DataFrame with each key of the JSON objects as a column name, something like:

id  globalId                parentGlobalId              name        contentType size    keywords    exifInfo
None    None                    None                    None        None        None    None        None
None    None                    None                    None        None        None    None        None
32  a73dec29-9431-4806-a4f7-0667872746ce    ad21cef5-cfa7-4e52-ab8f-8b5da30020af    IMG_9774.jpeg   image/jpeg  157893  None        None
33  0455db91-946e-4fae-8aab-0a4729219527    ad21cef5-cfa7-4e52-ab8f-8b5da30020af    IMG_9766.jpeg   image/jpeg  160480  None        None
34  4c036305-a1c5-4689-8640-1dc79aaf0358    ad21cef5-cfa7-4e52-ab8f-8b5da30020af    IMG_3870.jpeg   image/jpeg  757939  None        None
None    None                    None                    None        None        None    None        None

I've experimented quite a bit with the parameters of json_normalize(), with no luck.

CodePudding user response:

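If lst is your list from the question, you can flatten it into a single list of dicts, substituting one empty dict for each empty sublist so that it still produces a row: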

import pandas as pd
# (l or [{}]) yields a single empty dict when the sublist l is empty
df = pd.DataFrame([d for l in lst for d in (l or [{}])])
print(df)

Prints:

     id                              globalId                        parentGlobalId           name contentType       size keywords  exifInfo
0   NaN                                   NaN                                   NaN            NaN         NaN        NaN      NaN       NaN
1   NaN                                   NaN                                   NaN            NaN         NaN        NaN      NaN       NaN
2  32.0  a73dec29-9431-4806-a4f7-0667872746ce  ad21cef5-cfa7-4e52-ab8f-8b5da30020af  IMG_9774.jpeg  image/jpeg   157893.0                NaN
3  33.0  0455db91-946e-4fae-8aab-0a4729219527  ad21cef5-cfa7-4e52-ab8f-8b5da30020af  IMG_9766.jpeg  image/jpeg   160480.0                NaN
4  34.0  4c036305-a1c5-4689-8640-1dc79aaf0358  ad21cef5-cfa7-4e52-ab8f-8b5da30020af  IMG_3870.jpeg  image/jpeg   757939.0                NaN
5  35.0  1868ac95-1830-45fb-8f15-975ef0e14338  ad21cef5-cfa7-4e52-ab8f-8b5da30020af  IMG_2357.jpeg  image/jpeg  4500893.0                NaN
6   NaN                                   NaN                                   NaN            NaN         NaN        NaN      NaN       NaN
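Note that id and size come out as floats (32.0, 157893.0) because the placeholder rows hold NaN. A minimal sketch of one way to keep them as whole numbers, assuming a pandas version that supports the nullable "Int64" dtype; the shortened lst below is a hypothetical stand-in for the list in the question, and the explicit loop is just the one-liner above written out:

```python
import pandas as pd

# Hypothetical, shortened stand-in for the list in the question
lst = [
    [],
    [
        {"id": 32, "name": "IMG_9774.jpeg", "size": 157893, "exifInfo": None},
        {"id": 33, "name": "IMG_9766.jpeg", "size": 160480, "exifInfo": None},
    ],
    [],
]

# Same logic as the list comprehension, as an explicit loop
rows = []
for sub in lst:
    if sub:
        rows.extend(sub)   # one row per dict in a non-empty sublist
    else:
        rows.append({})    # one all-missing row per empty sublist

df = pd.DataFrame(rows)

# NaN forces id/size to float64; the nullable Int64 dtype keeps
# whole numbers and shows <NA> for the placeholder rows
df = df.astype({"id": "Int64", "size": "Int64"})
print(df)
```

If the placeholder rows are not needed, dropping the `or [{}]` part (or the `else` branch here) leaves only the rows that carry data, and the integer columns then stay integers without any conversion.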