Home > other >  How do I convert this complex nested Dict into Pandas
How do I convert this complex nested Dict into Pandas

Time:04-07

I want to transform the result of a call from an API into a data frame. The result of the API call is a nested dictionary, but I am unable to produce a data frame.

In addition to json_normalize, I tried pd.DataFrame.from_dict. However, until now had been unsuccessful. I also tried to flatten the dictionary, but nothing.

I used the following call:

response = requests.request("GET", url, headers=headers, data=payload)
result = response.json()

And the output was:

{'mlcSongCode': 'A6457V',
 'primaryTitle': 'AIR FORCE ONES',
 'membersSongId': '',
 'artists': 'TRACK | NELLY, MURPHY LEE, ALI, KYJUAN, TRACK BOYZ',
 'propertyId': None,
 'akas': [{'akaId': '', 'akaTitle': '', 'akaTitleTypeCode': ''}],
 'writers': [{'writerId': '1083561',
   'writerLastName': 'SMITH',
   'writerFirstName': 'PREMRO VONZELLAIRE',
   'writerIPI': '00232478669',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535223',
   'chainParentId': ''},
  {'writerId': '1858916',
   'writerLastName': 'GOODWIN',
   'writerFirstName': 'MARLON',
   'writerIPI': '',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535224',
   'chainParentId': ''},
  {'writerId': '1883205',
   'writerLastName': 'HAYNES',
   'writerFirstName': 'CORNELL',
   'writerIPI': '',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535225',
   'chainParentId': ''},
  {'writerId': '4733138',
   'writerLastName': 'LAVELLE',
   'writerFirstName': 'CRUMP',
   'writerIPI': '',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535226',
   'chainParentId': ''}],
 'publishers': [{'publisherId': '910354',
   'mlcPublisherNumber': None,
   'publisherName': 'TENYOR MUSIC',
   'publisherIpiNumber': '00263286262',
   'publisherRoleCode': 'OriginalPublisher',
   'collectionShare': 16.67,
   'chainId': 'PSA_311720187',
   'chainParentId': 'PSC_311915511',
   'administrators': [],
   'parentPublishers': [{'publisherId': '377508',
     'mlcPublisherNumber': None,
     'publisherName': 'ALL MY PUBLISHING LLC',
     'publisherIpiNumber': '',
     'publisherRoleCode': 'OriginalPublisher',
     'collectionShare': 0,
     'chainId': 'PSC_311915511',
     'chainParentId': 'PSC_337535223|PSC_337535224|PSC_337535225|PSC_337535226',
     'administrators': [],
     'parentPublishers': []}]},
  {'publisherId': '716372',
   'mlcPublisherNumber': None,
   'publisherName': 'KOBALT MUSIC PUB AMERICA INC',
   'publisherIpiNumber': '00503659557',
   'publisherRoleCode': 'SubPublisher',
   'collectionShare': 50,
   'chainId': 'PSA_365023093',
   'chainParentId': 'PSC_337535222',
   'administrators': [],
   'parentPublishers': [{'publisherId': '631204',
     'mlcPublisherNumber': None,
     'publisherName': 'TARPO MUSIC PUB.',
     'publisherIpiNumber': '00419823444',
     'publisherRoleCode': 'OriginalPublisher',
     'collectionShare': 0,
     'chainId': 'PSC_337535222',
     'chainParentId': '',
     'administrators': [],
     'parentPublishers': []}]}],
 'iswc': ''}

Then to produce the data frame, I used the following code:

df = pd.json_normalize(result)
# df = pd.read_json(result)
print(df)

But had an error

AttributeError: module 'pandas' has no attribute 'json_normalize'

My main goal is to convert this into Excel or CSV format so that it is properly readable.

CodePudding user response:

Starting with the results dictionary

result = {'mlcSongCode': 'A6457V',
 'primaryTitle': 'AIR FORCE ONES',
 'membersSongId': '',
 'artists': 'TRACK | NELLY, MURPHY LEE, ALI, KYJUAN, TRACK BOYZ',
 'propertyId': None,
 'akas': [{'akaId': '', 'akaTitle': '', 'akaTitleTypeCode': ''}],
 'writers': [{'writerId': '1083561',
   'writerLastName': 'SMITH',
   'writerFirstName': 'PREMRO VONZELLAIRE',
   'writerIPI': '00232478669',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535223',
   'chainParentId': ''},
  {'writerId': '1858916',
   'writerLastName': 'GOODWIN',
   'writerFirstName': 'MARLON',
   'writerIPI': '',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535224',
   'chainParentId': ''},
  {'writerId': '1883205',
   'writerLastName': 'HAYNES',
   'writerFirstName': 'CORNELL',
   'writerIPI': '',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535225',
   'chainParentId': ''},
  {'writerId': '4733138',
   'writerLastName': 'LAVELLE',
   'writerFirstName': 'CRUMP',
   'writerIPI': '',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535226',
   'chainParentId': ''}],
 'publishers': [{'publisherId': '910354',
   'mlcPublisherNumber': None,
   'publisherName': 'TENYOR MUSIC',
   'publisherIpiNumber': '00263286262',
   'publisherRoleCode': 'OriginalPublisher',
   'collectionShare': 16.67,
   'chainId': 'PSA_311720187',
   'chainParentId': 'PSC_311915511',
   'administrators': [],
   'parentPublishers': [{'publisherId': '377508',
     'mlcPublisherNumber': None,
     'publisherName': 'ALL MY PUBLISHING LLC',
     'publisherIpiNumber': '',
     'publisherRoleCode': 'OriginalPublisher',
     'collectionShare': 0,
     'chainId': 'PSC_311915511',
     'chainParentId': 'PSC_337535223|PSC_337535224|PSC_337535225|PSC_337535226',
     'administrators': [],
     'parentPublishers': []}]},
  {'publisherId': '716372',
   'mlcPublisherNumber': None,
   'publisherName': 'KOBALT MUSIC PUB AMERICA INC',
   'publisherIpiNumber': '00503659557',
   'publisherRoleCode': 'SubPublisher',
   'collectionShare': 50,
   'chainId': 'PSA_365023093',
   'chainParentId': 'PSC_337535222',
   'administrators': [],
   'parentPublishers': [{'publisherId': '631204',
     'mlcPublisherNumber': None,
     'publisherName': 'TARPO MUSIC PUB.',
     'publisherIpiNumber': '00419823444',
     'publisherRoleCode': 'OriginalPublisher',
     'collectionShare': 0,
     'chainId': 'PSC_337535222',
     'chainParentId': '',
     'administrators': [],
     'parentPublishers': []}]}],
 'iswc': ''}

Load it into a dataframe:

import pandas as pd
df = pd.json_normalize(result)

This gives a dataframe with each key of results as a column, and the value of the key as the column value. In this case, the columns are mlcSongCode primaryTitle membersSongId artists propertyId akas writers publishers iswc

Explode the writers column:

df = df.explode('writers').reset_index(drop=True)

This converts each element in the writers array into a row, giving you a dataframe with one row for each 'writer'

Normalize the writers JSON into a flat table. This takes the JSON for each 'writer' and expands each key of it into a column. E.g. it will generate a column for 'writerLastName', 'writerFirstName' etc

normalized = pd.json_normalize(df['writers'])

Join the normalized dataframe to the original dataframe, and remove the original 'writers' column:

df = df.join(normalized).drop(columns=['writers'])

Then repeat with the other JSON columns as needed

  • Related