I want to transform the result of a call from an API into a data frame. The result of the API call is a nested dictionary, but I am unable to produce a data frame.
In addition to json_normalize, I tried pd.DataFrame.from_dict. However, until now had been unsuccessful. I also tried to flatten the dictionary, but nothing.
I used the following call:
response = requests.request("GET", url, headers=headers, data=payload)
result = response.json()
And the output was:
{'mlcSongCode': 'A6457V',
'primaryTitle': 'AIR FORCE ONES',
'membersSongId': '',
'artists': 'TRACK | NELLY, MURPHY LEE, ALI, KYJUAN, TRACK BOYZ',
'propertyId': None,
'akas': [{'akaId': '', 'akaTitle': '', 'akaTitleTypeCode': ''}],
'writers': [{'writerId': '1083561',
'writerLastName': 'SMITH',
'writerFirstName': 'PREMRO VONZELLAIRE',
'writerIPI': '00232478669',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535223',
'chainParentId': ''},
{'writerId': '1858916',
'writerLastName': 'GOODWIN',
'writerFirstName': 'MARLON',
'writerIPI': '',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535224',
'chainParentId': ''},
{'writerId': '1883205',
'writerLastName': 'HAYNES',
'writerFirstName': 'CORNELL',
'writerIPI': '',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535225',
'chainParentId': ''},
{'writerId': '4733138',
'writerLastName': 'LAVELLE',
'writerFirstName': 'CRUMP',
'writerIPI': '',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535226',
'chainParentId': ''}],
'publishers': [{'publisherId': '910354',
'mlcPublisherNumber': None,
'publisherName': 'TENYOR MUSIC',
'publisherIpiNumber': '00263286262',
'publisherRoleCode': 'OriginalPublisher',
'collectionShare': 16.67,
'chainId': 'PSA_311720187',
'chainParentId': 'PSC_311915511',
'administrators': [],
'parentPublishers': [{'publisherId': '377508',
'mlcPublisherNumber': None,
'publisherName': 'ALL MY PUBLISHING LLC',
'publisherIpiNumber': '',
'publisherRoleCode': 'OriginalPublisher',
'collectionShare': 0,
'chainId': 'PSC_311915511',
'chainParentId': 'PSC_337535223|PSC_337535224|PSC_337535225|PSC_337535226',
'administrators': [],
'parentPublishers': []}]},
{'publisherId': '716372',
'mlcPublisherNumber': None,
'publisherName': 'KOBALT MUSIC PUB AMERICA INC',
'publisherIpiNumber': '00503659557',
'publisherRoleCode': 'SubPublisher',
'collectionShare': 50,
'chainId': 'PSA_365023093',
'chainParentId': 'PSC_337535222',
'administrators': [],
'parentPublishers': [{'publisherId': '631204',
'mlcPublisherNumber': None,
'publisherName': 'TARPO MUSIC PUB.',
'publisherIpiNumber': '00419823444',
'publisherRoleCode': 'OriginalPublisher',
'collectionShare': 0,
'chainId': 'PSC_337535222',
'chainParentId': '',
'administrators': [],
'parentPublishers': []}]}],
'iswc': ''}
Then to produce the data frame, I used the following code:
df = pd.json_normalize(result)
# df = pd.read_json(result)
print(df)
But had an error
AttributeError: module 'pandas' has no attribute 'json_normalize'
My main goal is to convert this into Excel or CSV format so that it is properly readable.
CodePudding user response:
Starting with the results dictionary
result = {'mlcSongCode': 'A6457V',
'primaryTitle': 'AIR FORCE ONES',
'membersSongId': '',
'artists': 'TRACK | NELLY, MURPHY LEE, ALI, KYJUAN, TRACK BOYZ',
'propertyId': None,
'akas': [{'akaId': '', 'akaTitle': '', 'akaTitleTypeCode': ''}],
'writers': [{'writerId': '1083561',
'writerLastName': 'SMITH',
'writerFirstName': 'PREMRO VONZELLAIRE',
'writerIPI': '00232478669',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535223',
'chainParentId': ''},
{'writerId': '1858916',
'writerLastName': 'GOODWIN',
'writerFirstName': 'MARLON',
'writerIPI': '',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535224',
'chainParentId': ''},
{'writerId': '1883205',
'writerLastName': 'HAYNES',
'writerFirstName': 'CORNELL',
'writerIPI': '',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535225',
'chainParentId': ''},
{'writerId': '4733138',
'writerLastName': 'LAVELLE',
'writerFirstName': 'CRUMP',
'writerIPI': '',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535226',
'chainParentId': ''}],
'publishers': [{'publisherId': '910354',
'mlcPublisherNumber': None,
'publisherName': 'TENYOR MUSIC',
'publisherIpiNumber': '00263286262',
'publisherRoleCode': 'OriginalPublisher',
'collectionShare': 16.67,
'chainId': 'PSA_311720187',
'chainParentId': 'PSC_311915511',
'administrators': [],
'parentPublishers': [{'publisherId': '377508',
'mlcPublisherNumber': None,
'publisherName': 'ALL MY PUBLISHING LLC',
'publisherIpiNumber': '',
'publisherRoleCode': 'OriginalPublisher',
'collectionShare': 0,
'chainId': 'PSC_311915511',
'chainParentId': 'PSC_337535223|PSC_337535224|PSC_337535225|PSC_337535226',
'administrators': [],
'parentPublishers': []}]},
{'publisherId': '716372',
'mlcPublisherNumber': None,
'publisherName': 'KOBALT MUSIC PUB AMERICA INC',
'publisherIpiNumber': '00503659557',
'publisherRoleCode': 'SubPublisher',
'collectionShare': 50,
'chainId': 'PSA_365023093',
'chainParentId': 'PSC_337535222',
'administrators': [],
'parentPublishers': [{'publisherId': '631204',
'mlcPublisherNumber': None,
'publisherName': 'TARPO MUSIC PUB.',
'publisherIpiNumber': '00419823444',
'publisherRoleCode': 'OriginalPublisher',
'collectionShare': 0,
'chainId': 'PSC_337535222',
'chainParentId': '',
'administrators': [],
'parentPublishers': []}]}],
'iswc': ''}
Load it into a dataframe:
import pandas as pd
df = pd.json_normalize(result)
This gives a dataframe with each key of results as a column, and the value of the key as the column value. In this case, the columns are mlcSongCode primaryTitle membersSongId artists propertyId akas writers publishers iswc
Explode the writers
column:
df = df.explode('writers').reset_index(drop=True)
This converts each element in the writers
array into a row, giving you a dataframe with one row for each 'writer'
Normalize the writers
JSON into a flat table. This takes the JSON for each 'writer' and expands each key of it into a column. E.g. it will generate a column for 'writerLastName', 'writerFirstName' etc
normalized = pd.json_normalize(df['writers'])
Join the normalized dataframe to the original dataframe, and remove the original 'writers' column:
df = df.join(normalized).drop(columns=['writers'])
Then repeat with the other JSON columns as needed