I would like to selectively convert a dict containing a list of dicts into a DataFrame. From each entry in results I only want the publisher name and the title, and only when the publisher name is Benzinga:
{'results': [{'id': 'knNyIzsECbl3YYPAKIQsEoaO4_roXDftV-auy9lSB-w',
'publisher': {'name': 'Benzinga',
'homepage_url': 'https://www.benzinga.com/'},
'title': 'Earnings Scheduled For May 11, 2021'},
{'id': 'KNDx8p0PytFULh33UWse-BkT7XxpxLZtGLij22tiZMM',
'publisher': {'name': 'The Motley Fool',
'homepage_url': 'https://www.fool.com/',
'title': 'Taysha Gene Therapies, Inc. (TSHA) Q1 2021 Earnings Call Transcript'}]}
Expected output:
publisher title
Benzinga Earnings Scheduled For May 11, 2021
If I convert the whole thing to a pandas DataFrame first, the nested lists and dicts end up as values inside the DataFrame cells instead of being flattened into columns.
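The behavior described above can be reproduced with a minimal sketch. It assumes the data is bound to a variable named dct and that the missing closing brace in the second publisher entry is added, so the literal is valid Python:

```python
import pandas as pd

# Sample data from the question, with the second "publisher" dict closed
# before "title" (assumption: that is what the question intended).
dct = {'results': [
    {'id': 'knNyIzsECbl3YYPAKIQsEoaO4_roXDftV-auy9lSB-w',
     'publisher': {'name': 'Benzinga',
                   'homepage_url': 'https://www.benzinga.com/'},
     'title': 'Earnings Scheduled For May 11, 2021'},
    {'id': 'KNDx8p0PytFULh33UWse-BkT7XxpxLZtGLij22tiZMM',
     'publisher': {'name': 'The Motley Fool',
                   'homepage_url': 'https://www.fool.com/'},
     'title': 'Taysha Gene Therapies, Inc. (TSHA) Q1 2021 Earnings Call Transcript'},
]}

# Building a frame straight from the list of dicts only uses the top-level keys,
# so each nested "publisher" dict is stored whole in a single cell.
raw = pd.DataFrame(dct['results'])
print(raw.columns.tolist())            # ['id', 'publisher', 'title']
print(type(raw.loc[0, 'publisher']))   # <class 'dict'>
```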
Answer:
Flatten each result with a generator expression, then build a new DataFrame from it (here dct is the dict from the question):
pd.DataFrame({'publisher': d['publisher']['name'], 'title': d['title']} for d in dct['results'])
Alternatively, you can use pd.json_normalize:
pd.json_normalize(dct['results'])[['publisher.name', 'title']].rename(columns={'publisher.name': 'publisher'})
Result:
publisher title
0 Benzinga Earnings Scheduled For May 11, 2021
1 The Motley Fool Taysha Gene Therapies, Inc. (TSHA) Q1 2021 Earnings Call Transcript
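The code in this answer builds a row for every result, while the question asks only for Benzinga articles. A minimal sketch of two ways to add that filter, assuming the same corrected sample data bound to dct: a condition inside the generator expression, or a boolean mask applied after json_normalize. The names benzinga and benzinga2 are illustrative only.

```python
import pandas as pd

# Sample data from the question, with the second "publisher" dict closed
# (assumption: that is what the question intended).
dct = {'results': [
    {'id': 'knNyIzsECbl3YYPAKIQsEoaO4_roXDftV-auy9lSB-w',
     'publisher': {'name': 'Benzinga',
                   'homepage_url': 'https://www.benzinga.com/'},
     'title': 'Earnings Scheduled For May 11, 2021'},
    {'id': 'KNDx8p0PytFULh33UWse-BkT7XxpxLZtGLij22tiZMM',
     'publisher': {'name': 'The Motley Fool',
                   'homepage_url': 'https://www.fool.com/'},
     'title': 'Taysha Gene Therapies, Inc. (TSHA) Q1 2021 Earnings Call Transcript'},
]}

# Option 1: filter inside the generator, so only Benzinga rows are ever built.
benzinga = pd.DataFrame(
    {'publisher': d['publisher']['name'], 'title': d['title']}
    for d in dct['results']
    if d['publisher']['name'] == 'Benzinga'
)

# Option 2: flatten everything, then keep the matching rows with a boolean mask.
flat = pd.json_normalize(dct['results'])
benzinga2 = (
    flat.loc[flat['publisher.name'] == 'Benzinga', ['publisher.name', 'title']]
    .rename(columns={'publisher.name': 'publisher'})
    .reset_index(drop=True)   # renumber rows from 0 after filtering
)

print(benzinga)
print(benzinga2)
```

Option 1 avoids building rows that are thrown away; option 2 is convenient when you also want other flattened fields such as publisher.homepage_url.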
Answer:
The dict in the question is not valid Python: the second 'publisher' dict is never closed before 'title'. Is it supposed to be the following?
data = {'results': [
    {'id': 'knNyIzsECbl3YYPAKIQsEoaO4_roXDftV-auy9lSB-w',
     'publisher': {'name': 'Benzinga',
                   'homepage_url': 'https://www.benzinga.com/'},
     'title': 'Earnings Scheduled For May 11, 2021'},
    {'id': 'KNDx8p0PytFULh33UWse-BkT7XxpxLZtGLij22tiZMM',
     'publisher': {'name': 'The Motley Fool',
                   'homepage_url': 'https://www.fool.com/'},
     'title': 'Taysha Gene Therapies, Inc. (TSHA) Q1 2021 Earnings Call Transcript'}]}
If so, you can create a dict of empty lists, append the matching results to it in a loop, and then convert it to a DataFrame:
a_dict = {'publisher': [], 'title': []}
for i in data['results']:
    # keep only articles whose publisher is Benzinga
    if i['publisher']['name'] == 'Benzinga':
        a_dict['publisher'].append(i['publisher']['name'])
        a_dict['title'].append(i['title'])
a_dict
{'publisher': ['Benzinga'], 'title': ['Earnings Scheduled For May 11, 2021']}
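The loop leaves a plain dict of equal-length lists, which pd.DataFrame accepts directly, with one column per key. A minimal sketch, reusing the a_dict value shown above (the name df is illustrative):

```python
import pandas as pd

# The dict of lists produced by the loop above.
a_dict = {'publisher': ['Benzinga'], 'title': ['Earnings Scheduled For May 11, 2021']}

# Each key becomes a column; all lists must have the same length.
df = pd.DataFrame(a_dict)
print(df)
```

Collecting values in Python lists and building the DataFrame once at the end is generally much faster than growing a DataFrame one row at a time inside the loop.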