I have a list of nested dictionaries as such:
keywords_data=[{'vol': 90500,
'cpc': {'currency': '$', 'value': '4.64'},
'keyword': 'coronary artery disease',
'competition': 0.15,
'trend': [{'month': 'September', 'year': 2021, 'value': 90500},
{'month': 'October', 'year': 2021, 'value': 90500},
{'month': 'November', 'year': 2021, 'value': 90500},
{'month': 'December', 'year': 2021, 'value': 74000},
{'month': 'January', 'year': 2022, 'value': 90500},
{'month': 'February', 'year': 2022, 'value': 110000},
{'month': 'March', 'year': 2022, 'value': 110000},
{'month': 'April', 'year': 2022, 'value': 110000},
{'month': 'May', 'year': 2022, 'value': 90500},
{'month': 'June', 'year': 2022, 'value': 90500},
{'month': 'July', 'year': 2022, 'value': 90500},
{'month': 'August', 'year': 2022, 'value': 90500}]}]
I want to convert it into a dataframe like the following:
keyword month year value
coronary artery disease september 2021 90500
coronary artery disease october 2021 90500
coronary artery disease november 2021 90500
...
I am able to access top-level elements such as vol and cpc using
vol = []
cpc = []
for element in keywords_data:
    vol.append(element["vol"])
    cpc.append(element["cpc"]["value"])
but when I try to access the month under trend using the same approach, it throws an error: TypeError: list indices must be integers or slices, not str.
How can I get this into a dataframe as shown above?
CodePudding user response:
Use json_normalize, with 'trend' as the record path and the remaining top-level keys as metadata:
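import pandas as pd
# one row per entry in 'trend'; the listed keys are repeated on every row
df = pd.json_normalize(keywords_data, 'trend', ['vol', 'keyword', 'competition','cpc'])
# expand the nested 'cpc' dict into cpc.currency / cpc.value columns
df = df.join(pd.json_normalize(df.pop('cpc')).add_prefix('cpc.'))
print(df)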
month year value vol keyword competition \
0 September 2021 90500 90500 coronary artery disease 0.15
1 October 2021 90500 90500 coronary artery disease 0.15
2 November 2021 90500 90500 coronary artery disease 0.15
3 December 2021 74000 90500 coronary artery disease 0.15
4 January 2022 90500 90500 coronary artery disease 0.15
5 February 2022 110000 90500 coronary artery disease 0.15
6 March 2022 110000 90500 coronary artery disease 0.15
7 April 2022 110000 90500 coronary artery disease 0.15
8 May 2022 90500 90500 coronary artery disease 0.15
9 June 2022 90500 90500 coronary artery disease 0.15
10 July 2022 90500 90500 coronary artery disease 0.15
11 August 2022 90500 90500 coronary artery disease 0.15
cpc.currency cpc.value
0 $ 4.64
1 $ 4.64
2 $ 4.64
3 $ 4.64
4 $ 4.64
5 $ 4.64
6 $ 4.64
7 $ 4.64
8 $ 4.64
9 $ 4.64
10 $ 4.64
11 $ 4.64
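If only the four columns from the question (keyword, month, year, value) are needed, the same call can be trimmed down. The sketch below is a minimal example, not the only way to do it: it uses an abbreviated copy of the question's data (two trend entries instead of twelve), and the names `wanted`, `rows`, `point` and `by_hand` are made up for illustration. It also includes a plain nested loop, which shows why the original approach failed: `element["trend"]` is a list of dicts, so it has to be iterated before `"month"` can be looked up.

```python
import pandas as pd

# Abbreviated copy of the question's data (two of the twelve trend entries).
keywords_data = [{
    'vol': 90500,
    'cpc': {'currency': '$', 'value': '4.64'},
    'keyword': 'coronary artery disease',
    'competition': 0.15,
    'trend': [
        {'month': 'September', 'year': 2021, 'value': 90500},
        {'month': 'October', 'year': 2021, 'value': 90500},
    ],
}]

# One row per 'trend' entry, carrying only 'keyword' along as metadata.
wanted = pd.json_normalize(keywords_data, record_path='trend', meta=['keyword'])
# Reorder the columns to match the layout asked for in the question.
wanted = wanted[['keyword', 'month', 'year', 'value']]

# Plain-Python alternative: 'trend' is a list, so it needs its own inner loop.
rows = []
for element in keywords_data:
    for point in element['trend']:
        rows.append({'keyword': element['keyword'], **point})
by_hand = pd.DataFrame(rows, columns=['keyword', 'month', 'year', 'value'])

print(wanted)
```

Both `wanted` and `by_hand` should hold the same rows. If lowercase month names are wanted, as in the sample output in the question, `wanted['month'].str.lower()` would produce them.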