Append values from a nested dictionary to a list and convert that list to a DataFrame

Time:09-27

I have a list of nested dictionaries like this:

keywords_data=[{'vol': 90500,
  'cpc': {'currency': '$', 'value': '4.64'},
  'keyword': 'coronary artery disease',
  'competition': 0.15,
  'trend': [{'month': 'September', 'year': 2021, 'value': 90500},
   {'month': 'October', 'year': 2021, 'value': 90500},
   {'month': 'November', 'year': 2021, 'value': 90500},
   {'month': 'December', 'year': 2021, 'value': 74000},
   {'month': 'January', 'year': 2022, 'value': 90500},
   {'month': 'February', 'year': 2022, 'value': 110000},
   {'month': 'March', 'year': 2022, 'value': 110000},
   {'month': 'April', 'year': 2022, 'value': 110000},
   {'month': 'May', 'year': 2022, 'value': 90500},
   {'month': 'June', 'year': 2022, 'value': 90500},
   {'month': 'July', 'year': 2022, 'value': 90500},
   {'month': 'August', 'year': 2022, 'value': 90500}]}]

I want to convert it into a DataFrame like the following:


keyword                       month        year        value

coronary artery disease       september    2021         90500
coronary artery disease       october      2021         90500
coronary artery disease       november     2021         90500
.
.
.
.

I am able to access top-level keys such as vol, keyword and competition, as well as the nested cpc value, using:


vol = []
cpc = []
for element in keywords_data:
    vol.append(element["vol"])
    cpc.append(element["cpc"]["value"])

but when I try to access month under trend the same way (for example element["trend"]["month"]), it raises an error: list indices must be integers or slices, not str.
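The error occurs because element["trend"] is a list of dictionaries, not a dictionary, so it has to be iterated over (or indexed by integer position) rather than indexed with a string key. A minimal sketch of the same loop-and-append idea, assuming the keywords_data structure above (abbreviated here to two trend entries so the block runs on its own), adds an inner loop over trend and collects one dictionary per month before building the DataFrame:

```python
import pandas as pd

# Abbreviated copy of the question's data (only two trend entries)
keywords_data = [{'vol': 90500,
                  'cpc': {'currency': '$', 'value': '4.64'},
                  'keyword': 'coronary artery disease',
                  'competition': 0.15,
                  'trend': [{'month': 'September', 'year': 2021, 'value': 90500},
                            {'month': 'October', 'year': 2021, 'value': 90500}]}]

rows = []
for element in keywords_data:
    # element["trend"] is a list, so loop over it instead of indexing it with a string
    for point in element["trend"]:
        rows.append({
            "keyword": element["keyword"],
            "month": point["month"].lower(),  # lowercased to match the desired output
            "year": point["year"],
            "value": point["value"],
        })

# A list of flat dicts converts directly into one DataFrame row per dict
df = pd.DataFrame(rows)
print(df)
```

Appending plain dictionaries and calling pd.DataFrame once at the end is generally cheaper than growing a DataFrame row by row inside the loop.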

How can I get this into a DataFrame as shown above?

Answer:

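Use pandas.json_normalize with 'trend' as the record path, so each trend entry becomes one row, and pass the top-level keys as metadata so they are repeated on every row. The nested cpc dictionary is then flattened separately and joined back: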

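import pandas as pd

# One row per entry in 'trend'; the listed top-level keys are copied onto every row
df = pd.json_normalize(keywords_data, 'trend', ['vol', 'keyword', 'competition', 'cpc'])

# 'cpc' is still a dict in each row, so flatten it and join it back as cpc.currency / cpc.value
df = df.join(pd.json_normalize(df.pop('cpc')).add_prefix('cpc.'))
print(df)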
        month  year   value    vol                  keyword competition  \
0   September  2021   90500  90500  coronary artery disease        0.15   
1     October  2021   90500  90500  coronary artery disease        0.15   
2    November  2021   90500  90500  coronary artery disease        0.15   
3    December  2021   74000  90500  coronary artery disease        0.15   
4     January  2022   90500  90500  coronary artery disease        0.15   
5    February  2022  110000  90500  coronary artery disease        0.15   
6       March  2022  110000  90500  coronary artery disease        0.15   
7       April  2022  110000  90500  coronary artery disease        0.15   
8         May  2022   90500  90500  coronary artery disease        0.15   
9        June  2022   90500  90500  coronary artery disease        0.15   
10       July  2022   90500  90500  coronary artery disease        0.15   
11     August  2022   90500  90500  coronary artery disease        0.15   

   cpc.currency cpc.value  
0             $      4.64  
1             $      4.64  
2             $      4.64  
3             $      4.64  
4             $      4.64  
5             $      4.64  
6             $      4.64  
7             $      4.64  
8             $      4.64  
9             $      4.64  
10            $      4.64  
11            $      4.64  
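If only the columns from the desired output are needed, one possible variation (a sketch, not part of the answer above) passes the nested cpc fields to meta as key paths such as ['cpc', 'value'], which json_normalize turns into dotted column names like cpc.value, so the separate pop/join step is not needed. It then selects keyword, month, year and value and lowercases the month names to match the requested layout. The data is again abbreviated to two trend entries:

```python
import pandas as pd

# Abbreviated copy of the question's data (only two trend entries)
keywords_data = [{'vol': 90500,
                  'cpc': {'currency': '$', 'value': '4.64'},
                  'keyword': 'coronary artery disease',
                  'competition': 0.15,
                  'trend': [{'month': 'September', 'year': 2021, 'value': 90500},
                            {'month': 'October', 'year': 2021, 'value': 90500}]}]

# record_path='trend' gives one row per trend entry; meta copies parent fields onto each row.
# A list such as ['cpc', 'value'] is a path into the nested dict and becomes column 'cpc.value'.
df = pd.json_normalize(
    keywords_data,
    record_path='trend',
    meta=['vol', 'keyword', 'competition', ['cpc', 'currency'], ['cpc', 'value']],
)

# Keep only the columns from the desired output, in that order
out = df[['keyword', 'month', 'year', 'value']].copy()
out['month'] = out['month'].str.lower()
print(out)
```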