Home > Blockchain >  Complex python dictonary into pandas dataframe
Complex python dictonary into pandas dataframe

Time:12-28

How to turn this dictionary Similar this using pandas

{'totalMatchedRows': '7', 
'headers': [ {'name': 'DATE', 'type': 'DIMENSION'},  {'name': 'PAGE_VIEWS', 'type': 'METRIC_TALLY'}],
'rows': [{'cells': [{'value': '2022-12-21'},  {'value': '57'}]},
         {'cells': [{'value': '2022-12-22'}, {'value': '29'}]}, 
         {'cells': [{'value': '2022-12-23'}, {'value': '74'}]}, 
         {'cells': [{'value': '2022-12-24'}, {'value': '12'}]}, 
         {'cells': [{'value': '2022-12-25'}, {'value': '24'}]}, 
         {'cells': [{'value': '2022-12-26'}, {'value': '13'}]}, 
         {'cells': [{'value': '2022-12-27'}, {'value': '13'}]}], 
'totals': {'cells': [{}, {'value': '222'}]}, 
'averages': {'cells': [{}, {'value': '31'}]}, 
'warnings': ['This report includes data for dates when your account had a different time zone to your current time zone.'], 
'startDate': {'year': 2022, 'month': 12, 'day': 21}, 
'endDate': {'year': 2022, 'month': 12, 'day': 27}}

and If More Metrics are found they should also be added as another column in DF

DATE PAGE_VIEWS
2022-12-21 57
2022-12-22 29
2022-12-23 74
2022-12-24 12
2022-12-25 24
2022-12-26 13
2022-12-27 13

CodePudding user response:

Manipulate the data using Python before passing to pd.DataFrame to get it into a format accepted by pd.DataFrame, such as a list of tuples:

import pandas as pd

my_dict = {'totalMatchedRows': '7', 
'headers': [ {'name': 'DATE', 'type': 'DIMENSION'},  {'name': 'PAGE_VIEWS', 'type': 'METRIC_TALLY'}],
'rows': [{'cells': [{'value': '2022-12-21'},  {'value': '57'}]},
        {'cells': [{'value': '2022-12-22'}, {'value': '29'}]}, 
        {'cells': [{'value': '2022-12-23'}, {'value': '74'}]}, 
        {'cells': [{'value': '2022-12-24'}, {'value': '12'}]}, 
        {'cells': [{'value': '2022-12-25'}, {'value': '24'}]}, 
        {'cells': [{'value': '2022-12-26'}, {'value': '13'}]}, 
        {'cells': [{'value': '2022-12-27'}, {'value': '13'}]}], 
'totals': {'cells': [{}, {'value': '222'}]}, 
'averages': {'cells': [{}, {'value': '31'}]}, 
'warnings': ['This report includes data for dates when your account had a different time zone to your current time zone.'], 
'startDate': {'year': 2022, 'month': 12, 'day': 21}, 
'endDate': {'year': 2022, 'month': 12, 'day': 27}}

col_names = [header['name'] for header in my_dict['headers']]
row_tuples = [tuple(cell['value'] for cell in row['cells']) for row in my_dict['rows']]
my_df = pd.DataFrame(row_tuples, columns=col_names)
print(my_df)

Outputs:

        DATE PAGE_VIEWS
0  2022-12-21         57
1  2022-12-22         29
2  2022-12-23         74
3  2022-12-24         12
4  2022-12-25         24
5  2022-12-26         13
6  2022-12-27         13

CodePudding user response:

create empty dataframe with header names as columns then add cleaned rows

df = pd.DataFrame(columns=list(map(lambda x:x['name'],data['headers'])))
for i in range(int(data['totalMatchedRows'])):
    df.loc[i]=list(map(lambda x:x['value'],data['rows'][i]['cells']))
  • Related