Home > Net >  Convert nested zeep or JSON to a pandas Dataframe
Convert nested zeep or JSON to a pandas Dataframe

Time:04-26

I am retrieving data via an API which returns a zeep.objects that looks like this:


   {
    'totalNumEntries': 2,
    'entries': [
        {
            'data': [
                {
                    'key': 'KEYWORD_TEXT',
                    'value': {
                        'Attribute.Type': 'StringAttribute',
                        'value': 'zambia'
                    }
                },
                {
                    'key': 'TARGETED_MONTHLY_SEARCHES',
                    'value': {
                        'Attribute.Type': 'MonthlySearchVolumeAttribute',
                        'value': [
                            {
                                'year': 2022,
                                'month': 3,
                                'count': 6600
                            },
                            {
                                'year': 2022,
                                'month': 2,
                                'count': 6600
                            },
                            {
                                'year': 2022,
                                'month': 1,
                                'count': 6600
                            },
                            {
                                'year': 2021,
                                'month': 12,
                                'count': 5400
                            },
                            {
                                'year': 2021,
                                'month': 11,
                                'count': 5400
                            },
                            {
                                'year': 2021,
                                'month': 10,
                                'count': 4400
                            },
                            {
                                'year': 2021,
                                'month': 9,
                                'count': 4400
                            },
                            {
                                'year': 2021,
                                'month': 8,
                                'count': 6600
                            },
                            {
                                'year': 2021,
                                'month': 7,
                                'count': 5400
                            },
                            {
                                'year': 2021,
                                'month': 6,
                                'count': 5400
                            },
                            {
                                'year': 2021,
                                'month': 5,
                                'count': 5400
                            },
                            {
                                'year': 2021,
                                'month': 4,
                                'count': 5400
                            }
                        ]
                    }
                },
                {
                    'key': 'SEARCH_VOLUME',
                    'value': {
                        'Attribute.Type': 'LongAttribute',
                        'value': 5400
                    }
                }
            ]
        },
        {
            'data': [
                {
                    'key': 'KEYWORD_TEXT',
                    'value': {
                        'Attribute.Type': 'StringAttribute',
                        'value': 'malawi'
                    }
                },
                {
                    'key': 'TARGETED_MONTHLY_SEARCHES',
                    'value': {
                        'Attribute.Type': 'MonthlySearchVolumeAttribute',
                        'value': [
                            {
                                'year': 2022,
                                'month': 3,
                                'count': 22200
                            },
                            {
                                'year': 2022,
                                'month': 2,
                                'count': 18100
                            },
                            {
                                'year': 2022,
                                'month': 1,
                                'count': 33100
                            },
                            {
                                'year': 2021,
                                'month': 12,
                                'count': 14800
                            },
                            {
                                'year': 2021,
                                'month': 11,
                                'count': 18100
                            },
                            {
                                'year': 2021,
                                'month': 10,
                                'count': 14800
                            },
                            {
                                'year': 2021,
                                'month': 9,
                                'count': 12100
                            },
                            {
                                'year': 2021,
                                'month': 8,
                                'count': 12100
                            },
                            {
                                'year': 2021,
                                'month': 7,
                                'count': 14800
                            },
                            {
                                'year': 2021,
                                'month': 6,
                                'count': 14800
                            },
                            {
                                'year': 2021,
                                'month': 5,
                                'count': 18100
                            },
                            {
                                'year': 2021,
                                'month': 4,
                                'count': 14800
                            }
                        ]
                    }
                },
                {
                    'key': 'SEARCH_VOLUME',
                    'value': {
                        'Attribute.Type': 'LongAttribute',
                        'value': 18100
                    }
                }
            ]
        }
    ]
}

The end result should be a dataframe that looks like this:

KEYWORD_TEXT SEARCH_VOLUME 3.2022 2.2022 1.2022 12.2021 ...
zambia 5400 6600 6600 6600 5400 ...
malawi 18100 22200 18100 33100 14800 ...

What I have tried so far but failed is this:

#  Convert zeep object to a json: 
from zeep import helpers
_json = helpers.serialize_object(zeep_json, dict)
_json



{'totalNumEntries': 2,
 'entries': [{'data': [{'key': 'KEYWORD_TEXT',
     'value': {'Attribute.Type': 'StringAttribute', 'value': 'zambia'}},
    {'key': 'TARGETED_MONTHLY_SEARCHES',
     'value': {'Attribute.Type': 'MonthlySearchVolumeAttribute',
      'value': [{'year': 2022, 'month': 3, 'count': 6600},
       {'year': 2022, 'month': 2, 'count': 6600},
       {'year': 2022, 'month': 1, 'count': 6600},
       {'year': 2021, 'month': 12, 'count': 5400},
       {'year': 2021, 'month': 11, 'count': 5400},
       {'year': 2021, 'month': 10, 'count': 4400},
       {'year': 2021, 'month': 9, 'count': 4400},
       {'year': 2021, 'month': 8, 'count': 6600},
       {'year': 2021, 'month': 7, 'count': 5400},
       {'year': 2021, 'month': 6, 'count': 5400},
       {'year': 2021, 'month': 5, 'count': 5400},
       {'year': 2021, 'month': 4, 'count': 5400}]}},
    {'key': 'SEARCH_VOLUME',
     'value': {'Attribute.Type': 'LongAttribute', 'value': 5400}}]},
  {'data': [{'key': 'KEYWORD_TEXT',
     'value': {'Attribute.Type': 'StringAttribute', 'value': 'malawi'}},
    {'key': 'TARGETED_MONTHLY_SEARCHES',
     'value': {'Attribute.Type': 'MonthlySearchVolumeAttribute',
      'value': [{'year': 2022, 'month': 3, 'count': 22200},
       {'year': 2022, 'month': 2, 'count': 18100},
       {'year': 2022, 'month': 1, 'count': 33100},
       {'year': 2021, 'month': 12, 'count': 14800},
       {'year': 2021, 'month': 11, 'count': 18100},
       {'year': 2021, 'month': 10, 'count': 14800},
       {'year': 2021, 'month': 9, 'count': 12100},
       {'year': 2021, 'month': 8, 'count': 12100},
       {'year': 2021, 'month': 7, 'count': 14800},
       {'year': 2021, 'month': 6, 'count': 14800},
       {'year': 2021, 'month': 5, 'count': 18100},
       {'year': 2021, 'month': 4, 'count': 14800}]}},
    {'key': 'SEARCH_VOLUME',
     'value': {'Attribute.Type': 'LongAttribute', 'value': 18100}}]}]}

# and then try to load that into a dataframe: 

df = pd.json_normalize(_json['entries'],
   record_path=['data'],
   errors='ignore’)
df.head()

which results in this:

_key _value.Attribute.Type _value.value
KEYWORD_TEXT StringAttribute zambia
TARGETED_MONTHLY_SEARCHES MonthlySearchVolumeAttribute [{'year': 2022, 'month': 3, 'count': 6600}, {'year': 2022, 'month': 2, 'count': 6600}, {'year': 2022, 'month': 1, 'count': 6600}, {'year': 2021, 'month': 12, 'count': 5400}, {'year': 2021, 'month': 11, 'count': 5400}, {'year': 2021, 'month': 10, 'count': 4400}, {'year': 2021, 'month': 9, 'count': 4400}, {'year': 2021, 'month': 8, 'count': 6600}, {'year': 2021, 'month': 7, 'count': 5400}, {'year': 2021, 'month': 6, 'count': 5400}, {'year': 2021, 'month': 5, 'count': 5400}, {'year': 2021, 'month': 4, 'count': 5400}]
SEARCH_VOLUME LongAttribute 5400
KEYWORD_TEXT StringAttribute malawi
TARGETED_MONTHLY_SEARCHES MonthlySearchVolumeAttribute [{'year': 2022, 'month': 3, 'count': 22200}, {'year': 2022, 'month': 2, 'count': 18100}, {'year': 2022, 'month': 1, 'count': 33100}, {'year': 2021, 'month': 12, 'count': 14800}, {'year': 2021, 'month': 11, 'count': 18100}, {'year': 2021, 'month': 10, 'count': 14800}, {'year': 2021, 'month': 9, 'count': 12100}, {'year': 2021, 'month': 8, 'count': 12100}, {'year': 2021, 'month': 7, 'count': 14800}, {'year': 2021, 'month': 6, 'count': 14800}, {'year': 2021, 'month': 5, 'count': 18100}, {'year': 2021, 'month': 4, 'count': 14800}]
SEARCH_VOLUME LongAttribute 18100

I am totally lost of how to get the year and month correctly concatted and as columns? Thanks in advance for your time and support.

CodePudding user response:

This does the job,

# Getting all the needed data from the json output.
valid_data = []

for entry in json_output["entries"]:
  data = entry["data"]
  valid_data.append([i["value"]["value"] for i in data])

all_month_counts = []

# Getting all the month counts sorted.
for d in valid_data:
  month_counts = {}
  for month in d[1]:
    month_counts[f"{month['month']}.{month['year']}"] = month["count"]
  all_month_counts.append(month_counts)

# Converting all data into a dataframe.
df = pd.DataFrame(valid_data, columns = ["KEYWORD_TXT", "COUNTS", "SEARCH_VOLUME"])
month_counts_df = pd.DataFrame(all_month_counts)

df.drop(columns = "COUNTS", inplace = True)
df = pd.concat([df, month_counts_df], axis = 1)

Output -

KEYWORD_TXT SEARCH_VOLUME 3.2022 2.2022 1.2022 12.2021 11.2021 10.2021 9.2021 8.2021 7.2021 6.2021 5.2021 4.2021
0 zambia 5400 6600 6600 6600 5400 5400 4400 4400 6600 5400 5400 5400 5400
1 malawi 18100 22200 18100 33100 14800 18100 14800 12100 12100 14800 14800 18100 14800
  • Related