Convert nested zeep or JSON to a pandas Dataframe-CodePudding

I am retrieving data via an API which returns a zeep.objects that looks like this:


   {
    'totalNumEntries': 2,
    'entries': [
        {
            'data': [
                {
                    'key': 'KEYWORD_TEXT',
                    'value': {
                        'Attribute.Type': 'StringAttribute',
                        'value': 'zambia'
                    }
                },
                {
                    'key': 'TARGETED_MONTHLY_SEARCHES',
                    'value': {
                        'Attribute.Type': 'MonthlySearchVolumeAttribute',
                        'value': [
                            {
                                'year': 2022,
                                'month': 3,
                                'count': 6600
                            },
                            {
                                'year': 2022,
                                'month': 2,
                                'count': 6600
                            },
                            {
                                'year': 2022,
                                'month': 1,
                                'count': 6600
                            },
                            {
                                'year': 2021,
                                'month': 12,
                                'count': 5400
                            },
                            {
                                'year': 2021,
                                'month': 11,
                                'count': 5400
                            },
                            {
                                'year': 2021,
                                'month': 10,
                                'count': 4400
                            },
                            {
                                'year': 2021,
                                'month': 9,
                                'count': 4400
                            },
                            {
                                'year': 2021,
                                'month': 8,
                                'count': 6600
                            },
                            {
                                'year': 2021,
                                'month': 7,
                                'count': 5400
                            },
                            {
                                'year': 2021,
                                'month': 6,
                                'count': 5400
                            },
                            {
                                'year': 2021,
                                'month': 5,
                                'count': 5400
                            },
                            {
                                'year': 2021,
                                'month': 4,
                                'count': 5400
                            }
                        ]
                    }
                },
                {
                    'key': 'SEARCH_VOLUME',
                    'value': {
                        'Attribute.Type': 'LongAttribute',
                        'value': 5400
                    }
                }
            ]
        },
        {
            'data': [
                {
                    'key': 'KEYWORD_TEXT',
                    'value': {
                        'Attribute.Type': 'StringAttribute',
                        'value': 'malawi'
                    }
                },
                {
                    'key': 'TARGETED_MONTHLY_SEARCHES',
                    'value': {
                        'Attribute.Type': 'MonthlySearchVolumeAttribute',
                        'value': [
                            {
                                'year': 2022,
                                'month': 3,
                                'count': 22200
                            },
                            {
                                'year': 2022,
                                'month': 2,
                                'count': 18100
                            },
                            {
                                'year': 2022,
                                'month': 1,
                                'count': 33100
                            },
                            {
                                'year': 2021,
                                'month': 12,
                                'count': 14800
                            },
                            {
                                'year': 2021,
                                'month': 11,
                                'count': 18100
                            },
                            {
                                'year': 2021,
                                'month': 10,
                                'count': 14800
                            },
                            {
                                'year': 2021,
                                'month': 9,
                                'count': 12100
                            },
                            {
                                'year': 2021,
                                'month': 8,
                                'count': 12100
                            },
                            {
                                'year': 2021,
                                'month': 7,
                                'count': 14800
                            },
                            {
                                'year': 2021,
                                'month': 6,
                                'count': 14800
                            },
                            {
                                'year': 2021,
                                'month': 5,
                                'count': 18100
                            },
                            {
                                'year': 2021,
                                'month': 4,
                                'count': 14800
                            }
                        ]
                    }
                },
                {
                    'key': 'SEARCH_VOLUME',
                    'value': {
                        'Attribute.Type': 'LongAttribute',
                        'value': 18100
                    }
                }
            ]
        }
    ]
}

The end result should be a dataframe that looks like this:

KEYWORD_TEXT	SEARCH_VOLUME	3.2022	2.2022	1.2022	12.2021	...
zambia	5400	6600	6600	6600	5400	...
malawi	18100	22200	18100	33100	14800	...

What I have tried so far but failed is this:

#  Convert zeep object to a json: 
from zeep import helpers
_json = helpers.serialize_object(zeep_json, dict)
_json



{'totalNumEntries': 2,
 'entries': [{'data': [{'key': 'KEYWORD_TEXT',
     'value': {'Attribute.Type': 'StringAttribute', 'value': 'zambia'}},
    {'key': 'TARGETED_MONTHLY_SEARCHES',
     'value': {'Attribute.Type': 'MonthlySearchVolumeAttribute',
      'value': [{'year': 2022, 'month': 3, 'count': 6600},
       {'year': 2022, 'month': 2, 'count': 6600},
       {'year': 2022, 'month': 1, 'count': 6600},
       {'year': 2021, 'month': 12, 'count': 5400},
       {'year': 2021, 'month': 11, 'count': 5400},
       {'year': 2021, 'month': 10, 'count': 4400},
       {'year': 2021, 'month': 9, 'count': 4400},
       {'year': 2021, 'month': 8, 'count': 6600},
       {'year': 2021, 'month': 7, 'count': 5400},
       {'year': 2021, 'month': 6, 'count': 5400},
       {'year': 2021, 'month': 5, 'count': 5400},
       {'year': 2021, 'month': 4, 'count': 5400}]}},
    {'key': 'SEARCH_VOLUME',
     'value': {'Attribute.Type': 'LongAttribute', 'value': 5400}}]},
  {'data': [{'key': 'KEYWORD_TEXT',
     'value': {'Attribute.Type': 'StringAttribute', 'value': 'malawi'}},
    {'key': 'TARGETED_MONTHLY_SEARCHES',
     'value': {'Attribute.Type': 'MonthlySearchVolumeAttribute',
      'value': [{'year': 2022, 'month': 3, 'count': 22200},
       {'year': 2022, 'month': 2, 'count': 18100},
       {'year': 2022, 'month': 1, 'count': 33100},
       {'year': 2021, 'month': 12, 'count': 14800},
       {'year': 2021, 'month': 11, 'count': 18100},
       {'year': 2021, 'month': 10, 'count': 14800},
       {'year': 2021, 'month': 9, 'count': 12100},
       {'year': 2021, 'month': 8, 'count': 12100},
       {'year': 2021, 'month': 7, 'count': 14800},
       {'year': 2021, 'month': 6, 'count': 14800},
       {'year': 2021, 'month': 5, 'count': 18100},
       {'year': 2021, 'month': 4, 'count': 14800}]}},
    {'key': 'SEARCH_VOLUME',
     'value': {'Attribute.Type': 'LongAttribute', 'value': 18100}}]}]}

# and then try to load that into a dataframe: 

df = pd.json_normalize(_json['entries'],
   record_path=['data'],
   errors='ignore’)
df.head()

which results in this:

_key	_value.Attribute.Type	_value.value
KEYWORD_TEXT	StringAttribute	zambia
TARGETED_MONTHLY_SEARCHES	MonthlySearchVolumeAttribute	[{'year': 2022, 'month': 3, 'count': 6600}, {'year': 2022, 'month': 2, 'count': 6600}, {'year': 2022, 'month': 1, 'count': 6600}, {'year': 2021, 'month': 12, 'count': 5400}, {'year': 2021, 'month': 11, 'count': 5400}, {'year': 2021, 'month': 10, 'count': 4400}, {'year': 2021, 'month': 9, 'count': 4400}, {'year': 2021, 'month': 8, 'count': 6600}, {'year': 2021, 'month': 7, 'count': 5400}, {'year': 2021, 'month': 6, 'count': 5400}, {'year': 2021, 'month': 5, 'count': 5400}, {'year': 2021, 'month': 4, 'count': 5400}]
SEARCH_VOLUME	LongAttribute	5400
KEYWORD_TEXT	StringAttribute	malawi
TARGETED_MONTHLY_SEARCHES	MonthlySearchVolumeAttribute	[{'year': 2022, 'month': 3, 'count': 22200}, {'year': 2022, 'month': 2, 'count': 18100}, {'year': 2022, 'month': 1, 'count': 33100}, {'year': 2021, 'month': 12, 'count': 14800}, {'year': 2021, 'month': 11, 'count': 18100}, {'year': 2021, 'month': 10, 'count': 14800}, {'year': 2021, 'month': 9, 'count': 12100}, {'year': 2021, 'month': 8, 'count': 12100}, {'year': 2021, 'month': 7, 'count': 14800}, {'year': 2021, 'month': 6, 'count': 14800}, {'year': 2021, 'month': 5, 'count': 18100}, {'year': 2021, 'month': 4, 'count': 14800}]
SEARCH_VOLUME	LongAttribute	18100

I am totally lost of how to get the year and month correctly concatted and as columns? Thanks in advance for your time and support.

CodePudding user response：

This does the job,

# Getting all the needed data from the json output.
valid_data = []

for entry in json_output["entries"]:
  data = entry["data"]
  valid_data.append([i["value"]["value"] for i in data])

all_month_counts = []

# Getting all the month counts sorted.
for d in valid_data:
  month_counts = {}
  for month in d[1]:
    month_counts[f"{month['month']}.{month['year']}"] = month["count"]
  all_month_counts.append(month_counts)

# Converting all data into a dataframe.
df = pd.DataFrame(valid_data, columns = ["KEYWORD_TXT", "COUNTS", "SEARCH_VOLUME"])
month_counts_df = pd.DataFrame(all_month_counts)

df.drop(columns = "COUNTS", inplace = True)
df = pd.concat([df, month_counts_df], axis = 1)

Output -

	KEYWORD_TXT	SEARCH_VOLUME	3.2022	2.2022	1.2022	12.2021	11.2021	10.2021	9.2021	8.2021	7.2021	6.2021	5.2021	4.2021
0	zambia	5400	6600	6600	6600	5400	5400	4400	4400	6600	5400	5400	5400	5400
1	malawi	18100	22200	18100	33100	14800	18100	14800	12100	12100	14800	14800	18100	14800