I am retrieving data via an API which returns a zeep.objects that looks like this:
{
'totalNumEntries': 2,
'entries': [
{
'data': [
{
'key': 'KEYWORD_TEXT',
'value': {
'Attribute.Type': 'StringAttribute',
'value': 'zambia'
}
},
{
'key': 'TARGETED_MONTHLY_SEARCHES',
'value': {
'Attribute.Type': 'MonthlySearchVolumeAttribute',
'value': [
{
'year': 2022,
'month': 3,
'count': 6600
},
{
'year': 2022,
'month': 2,
'count': 6600
},
{
'year': 2022,
'month': 1,
'count': 6600
},
{
'year': 2021,
'month': 12,
'count': 5400
},
{
'year': 2021,
'month': 11,
'count': 5400
},
{
'year': 2021,
'month': 10,
'count': 4400
},
{
'year': 2021,
'month': 9,
'count': 4400
},
{
'year': 2021,
'month': 8,
'count': 6600
},
{
'year': 2021,
'month': 7,
'count': 5400
},
{
'year': 2021,
'month': 6,
'count': 5400
},
{
'year': 2021,
'month': 5,
'count': 5400
},
{
'year': 2021,
'month': 4,
'count': 5400
}
]
}
},
{
'key': 'SEARCH_VOLUME',
'value': {
'Attribute.Type': 'LongAttribute',
'value': 5400
}
}
]
},
{
'data': [
{
'key': 'KEYWORD_TEXT',
'value': {
'Attribute.Type': 'StringAttribute',
'value': 'malawi'
}
},
{
'key': 'TARGETED_MONTHLY_SEARCHES',
'value': {
'Attribute.Type': 'MonthlySearchVolumeAttribute',
'value': [
{
'year': 2022,
'month': 3,
'count': 22200
},
{
'year': 2022,
'month': 2,
'count': 18100
},
{
'year': 2022,
'month': 1,
'count': 33100
},
{
'year': 2021,
'month': 12,
'count': 14800
},
{
'year': 2021,
'month': 11,
'count': 18100
},
{
'year': 2021,
'month': 10,
'count': 14800
},
{
'year': 2021,
'month': 9,
'count': 12100
},
{
'year': 2021,
'month': 8,
'count': 12100
},
{
'year': 2021,
'month': 7,
'count': 14800
},
{
'year': 2021,
'month': 6,
'count': 14800
},
{
'year': 2021,
'month': 5,
'count': 18100
},
{
'year': 2021,
'month': 4,
'count': 14800
}
]
}
},
{
'key': 'SEARCH_VOLUME',
'value': {
'Attribute.Type': 'LongAttribute',
'value': 18100
}
}
]
}
]
}
The end result should be a dataframe that looks like this:
KEYWORD_TEXT | SEARCH_VOLUME | 3.2022 | 2.2022 | 1.2022 | 12.2021 | ... |
---|---|---|---|---|---|---|
zambia | 5400 | 6600 | 6600 | 6600 | 5400 | ... |
malawi | 18100 | 22200 | 18100 | 33100 | 14800 | ... |
What I have tried so far but failed is this:
# Convert zeep object to a json:
from zeep import helpers
_json = helpers.serialize_object(zeep_json, dict)
_json
{'totalNumEntries': 2,
'entries': [{'data': [{'key': 'KEYWORD_TEXT',
'value': {'Attribute.Type': 'StringAttribute', 'value': 'zambia'}},
{'key': 'TARGETED_MONTHLY_SEARCHES',
'value': {'Attribute.Type': 'MonthlySearchVolumeAttribute',
'value': [{'year': 2022, 'month': 3, 'count': 6600},
{'year': 2022, 'month': 2, 'count': 6600},
{'year': 2022, 'month': 1, 'count': 6600},
{'year': 2021, 'month': 12, 'count': 5400},
{'year': 2021, 'month': 11, 'count': 5400},
{'year': 2021, 'month': 10, 'count': 4400},
{'year': 2021, 'month': 9, 'count': 4400},
{'year': 2021, 'month': 8, 'count': 6600},
{'year': 2021, 'month': 7, 'count': 5400},
{'year': 2021, 'month': 6, 'count': 5400},
{'year': 2021, 'month': 5, 'count': 5400},
{'year': 2021, 'month': 4, 'count': 5400}]}},
{'key': 'SEARCH_VOLUME',
'value': {'Attribute.Type': 'LongAttribute', 'value': 5400}}]},
{'data': [{'key': 'KEYWORD_TEXT',
'value': {'Attribute.Type': 'StringAttribute', 'value': 'malawi'}},
{'key': 'TARGETED_MONTHLY_SEARCHES',
'value': {'Attribute.Type': 'MonthlySearchVolumeAttribute',
'value': [{'year': 2022, 'month': 3, 'count': 22200},
{'year': 2022, 'month': 2, 'count': 18100},
{'year': 2022, 'month': 1, 'count': 33100},
{'year': 2021, 'month': 12, 'count': 14800},
{'year': 2021, 'month': 11, 'count': 18100},
{'year': 2021, 'month': 10, 'count': 14800},
{'year': 2021, 'month': 9, 'count': 12100},
{'year': 2021, 'month': 8, 'count': 12100},
{'year': 2021, 'month': 7, 'count': 14800},
{'year': 2021, 'month': 6, 'count': 14800},
{'year': 2021, 'month': 5, 'count': 18100},
{'year': 2021, 'month': 4, 'count': 14800}]}},
{'key': 'SEARCH_VOLUME',
'value': {'Attribute.Type': 'LongAttribute', 'value': 18100}}]}]}
# and then try to load that into a dataframe:
df = pd.json_normalize(_json['entries'],
record_path=['data'],
errors='ignore’)
df.head()
which results in this:
_key | _value.Attribute.Type | _value.value |
---|---|---|
KEYWORD_TEXT | StringAttribute | zambia |
TARGETED_MONTHLY_SEARCHES | MonthlySearchVolumeAttribute | [{'year': 2022, 'month': 3, 'count': 6600}, {'year': 2022, 'month': 2, 'count': 6600}, {'year': 2022, 'month': 1, 'count': 6600}, {'year': 2021, 'month': 12, 'count': 5400}, {'year': 2021, 'month': 11, 'count': 5400}, {'year': 2021, 'month': 10, 'count': 4400}, {'year': 2021, 'month': 9, 'count': 4400}, {'year': 2021, 'month': 8, 'count': 6600}, {'year': 2021, 'month': 7, 'count': 5400}, {'year': 2021, 'month': 6, 'count': 5400}, {'year': 2021, 'month': 5, 'count': 5400}, {'year': 2021, 'month': 4, 'count': 5400}] |
SEARCH_VOLUME | LongAttribute | 5400 |
KEYWORD_TEXT | StringAttribute | malawi |
TARGETED_MONTHLY_SEARCHES | MonthlySearchVolumeAttribute | [{'year': 2022, 'month': 3, 'count': 22200}, {'year': 2022, 'month': 2, 'count': 18100}, {'year': 2022, 'month': 1, 'count': 33100}, {'year': 2021, 'month': 12, 'count': 14800}, {'year': 2021, 'month': 11, 'count': 18100}, {'year': 2021, 'month': 10, 'count': 14800}, {'year': 2021, 'month': 9, 'count': 12100}, {'year': 2021, 'month': 8, 'count': 12100}, {'year': 2021, 'month': 7, 'count': 14800}, {'year': 2021, 'month': 6, 'count': 14800}, {'year': 2021, 'month': 5, 'count': 18100}, {'year': 2021, 'month': 4, 'count': 14800}] |
SEARCH_VOLUME | LongAttribute | 18100 |
I am totally lost of how to get the year and month correctly concatted and as columns? Thanks in advance for your time and support.
CodePudding user response:
This does the job,
# Getting all the needed data from the json output.
valid_data = []
for entry in json_output["entries"]:
data = entry["data"]
valid_data.append([i["value"]["value"] for i in data])
all_month_counts = []
# Getting all the month counts sorted.
for d in valid_data:
month_counts = {}
for month in d[1]:
month_counts[f"{month['month']}.{month['year']}"] = month["count"]
all_month_counts.append(month_counts)
# Converting all data into a dataframe.
df = pd.DataFrame(valid_data, columns = ["KEYWORD_TXT", "COUNTS", "SEARCH_VOLUME"])
month_counts_df = pd.DataFrame(all_month_counts)
df.drop(columns = "COUNTS", inplace = True)
df = pd.concat([df, month_counts_df], axis = 1)
Output -
KEYWORD_TXT | SEARCH_VOLUME | 3.2022 | 2.2022 | 1.2022 | 12.2021 | 11.2021 | 10.2021 | 9.2021 | 8.2021 | 7.2021 | 6.2021 | 5.2021 | 4.2021 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | zambia | 5400 | 6600 | 6600 | 6600 | 5400 | 5400 | 4400 | 4400 | 6600 | 5400 | 5400 | 5400 | 5400 |
1 | malawi | 18100 | 22200 | 18100 | 33100 | 14800 | 18100 | 14800 | 12100 | 12100 | 14800 | 14800 | 18100 | 14800 |