I'm getting data from an api which has a nested Json file. I write the file and it creates like this:
{"country": "Afghanistan", "timeline": [{"total": 6355931, "daily": 0, "totalPerHundred": 0, "dailyPerMillion": 0, "date": "6/24/22"}]}
{"country": "Albania", "timeline": [{"total": 2883079, "daily": 0, "totalPerHundred": 0, "dailyPerMillion": 0, "date": "6/24/22"}]}
{"country": "Algeria", "timeline": [{"total": 15205854, "daily": 0, "totalPerHundred": 0, "dailyPerMillion": 0, "date": "6/24/22"}]}
issue I'm facing is that I need to break up 'timeline' and leave me with just total, daily and date so that it looks more like this:-
{"country": "Afghanistan", "total": 6355931, "daily": 0, "totalPerHundred": 0, "dailyPerMillion": 0, "date": "6/24/22"}
{"country": "Albania", "total": 2883079, "daily": 0, "totalPerHundred": 0, "dailyPerMillion": 0, "date": "6/24/22"}
{"country": "Algeria", "total": 15205854, "daily": 0, "totalPerHundred": 0, "dailyPerMillion": 0, "date": "6/24/22"}
I've tried json_normalise which didn't work so I'm wondering what is going wrong. The code is:-
def get_country_vaccines(self, last_days: int = 3, full_data = 'true') -> requests.Response:
return requests.get(self.host 'vaccine/coverage/countries', params={'lastdays': last_days, 'fullData': full_data})
class VaccineData(CovidBase):
__tablename__ = 'vaccination_data'
base_value = Column(Integer, primary_key= True)
country = Column(String)
timeline = Column(JSON)
@classmethod
def from_requests(cls, request: dict):
return cls(
country=request.get('country'),
timeline = request.get('timeline')
)
def to_bigquery_row(self):
return {
'country': self.country,
'timeline': self.timeline,
}
with open('covidinfo.json','w') as newfile:
response = get_country_vaccines('1')
for item in response.json():
data = sq_models.VaccineData.from_requests(item)
newfile.write(json.dumps(data.to_bigquery_row()))
newfile.write('\n')
input(data)
How could I break up the information in timeline so that each field inside it is now separate? Sorry I'm fairly new to python so just looking for some help with this.
CodePudding user response:
You can traverse the original JSON
data as a Python dictionary:
d = {"country": "Afghanistan", "timeline": [{"total": 6355931, "daily": 0, "totalPerHundred": 0, "dailyPerMillion": 0, "date": "6/24/22"}]}
d1 = {"country":d["country"]}
for i in d["timeline"][0].keys():
d1[i] = d["timeline"][0][i]
print(d1)
Output: The final transformed dict would look like this
{'country': 'Afghanistan', 'total': 6355931, 'daily': 0, 'totalPerHundred': 0, 'dailyPerMillion': 0, 'date': '6/24/22'}
CodePudding user response:
If you know your data structure well and you don't want to involve any magic, I'd just iterate over the original data and flatten the timeline entries.
One possible way to do it (assuming that the timeline
array always holds a single element):
nested_data = [
{"country": "Afghanistan", "timeline": [{"total": 6355931, "daily": 0, "totalPerHundred": 0, "dailyPerMillion": 0, "date": "6/24/22"}]},
{"country": "Albania", "timeline": [{"total": 2883079, "daily": 0, "totalPerHundred": 0, "dailyPerMillion": 0, "date": "6/24/22"}]},
{"country": "Algeria", "timeline": [{"total": 15205854, "daily": 0, "totalPerHundred": 0, "dailyPerMillion": 0, "date": "6/24/22"}]}
]
flat_data = []
for nested_obj in nested_data:
flat_obj = {}
for key, value in nested_obj.items():
if key == "timeline":
for timeline_key, timeline_value in nested_obj[key][0].items():
flat_obj[timeline_key] = timeline_value
else:
flat_obj[key] = value
flat_data.append(flat_obj)
print(flat_data)