I have the following example JSON object and want to extract multiple key-value pairs from it (using a loop or some other method) and store them as named columns and rows in a DataFrame.
I also want to be able to add a condition (preferably before the loop) that limits the dataset, so that processing a much larger JSON dataset is quicker.
Here's the JSON:
x = {
    "Data": [
        {
            "City": "Barcelona",
            "Country": "Spain",
            "Population": "1,620,343"
        },
        {
            "City": "Tokyo",
            "Country": "Japan",
            "Population": "14,043,239"
        },
        {
            "City": "Helsinki",
            "Country": "Finland",
            "Population": "658,864"
        },
        {
            "City": "Paris",
            "Country": "France",
            "Population": "2,165,423"
        },
        {
            "City": "Bologna",
            "Country": "Italy",
            "Population": "388,367"
        },
        {
            "City": "Verona",
            "Country": "Italy",
            "Population": "257,353"
        },
        {
            "City": "Cartagena",
            "Country": "Colombia",
            "Population": "914,552"
        }
    ]
}
I can return specific values using the following...
output = [{element['City'], element['Country'], element['Population']} for element in x['Data']]
print(output)
which returns...
[{'1,620,343', 'Spain', 'Barcelona'}, {'14,043,239', 'Japan', 'Tokyo'}, {'Helsinki', '658,864', 'Finland'}, {'2,165,423', 'France', 'Paris'}, {'Italy', 'Bologna', '388,367'}, {'257,353', 'Verona', 'Italy'}, {'Colombia', 'Cartagena', '914,552'}]
Why is the order of the key-value pairs not preserved? Some appear as 'Population, Country, City', others as 'Country, City, Population', and so on.
How might I transform this output into a DataFrame for easier manipulation?
With a larger dataset, how might I add a condition to limit the volume, so as to reduce the computational expense of parsing the JSON object?
Thanks
CodePudding user response:
Why is the order of the key-value pairs not preserved? Some appear as 'Population, Country, City', others as 'Country, City, Population', and so on.
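The likely cause is the curly braces inside the comprehension: `{a, b, c}` with bare values builds a Python set, and sets do not keep insertion order. Below is a minimal sketch, using an abridged copy of the question's data, that swaps the set for a tuple and adds an `if` clause to the comprehension so unwanted records are never collected. The names `columns`, `rows` and `italy_only`, and the choice of filtering on Italy, are illustrative assumptions and not part of the original question.

```python
# Sample data copied (abridged) from the question.
x = {
    "Data": [
        {"City": "Barcelona", "Country": "Spain", "Population": "1,620,343"},
        {"City": "Tokyo", "Country": "Japan", "Population": "14,043,239"},
        {"City": "Bologna", "Country": "Italy", "Population": "388,367"},
        {"City": "Verona", "Country": "Italy", "Population": "257,353"},
    ]
}

# Curly braces around bare values build a set, and sets are unordered.
as_set = {x["Data"][0]["City"], x["Data"][0]["Country"]}

# A tuple keeps the order in which the values were produced.
columns = ("City", "Country", "Population")
rows = [tuple(element[c] for c in columns) for element in x["Data"]]

# The condition is checked while iterating, so only matching records are kept.
# Filtering on Country == "Italy" is only an illustrative choice.
italy_only = [
    {c: element[c] for c in columns}
    for element in x["Data"]
    if element["Country"] == "Italy"
]

print(type(as_set).__name__)
print(rows[0])
print(italy_only)
```

A list of dicts such as `italy_only` can be passed directly to `pandas.DataFrame(...)`, which uses the dict keys as column headers, so the filtered records become the rows of the DataFrame.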