I have a json file which looks like this:
"Aveiro": {
"Albergaria-a-Velha": {
"candidates": [
{
"effectiveCandidates": [
"JOSÉ OLIVEIRA SANTOS"
],
"party": "B.E.",
"votes": {
"absoluteMajority": 0,
"acronym": "B.E.",
"constituenctyCounter": 1,
"mandates": 0,
"percentage": 1.34,
"presidents": 0,
"validVotesPercentage": 1.4,
"votes": 179
}
},
{
"effectiveCandidates": [
"ANTÓNIO AUGUSTO AMARAL LOUREIRO E SANTOS"
],
"party": "CDS-PP",
"votes": {
"absoluteMajority": 1,
"acronym": "CDS-PP",
"constituenctyCounter": 1,
"mandates": 5,
"percentage": 59.7,
"presidents": 1,
"validVotesPercentage": 62.5,
"votes": 7970
}
},
{
"effectiveCandidates": [
"CARLOS MANUEL DA COSTA SERVEIRA VASQUES"
],
"party": "CH",
"votes": {
"absoluteMajority": 0,
"acronym": "CH",
"constituenctyCounter": 1,
"mandates": 0,
"percentage": 1.87,
"presidents": 0,
"validVotesPercentage": 1.95,
"votes": 249
}
},
{
"effectiveCandidates": [
"RODRIGO MANUEL PEREIRA MARQUES LOURENÇO"
],
"party": "PCP-PEV",
"votes": {
"absoluteMajority": 0,
"acronym": "PCP-PEV",
"constituenctyCounter": 1,
"mandates": 0,
"percentage": 1.57,
"presidents": 0,
"validVotesPercentage": 1.65,
"votes": 210
}
},
{
"effectiveCandidates": [
"DELFINA LISBOA MARTINS DA CUNHA"
],
"party": "PPD/PSD",
"votes": {
"absoluteMajority": 0,
"acronym": "PPD/PSD",
"constituenctyCounter": 1,
"mandates": 2,
"percentage": 24.23,
"presidents": 0,
"validVotesPercentage": 25.37,
"votes": 3235
}
},
{
"effectiveCandidates": [
"JESUS MANUEL VIDINHA TOMÁS"
],
"party": "PS",
"votes": {
"absoluteMajority": 0,
"acronym": "PS",
"constituenctyCounter": 1,
"mandates": 0,
"percentage": 6.82,
"presidents": 0,
"validVotesPercentage": 7.14,
"votes": 910
}
}
],
"parentTerritoryName": "Aveiro",
"territoryKey": "LOCAL-010200",
"territoryName": "Albergaria-a-Velha",
"total_votes": {
"availableMandates": 0,
"blankVotes": 377,
"blankVotesPercentage": 2.82,
"displayMessage": null,
"hasNoVoting": false,
"nullVotes": 221,
"nullVotesPercentage": 1.66,
"numberParishes": 6,
"numberVoters": 13351,
"percentageVoters": 59.48
}
},
The full file is here for reference
I thought that this code would work
import pandas as pd
from pandas import json_normalize
import json
with open('autarquicas_2021.json') as f:
data = json.load(f)
df = pd.json_normalize(data)
However this is returning the following:
df.head()
Aveiro.Albergaria-a-Velha.candidates ... Évora.Évora.total_votes.percentageVoters
0 [{'effectiveCandidates': ['JOSÉ OLIVEIRA SANTO... ... 49.84
[1 rows x 4312 columns]
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Columns: 4312 entries, Aveiro.Albergaria-a-Velha.candidates to Évora.Évora.total_votes.percentageVoters
dtypes: bool(308), float64(924), int64(1540), object(1540)
memory usage: 31.7 KB
None
For some reason the code is not working, and my research has led me to no solutions, as it seems that every json file has a mind of its own.
Any help would be much appreciated. Thank you in advance!
Disclaimer: This is for an open source project to bring more transparency into local elections in Portugal. It will not be used for commercial, or for profit projects.
CodePudding user response:
You can use json_normalize
with a little transformation of original JSON format.
- Convert JSON into list format. I am assuming "Aveiro" as city, and "Albergaria-a-Velha" as district. Apologies of my unfamiliarity of the area, so if it is wrong, please rename the key.
res = [{**z, **{'city': x, 'district': y}} for x, y in data.items() for y, z in y.items()]
This will transform original JSON of key-values style into list of objects.
[{
"city": "Aveiro",
"district": "Albergaria-a-Velha",
"candidates": [{
...
}]
- Then use
json_normalize
.
df = pd.json_normalize(res, record_path=['candidates'], meta=['total_votes', 'city', 'district'])
- Further expanding the nested object
total_votes
.
df = pd.concat([df, pd.json_normalize(df['total_votes'])], axis=1)
>>> df.iloc[0]
effectiveCandidates [JOSÉ OLIVEIRA SANTOS]
party B.E.
votes.absoluteMajority 0
votes.acronym B.E.
votes.constituenctyCounter 1
votes.mandates 0
votes.percentage 1.34
votes.presidents 0
votes.validVotesPercentage 1.4
votes.votes 179
total_votes {'availableMandates': 0, 'blankVotes': 377, 'b...
city Aveiro
district Albergaria-a-Velha
availableMandates 0
blankVotes 377
blankVotesPercentage 2.82
displayMessage None
hasNoVoting False
nullVotes 221
nullVotesPercentage 1.66
numberParishes 6
numberVoters 13351
percentageVoters 59.48
Name: 0, dtype: object
CodePudding user response:
Recursive Approach:
I usually use this function (a recursive approach)
to do that kind of thing:
# Function for flattening
# json
def flatten_json(y):
out = {}
def flatten(x, name =''):
# If the Nested key-value
# pair is of dict type
if type(x) is dict:
for a in x:
flatten(x[a], name a '_')
# If the Nested key-value
# pair is of list type
elif type(x) is list:
i = 0
for a in x:
flatten(a, name str(i) '_')
i = 1
else:
out[name[:-1]] = x
flatten(y)
return out
You can call flatten_json
for flattening your nested json.
# Driver code
print(flatten_json(data))
Library-based approach:
from flatten_json import flatten
unflat_json = {'user' :
{'foo':
{'UserID':0123456,
'Email': '[email protected]',
'friends': ['Johnny', 'Mark', 'Tom']
}
}
}
flat_json = flatten(unflat_json)
print(flat_json)