JSON nested list to Pandas dataframe-CodePudding

I have a json file which looks like this:

    "Aveiro": {
        "Albergaria-a-Velha": {
            "candidates": [
                {
                    "effectiveCandidates": [
                        "JOSÉ OLIVEIRA SANTOS"
                    ],
                    "party": "B.E.",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "B.E.",
                        "constituenctyCounter": 1,
                        "mandates": 0,
                        "percentage": 1.34,
                        "presidents": 0,
                        "validVotesPercentage": 1.4,
                        "votes": 179
                    }
                },
                {
                    "effectiveCandidates": [
                        "ANTÓNIO AUGUSTO AMARAL LOUREIRO E SANTOS"
                    ],
                    "party": "CDS-PP",
                    "votes": {
                        "absoluteMajority": 1,
                        "acronym": "CDS-PP",
                        "constituenctyCounter": 1,
                        "mandates": 5,
                        "percentage": 59.7,
                        "presidents": 1,
                        "validVotesPercentage": 62.5,
                        "votes": 7970
                    }
                },
                {
                    "effectiveCandidates": [
                        "CARLOS MANUEL DA COSTA SERVEIRA VASQUES"
                    ],
                    "party": "CH",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "CH",
                        "constituenctyCounter": 1,
                        "mandates": 0,
                        "percentage": 1.87,
                        "presidents": 0,
                        "validVotesPercentage": 1.95,
                        "votes": 249
                    }
                },
                {
                    "effectiveCandidates": [
                        "RODRIGO MANUEL PEREIRA MARQUES LOURENÇO"
                    ],
                    "party": "PCP-PEV",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "PCP-PEV",
                        "constituenctyCounter": 1,
                        "mandates": 0,
                        "percentage": 1.57,
                        "presidents": 0,
                        "validVotesPercentage": 1.65,
                        "votes": 210
                    }
                },
                {
                    "effectiveCandidates": [
                        "DELFINA LISBOA MARTINS DA CUNHA"
                    ],
                    "party": "PPD/PSD",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "PPD/PSD",
                        "constituenctyCounter": 1,
                        "mandates": 2,
                        "percentage": 24.23,
                        "presidents": 0,
                        "validVotesPercentage": 25.37,
                        "votes": 3235
                    }
                },
                {
                    "effectiveCandidates": [
                        "JESUS MANUEL VIDINHA TOMÁS"
                    ],
                    "party": "PS",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "PS",
                        "constituenctyCounter": 1,
                        "mandates": 0,
                        "percentage": 6.82,
                        "presidents": 0,
                        "validVotesPercentage": 7.14,
                        "votes": 910
                    }
                }
            ],
            "parentTerritoryName": "Aveiro",
            "territoryKey": "LOCAL-010200",
            "territoryName": "Albergaria-a-Velha",
            "total_votes": {
                "availableMandates": 0,
                "blankVotes": 377,
                "blankVotesPercentage": 2.82,
                "displayMessage": null,
                "hasNoVoting": false,
                "nullVotes": 221,
                "nullVotesPercentage": 1.66,
                "numberParishes": 6,
                "numberVoters": 13351,
                "percentageVoters": 59.48
            }
        },

The full file is here for reference

I thought that this code would work

import pandas as pd 
from pandas import json_normalize
import json


with open('autarquicas_2021.json') as f:
    data = json.load(f)

df = pd.json_normalize(data)

However this is returning the following:

df.head()
Aveiro.Albergaria-a-Velha.candidates  ... Évora.Évora.total_votes.percentageVoters
0  [{'effectiveCandidates': ['JOSÉ OLIVEIRA SANTO...  ...                                    49.84

[1 rows x 4312 columns]

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Columns: 4312 entries, Aveiro.Albergaria-a-Velha.candidates to Évora.Évora.total_votes.percentageVoters
dtypes: bool(308), float64(924), int64(1540), object(1540)
memory usage: 31.7  KB
None

For some reason the code is not working, and my research has led me to no solutions, as it seems that every json file has a mind of its own.

Any help would be much appreciated. Thank you in advance!

Disclaimer: This is for an open source project to bring more transparency into local elections in Portugal. It will not be used for commercial, or for profit projects.

CodePudding user response：

You can use json_normalize with a little transformation of original JSON format.

Convert JSON into list format. I am assuming "Aveiro" as city, and "Albergaria-a-Velha" as district. Apologies of my unfamiliarity of the area, so if it is wrong, please rename the key.

res = [{**z, **{'city': x, 'district': y}} for x, y in data.items() for y, z in y.items()]

This will transform original JSON of key-values style into list of objects.

[{
    "city": "Aveiro",
    "district": "Albergaria-a-Velha",
    "candidates": [{
        ...
}]

Then use json_normalize.

df = pd.json_normalize(res, record_path=['candidates'], meta=['total_votes', 'city', 'district'])

Further expanding the nested object total_votes.

df = pd.concat([df, pd.json_normalize(df['total_votes'])], axis=1)

>>> df.iloc[0]
effectiveCandidates                                      [JOSÉ OLIVEIRA SANTOS]
party                                                                      B.E.
votes.absoluteMajority                                                        0
votes.acronym                                                              B.E.
votes.constituenctyCounter                                                    1
votes.mandates                                                                0
votes.percentage                                                           1.34
votes.presidents                                                              0
votes.validVotesPercentage                                                  1.4
votes.votes                                                                 179
total_votes                   {'availableMandates': 0, 'blankVotes': 377, 'b...
city                                                                     Aveiro
district                                                     Albergaria-a-Velha
availableMandates                                                             0
blankVotes                                                                  377
blankVotesPercentage                                                       2.82
displayMessage                                                             None
hasNoVoting                                                               False
nullVotes                                                                   221
nullVotesPercentage                                                        1.66
numberParishes                                                                6
numberVoters                                                              13351
percentageVoters                                                          59.48
Name: 0, dtype: object

CodePudding user response：

Recursive Approach:

I usually use this function (a recursive approach) to do that kind of thing:

# Function for flattening 
# json
def flatten_json(y):
    out = {}
  
    def flatten(x, name =''):
          
        # If the Nested key-value 
        # pair is of dict type
        if type(x) is dict:
              
            for a in x:
                flatten(x[a], name   a   '_')
                  
        # If the Nested key-value
        # pair is of list type
        elif type(x) is list:
              
            i = 0
              
            for a in x:                
                flatten(a, name   str(i)   '_')
                i  = 1
        else:
            out[name[:-1]] = x
  
    flatten(y)
    return out

You can call flatten_json for flattening your nested json.

# Driver code
print(flatten_json(data))

Library-based approach:

from flatten_json import flatten

unflat_json = {'user' :
               {'foo':
                {'UserID':0123456,
                'Email': '[email protected]', 
                'friends': ['Johnny', 'Mark', 'Tom']
                }
               }
              }
  
flat_json = flatten(unflat_json)
  
print(flat_json)