JSON nested list to Pandas dataframe


I have a JSON file that looks like this:

    "Aveiro": {
        "Albergaria-a-Velha": {
            "candidates": [
                    "effectiveCandidates": [
                        "JOSÉ OLIVEIRA SANTOS"
                    "party": "B.E.",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "B.E.",
                        "constituenctyCounter": 1,
                        "mandates": 0,
                        "percentage": 1.34,
                        "presidents": 0,
                        "validVotesPercentage": 1.4,
                        "votes": 179
                    "effectiveCandidates": [
                    "party": "CDS-PP",
                    "votes": {
                        "absoluteMajority": 1,
                        "acronym": "CDS-PP",
                        "constituenctyCounter": 1,
                        "mandates": 5,
                        "percentage": 59.7,
                        "presidents": 1,
                        "validVotesPercentage": 62.5,
                        "votes": 7970
                    "effectiveCandidates": [
                    "party": "CH",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "CH",
                        "constituenctyCounter": 1,
                        "mandates": 0,
                        "percentage": 1.87,
                        "presidents": 0,
                        "validVotesPercentage": 1.95,
                        "votes": 249
                    "effectiveCandidates": [
                    "party": "PCP-PEV",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "PCP-PEV",
                        "constituenctyCounter": 1,
                        "mandates": 0,
                        "percentage": 1.57,
                        "presidents": 0,
                        "validVotesPercentage": 1.65,
                        "votes": 210
                    "effectiveCandidates": [
                        "DELFINA LISBOA MARTINS DA CUNHA"
                    "party": "PPD/PSD",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "PPD/PSD",
                        "constituenctyCounter": 1,
                        "mandates": 2,
                        "percentage": 24.23,
                        "presidents": 0,
                        "validVotesPercentage": 25.37,
                        "votes": 3235
                    "effectiveCandidates": [
                        "JESUS MANUEL VIDINHA TOMÁS"
                    "party": "PS",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "PS",
                        "constituenctyCounter": 1,
                        "mandates": 0,
                        "percentage": 6.82,
                        "presidents": 0,
                        "validVotesPercentage": 7.14,
                        "votes": 910
            "parentTerritoryName": "Aveiro",
            "territoryKey": "LOCAL-010200",
            "territoryName": "Albergaria-a-Velha",
            "total_votes": {
                "availableMandates": 0,
                "blankVotes": 377,
                "blankVotesPercentage": 2.82,
                "displayMessage": null,
                "hasNoVoting": false,
                "nullVotes": 221,
                "nullVotesPercentage": 1.66,
                "numberParishes": 6,
                "numberVoters": 13351,
                "percentageVoters": 59.48

The full file is here for reference

I thought this code would work:

import pandas as pd 
from pandas import json_normalize
import json

with open('autarquicas_2021.json') as f:
    data = json.load(f)

df = pd.json_normalize(data)

However this is returning the following:

Aveiro.Albergaria-a-Velha.candidates  ... Évora.Évora.total_votes.percentageVoters
0  [{'effectiveCandidates': ['JOSÉ OLIVEIRA SANTO...  ...                                    49.84

[1 rows x 4312 columns]

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Columns: 4312 entries, Aveiro.Albergaria-a-Velha.candidates to Évora.Évora.total_votes.percentageVoters
dtypes: bool(308), float64(924), int64(1540), object(1540)
memory usage: 31.7  KB

Instead of one row per candidate, I get a single row with 4312 columns. My research has not turned up a solution; it seems that every JSON file has a mind of its own.

Any help would be much appreciated. Thank you in advance!

Disclaimer: This is for an open source project to bring more transparency to local elections in Portugal. It will not be used for commercial or for-profit projects.

CodePudding user response:

You can use json_normalize after a small transformation of the original JSON format.

  1. Convert the JSON into a list of records. I am assuming "Aveiro" is a city and "Albergaria-a-Velha" is a district. Apologies for my unfamiliarity with the area; if that is wrong, please rename the keys.
res = [{**info, 'city': city, 'district': district} for city, districts in data.items() for district, info in districts.items()]

This transforms the original nested key-value JSON into a list of objects, each shaped like this:

    "city": "Aveiro",
    "district": "Albergaria-a-Velha",
    "candidates": [{
  2. Then use json_normalize.
df = pd.json_normalize(res, record_path=['candidates'], meta=['total_votes', 'city', 'district'])
  3. Expand the nested object total_votes further.
df = pd.concat([df, pd.json_normalize(df['total_votes'])], axis=1)
>>> df.iloc[0]
effectiveCandidates                                      [JOSÉ OLIVEIRA SANTOS]
party                                                                      B.E.
votes.absoluteMajority                                                        0
votes.acronym                                                              B.E.
votes.constituenctyCounter                                                    1
votes.mandates                                                                0
votes.percentage                                                           1.34
votes.presidents                                                              0
votes.validVotesPercentage                                                  1.4
votes.votes                                                                 179
total_votes                   {'availableMandates': 0, 'blankVotes': 377, 'b...
city                                                                     Aveiro
district                                                     Albergaria-a-Velha
availableMandates                                                             0
blankVotes                                                                  377
blankVotesPercentage                                                       2.82
displayMessage                                                             None
hasNoVoting                                                               False
nullVotes                                                                   221
nullVotesPercentage                                                        1.66
numberParishes                                                                6
numberVoters                                                              13351
percentageVoters                                                          59.48
Name: 0, dtype: object
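
The three steps above can be sketched end to end on a trimmed copy of the question's data. This is a minimal sketch, not the full file: the `data` dict keeps only two candidates and a few fields, the names `records` and `totals` are illustrative, and dropping the leftover `total_votes` column is an optional extra that the steps above do not perform.

```python
import pandas as pd

# Trimmed stand-in for autarquicas_2021.json (assumption: same nesting,
# fewer candidates and fields than the real file).
data = {
    "Aveiro": {
        "Albergaria-a-Velha": {
            "candidates": [
                {"effectiveCandidates": ["JOSÉ OLIVEIRA SANTOS"], "party": "B.E.",
                 "votes": {"mandates": 0, "percentage": 1.34, "votes": 179}},
                {"effectiveCandidates": ["JESUS MANUEL VIDINHA TOMÁS"], "party": "PS",
                 "votes": {"mandates": 0, "percentage": 6.82, "votes": 910}},
            ],
            "territoryKey": "LOCAL-010200",
            "total_votes": {"blankVotes": 377, "nullVotes": 221, "numberVoters": 13351},
        }
    }
}

# Step 1: turn the two levels of dict keys into ordinary fields on each record.
records = [
    {**info, "city": city, "district": district}
    for city, districts in data.items()
    for district, info in districts.items()
]

# Step 2: one row per candidate, carrying the parent fields along as meta columns.
df = pd.json_normalize(
    records, record_path=["candidates"], meta=["total_votes", "city", "district"]
)

# Step 3: expand the total_votes dicts into columns and drop the raw dict column.
totals = pd.json_normalize(df["total_votes"].tolist())
df = pd.concat([df.drop(columns="total_votes"), totals], axis=1)

print(df[["party", "votes.votes", "city", "district", "blankVotes"]])
```

Each candidate becomes its own row, and the territory-level totals are repeated on every row of that territory.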

CodePudding user response:

Recursive Approach:

I usually use this function (a recursive approach) to do that kind of thing:

# Function for flattening
# json
def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        # If the nested key-value
        # pair is of dict type
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        # If the nested key-value
        # pair is of list type
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        # Otherwise it is a leaf value:
        # store it under the accumulated key
        else:
            out[name[:-1]] = x
    flatten(y)
    return out

You can call flatten_json to flatten your nested JSON.

# Driver code
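
A call might look like the following sketch. It repeats the recursive function in a compact form so the block runs on its own; the `sample` record is a hypothetical, trimmed candidate entry based on the question's data, not the real file.

```python
def flatten_json(y):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=""):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + "_")
        elif isinstance(x, list):
            for i, value in enumerate(x):
                flatten(value, name + str(i) + "_")
        else:
            # Leaf value: drop the trailing '_' from the accumulated key.
            out[name[:-1]] = x

    flatten(y)
    return out


# Hypothetical single candidate record, trimmed from the question's JSON.
sample = {
    "party": "B.E.",
    "effectiveCandidates": ["JOSÉ OLIVEIRA SANTOS"],
    "votes": {"mandates": 0, "votes": 179},
}

flat = flatten_json(sample)
print(flat)
```

Note that flattening the whole file in one call would again produce a single very wide record, so for a table it is probably more useful to flatten each candidate record separately and pass the resulting list of dicts to pd.DataFrame. Empty lists and dicts produce no keys at all with this approach.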

Library-based approach:

from flatten_json import flatten

unflat_json = {'user':
                {'Email': '[email protected]',
                 'friends': ['Johnny', 'Mark', 'Tom']}}
flat_json = flatten(unflat_json)