Home > database >  Create Dataframe from deeply nested json
Create Dataframe from deeply nested json

Time:12-04

I am trying to read below json schema to dataframe, i can convert it to my preferred type by iterating over all nodes but it can take a while because original json files is much longer then this example. (in tens of thousands)

    "data": [
        {
            "node": {
                "id": 2,
                "title": "Berserk",
                "main_picture": {
                    "medium": "https:\/\/api-cdn.myanimelist.net\/images\/manga\/1\/157897.jpg",
                    "large": "https:\/\/api-cdn.myanimelist.net\/images\/manga\/1\/157897l.jpg"
                },
                "alternative_titles": {
                    "synonyms": [
                        "Berserk: The Prototype"
                    ],
                    "en": "Berserk",
                    "ja": "\u30d9\u30eb\u30bb\u30eb\u30af"
                },
                "start_date": "1989-08-25",
                "end_date": "2021-09-10",
                "synopsis": "Guts, a former mercenary now known as the \"Black Swordsman,\" is out for revenge. After a tumultuous childhood, he finally finds someone he respects and believes he can trust, only to have everything fall apart when this person takes away everything important to Guts for the purpose of fulfilling his own desires. Now marked for death, Guts becomes condemned to a fate in which he is relentlessly pursued by demonic beings.\n\nSetting out on a dreadful quest riddled with misfortune, Guts, armed with a massive sword and monstrous strength, will let nothing stop him, not even death itself, until he is finally able to take the head of the one who stripped him\u2014and his loved one\u2014of their humanity.\n\n[Written by MAL Rewrite]\n\nIncluded one-shot:\nVolume 14: Berserk: The Prototype",
                "mean": 9.42,
                "rank": 1,
                "popularity": 2,
                "num_list_users": 468103,
                "num_scoring_users": 223269,
                "nsfw": "white",
                "created_at": "1970-01-01T00:00:00 00:00",
                "updated_at": "2021-12-03T19:02:32 00:00",
                "media_type": "manga",
                "status": "on_hiatus",
                "genres": [
                    {
                        "id": 1,
                        "name": "Action"
                    },
                    {
                        "id": 2,
                        "name": "Adventure"
                    },
                    {
                        "id": 10,
                        "name": "Fantasy"
                    },
                    {
                        "id": 40,
                        "name": "Psychological"
                    },
                    {
                        "id": 37,
                        "name": "Supernatural"
                    },
                    {
                        "id": 38,
                        "name": "Military"
                    },
                    {
                        "id": 6,
                        "name": "Demons"
                    },
                    {
                        "id": 8,
                        "name": "Drama"
                    },
                    {
                        "id": 14,
                        "name": "Horror"
                    },
                    {
                        "id": 41,
                        "name": "Seinen"
                    }
                ],
                "my_list_status": {
                    "status": "completed",
                    "is_rereading": false,
                    "num_volumes_read": 0,
                    "num_chapters_read": 0,
                    "score": 9,
                    "updated_at": "2020-05-29T15:13:01 00:00"
                },
                "num_volumes": 41,
                "num_chapters": 380,
                "authors": [
                    {
                        "node": {
                            "id": 1868,
                            "first_name": "Kentarou",
                            "last_name": "Miura"
                        },
                        "role": "Story & Art"
                    }
                ]
            },
            "ranking": {
                "rank": 1
            }
        },
        {
            "node": {
                "id": 1706,
                "title": "JoJo no Kimyou na Bouken Part 7: Steel Ball Run",
                "main_picture": {
                    "medium": "https:\/\/api-cdn.myanimelist.net\/images\/manga\/3\/179882.jpg",
                    "large": "https:\/\/api-cdn.myanimelist.net\/images\/manga\/3\/179882l.jpg"
                },
                "alternative_titles": {
                    "synonyms": [
                        "JoJo's Bizarre Adventure Part 7: Steel Ball Run",
                        "SBR"
                    ],
                    "en": "",
                    "ja": "\u30b8\u30e7\u30b8\u30e7\u306e\u5947\u5999\u306a\u5192\u967a Part7 STEEL BALL RUN"
                },
                "start_date": "2004-01-19",
                "end_date": "2011-04-19",
                "synopsis": "In the American Old West, the world's greatest race is about to begin. Thousands line up in San Diego to travel over six thousand kilometers for a chance to win the grand prize of fifty million dollars. With the era of the horse reaching its end, contestants are allowed to use any kind of vehicle they wish. Competitors will have to endure grueling conditions, traveling up to a hundred kilometers a day through uncharted wastelands. The Steel Ball Run is truly a one-of-a-kind event.\n\nThe youthful Johnny Joestar, a crippled former horse racer, has come to San Diego to watch the start of the race. There he encounters Gyro Zeppeli, a racer with two steel balls at his waist instead of a gun. Johnny witnesses Gyro using one of his steel balls to unleash a fantastical power, compelling a man to fire his gun at himself during a duel. In the midst of the action, Johnny happens to touch the steel ball and feels a power surging through his legs, allowing him to stand up for the first time in two years. Vowing to find the secret of the steel balls, Johnny decides to compete in the race, and so begins his bizarre adventure across America on the Steel Ball Run.\n\n[Written by MAL Rewrite]",
                "mean": 9.25,
                "rank": 2,
                "popularity": 27,
                "num_list_users": 178230,
                "num_scoring_users": 105411,
                "nsfw": "white",
                "created_at": "1970-01-01T00:00:00 00:00",
                "updated_at": "2021-12-03T18:56:19 00:00",
                "media_type": "manga",
                "status": "finished",
                "genres": [
                    {
                        "id": 1,
                        "name": "Action"
                    },
                    {
                        "id": 2,
                        "name": "Adventure"
                    },
                    {
                        "id": 37,
                        "name": "Supernatural"
                    },
                    {
                        "id": 13,
                        "name": "Historical"
                    },
                    {
                        "id": 7,
                        "name": "Mystery"
                    },
                    {
                        "id": 14,
                        "name": "Horror"
                    },
                    {
                        "id": 27,
                        "name": "Shounen"
                    },
                    {
                        "id": 41,
                        "name": "Seinen"
                    }
                ],
                "num_volumes": 24,
                "num_chapters": 96,
                "authors": [
                    {
                        "node": {
                            "id": 2619,
                            "first_name": "Hirohiko",
                            "last_name": "Araki"
                        },
                        "role": "Story & Art"
                    }
                ]
            },
            "ranking": {
                "rank": 2
            }
        }
    ],
    "paging": {
        "next": "https:\/\/api.myanimelist.net\/v2\/manga\/ranking?offset=2&ranking_type=manga&limit=2&fields=id,title,main_picture,alternative_titles,start_date,end_date,synopsis,mean,rank,popularity,num_list_users,num_scoring_users,nsfw,created_at,updated_at,media_type,status,genres,my_list_status,num_volumes,num_chapters,authors{first_name,last_name},pictures,background,related_anime,related_manga,recommendations"
    }
}

Columns that gives me struggle are genres and authors. I only want the name field in the genres column, and the first_name, last_name columns in the authors column.

The format that i want for both are

   genres                             first_name    last_name            
0  Action, Adventure, Fantasy,        Kentarou      Miura
   Psychological, Supernatural ...

1  Action, Adventure, Supernatural,   Hirohiko      Araki
   Historical, Mystery, Horror ...

The code i currently have is this

import pandas as pd
import json
with open('./response.json') as file:
    data = json.load(file)

df = pd.json_normalize(data['data'])

This code gives genres and authors columns in this format

                                         node.genres                                       node.authors
0  [{'id': 1, 'name': 'Action'}, {'id': 2, 'name'...  [{'node': {'id': 1868, 'first_name': 'Kentarou...
1  [{'id': 1, 'name': 'Action'}, {'id': 2, 'name'...  [{'node': {'id': 2619, 'first_name': 'Hirohiko...
2  [{'id': 1, 'name': 'Action'}, {'id': 2, 'name'...  [{'node': {'id': 1881, 'first_name': 'Eiichiro...
3  [{'id': 1, 'name': 'Action'}, {'id': 2, 'name'...  [{'node': {'id': 1911, 'first_name': 'Takehiko...
4  [{'id': 8, 'name': 'Drama'}, {'id': 7, 'name':...  [{'node': {'id': 1867, 'first_name': 'Naoki', ...

I want to know that if i can use trick that can give me my preferred type without iterating over all rows.

CodePudding user response:

Construct the individual DataFrames of authors and genres and join to the original df:

authors = pd.json_normalize(data["data"], record_path=["node",["authors"]])[["node.first_name", "node.last_name"]]
genres = pd.Series([", ".join(g["name"] for g in d) for d in df["node.genres"].tolist()]).rename("genres")
output = df.join(authors.join(genres))

CodePudding user response:

Alternatively, you can do either of the following. In limited testing, using apply() over str methods is fastest.

Option 1

df['genres'] = df.apply(lambda x: ', '.join([d['name'] for d in x['node.genres']]), axis=1)
df['first_name'] = df['node.authors'].str[0].str['node'].str['first_name']
df['last_name'] = df['node.authors'].str[0].str['node'].str['last_name']

Option 2

df['genres'] = df.apply(lambda x: ', '.join([d['name'] for d in x['node.genres']]), axis=1)
df['first_name'] = df.apply(lambda x: x['node.authors'][0]['node']['first_name'], axis=1)
df['last_name'] = df.apply(lambda x: x['node.authors'][0]['node']['last_name'], axis=1)

Modified Option 2 (fastest)

df['genres'] = df.apply(lambda x: ', '.join([d['name'] for d in x['node.genres']]), axis=1)
df [['first_name', 'last_name']] = df.apply(lambda x: (x['node.authors'][0]['node']['first_name'], x['node.authors'][0]['node']['last_name']), axis=1)

Timeit results:

answer 1 from not_speshal: 
1.94 ms ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

My options in order:
1.45 ms ± 45.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.35 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
925 µs ± 6.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  • Related