Home > Back-end >  Adding lists from nested JSON file to pandas dataframe?
Adding lists from nested JSON file to pandas dataframe?

Time:11-08

I have a Json file that looks like the following. I want to grab the strings of names in the "actors" list and add them to a dataframe (which is empty now, the first item added to the dataframe would be the strings of actor names as rows).


{
    "1": {
        "title": "Exodus: Gods and Kings",
        "url": "https://en.wikipedia.org/wiki/Exodus: Gods and Kings",
        "year": "2014",
        "poster": "https://upload.wikimedia.org/wikipedia/en/thumb/c/cd/Exodus2014Poster.jpg/220px-Exodus2014Poster.jpg",
        "actors": [
            "Christian Bale",
            "Joel Edgerton",
            "John Turturro",
            "Aaron Paul",
            "Ben Mendelsohn",
            "Sigourney Weaver",
            "Ben Kingsley"
        ]
    },
...

I have tried using the following python code to do this but I am unsuccesful, I beleive because I am using a function wrong or not using the right function at all. Any suggestions as to what function/method to use?

# Create dataframe from json file
df_json = pd.read_json("movies_metadata.json", encoding='latin-1')

# Create new dataframe with actor names
data = [df.iloc[4]]
df = pd.DataFrame(data)

I strongly beleive that my code is very poor, but have had a hard time finding how to do this when googling.

Tried googling all around, as well as different methods from pandas to add items to dataframes

CodePudding user response:

You can use list-comprehension to get actors from the dictionary and then construct a dataframe. For example:

data = {
    "1": {
        "title": "Exodus: Gods and Kings",
        "url": "https://en.wikipedia.org/wiki/Exodus: Gods and Kings",
        "year": "2014",
        "poster": "https://upload.wikimedia.org/wikipedia/en/thumb/c/cd/Exodus2014Poster.jpg/220px-Exodus2014Poster.jpg",
        "actors": [
            "Christian Bale",
            "Joel Edgerton",
            "John Turturro",
            "Aaron Paul",
            "Ben Mendelsohn",
            "Sigourney Weaver",
            "Ben Kingsley",
        ],
    }
}

df = pd.DataFrame(
    [actor for v in data.values() for actor in v["actors"]], columns=["Actors"]
)
print(df)

Prints:

             Actors
0    Christian Bale
1     Joel Edgerton
2     John Turturro
3        Aaron Paul
4    Ben Mendelsohn
5  Sigourney Weaver
6      Ben Kingsley

CodePudding user response:

# read in the json file
df =pd.read_json('txt.json')


#if you have multiple json records, each will be its own columns
# filter the actor rows and then explode 
df.loc['actors',:].explode()

1       Christian Bale
1        Joel Edgerton
1        John Turturro
1           Aaron Paul
1       Ben Mendelsohn
1     Sigourney Weaver
1         Ben Kingsley
2      2Christian Bale
2       2Joel Edgerton
2       2John Turturro
2          2Aaron Paul
2      2Ben Mendelsohn
2    2Sigourney Weaver
2        2Ben Kingsley
Name: actors, dtype: object

PS: I expanded your JSON file to hae two records in it

  • Related