Adding lists from nested JSON file to pandas dataframe?

I have a JSON file that looks like the following. I want to grab the actor names in the "actors" list and add them to a dataframe (it is empty now; the actor names would be the first thing added, one name per row).


{
    "1": {
        "title": "Exodus: Gods and Kings",
        "url": "https://en.wikipedia.org/wiki/Exodus: Gods and Kings",
        "year": "2014",
        "poster": "https://upload.wikimedia.org/wikipedia/en/thumb/c/cd/Exodus2014Poster.jpg/220px-Exodus2014Poster.jpg",
        "actors": [
            "Christian Bale",
            "Joel Edgerton",
            "John Turturro",
            "Aaron Paul",
            "Ben Mendelsohn",
            "Sigourney Weaver",
            "Ben Kingsley"
        ]
    },
...

I tried the following Python code, but without success, I believe because I am using a function incorrectly or not using the right function at all. Any suggestions as to which function or method to use?

# Create dataframe from json file
df_json = pd.read_json("movies_metadata.json", encoding='latin-1')

# Create new dataframe with actor names
data = [df.iloc[4]]
df = pd.DataFrame(data)

I strongly believe that my code is very poor, but I have had a hard time finding how to do this when googling.

I tried googling all around, as well as different pandas methods for adding items to dataframes.

CodePudding user response:

You can use a list comprehension to get the actors from the dictionary and then construct a dataframe. For example:

import pandas as pd

data = {
    "1": {
        "title": "Exodus: Gods and Kings",
        "url": "https://en.wikipedia.org/wiki/Exodus: Gods and Kings",
        "year": "2014",
        "poster": "https://upload.wikimedia.org/wikipedia/en/thumb/c/cd/Exodus2014Poster.jpg/220px-Exodus2014Poster.jpg",
        "actors": [
            "Christian Bale",
            "Joel Edgerton",
            "John Turturro",
            "Aaron Paul",
            "Ben Mendelsohn",
            "Sigourney Weaver",
            "Ben Kingsley",
        ],
    }
}

df = pd.DataFrame(
    [actor for v in data.values() for actor in v["actors"]], columns=["Actors"]
)
print(df)

Prints:

             Actors
0    Christian Bale
1     Joel Edgerton
2     John Turturro
3        Aaron Paul
4    Ben Mendelsohn
5  Sigourney Weaver
6      Ben Kingsley
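
If the data lives in a file rather than in a literal dictionary, the same list comprehension can be fed from json.load. The following is a minimal sketch under some assumptions: the sample records, the temporary file, and the column names movie_id, title and actor are all hypothetical, chosen only to make the example runnable, and the second record is made up.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample standing in for the real JSON file; record "2" is made up.
sample = {
    "1": {"title": "Exodus: Gods and Kings", "actors": ["Christian Bale", "Joel Edgerton"]},
    "2": {"title": "Hypothetical second movie", "actors": ["Actor A"]},
}

with tempfile.TemporaryDirectory() as tmp:
    # Write the sample to disk so the example can read it back like a real file.
    path = Path(tmp) / "movies_metadata.json"
    path.write_text(json.dumps(sample), encoding="utf-8")

    # Load the whole file into a plain Python dictionary.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

# One row per (movie, actor) pair; .get() tolerates records without an "actors" key.
rows = [
    {"movie_id": key, "title": movie["title"], "actor": actor}
    for key, movie in data.items()
    for actor in movie.get("actors", [])
]
actors_df = pd.DataFrame(rows, columns=["movie_id", "title", "actor"])
print(actors_df)
```

Keeping the movie key next to each actor is optional, but it makes it possible to tell afterwards which film a name came from.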

CodePudding user response:

import pandas as pd

# Read in the JSON file
df = pd.read_json('txt.json')

# Each top-level JSON record becomes its own column,
# so select the 'actors' row and then explode the lists
df.loc['actors', :].explode()

1       Christian Bale
1        Joel Edgerton
1        John Turturro
1           Aaron Paul
1       Ben Mendelsohn
1     Sigourney Weaver
1         Ben Kingsley
2      2Christian Bale
2       2Joel Edgerton
2       2John Turturro
2          2Aaron Paul
2      2Ben Mendelsohn
2    2Sigourney Weaver
2        2Ben Kingsley
Name: actors, dtype: object

PS: I expanded your JSON file to have two records in it.
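
A possible variation, sketched here under assumptions, is to pass orient="index" to pd.read_json so that each record becomes a row instead of a column, and then explode the "actors" column. The sample JSON string below is hypothetical (the second record is made up), and StringIO is used only so the example runs without a file on disk.

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical sample standing in for the real JSON file; record "2" is made up.
raw = json.dumps({
    "1": {"title": "Exodus: Gods and Kings", "actors": ["Christian Bale", "Joel Edgerton"]},
    "2": {"title": "Hypothetical second movie", "actors": ["Actor A"]},
})

# orient="index" turns each top-level key into a row rather than a column.
movies = pd.read_json(StringIO(raw), orient="index")

# explode() gives every element of each "actors" list its own row,
# repeating the title for each actor of the same movie.
exploded = (
    movies[["title", "actors"]]
    .explode("actors")
    .reset_index(drop=True)
)
print(exploded)
```

The result is a regular dataframe with one actor per row, which may be easier to extend with further columns than the Series returned by df.loc['actors', :].explode().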