I have a JSON file that looks like the following. I want to extract the names in each "actors" list and add them to a dataframe. The dataframe is currently empty, so the actor names would be its first rows, one name per row.
{
  "1": {
    "title": "Exodus: Gods and Kings",
    "url": "https://en.wikipedia.org/wiki/Exodus: Gods and Kings",
    "year": "2014",
    "poster": "https://upload.wikimedia.org/wikipedia/en/thumb/c/cd/Exodus2014Poster.jpg/220px-Exodus2014Poster.jpg",
    "actors": [
      "Christian Bale",
      "Joel Edgerton",
      "John Turturro",
      "Aaron Paul",
      "Ben Mendelsohn",
      "Sigourney Weaver",
      "Ben Kingsley"
    ]
  },
...
I tried the following Python code, but it did not work. I believe I am either using a function incorrectly or not using the right function at all. Any suggestions as to which function or method to use?
# Create dataframe from json file
df_json = pd.read_json("movies_metadata.json", encoding='latin-1')
# Create new dataframe with actor names
data = [df.iloc[4]]
df = pd.DataFrame(data)
I suspect my code is far off, but I have had a hard time finding out how to do this. I have searched online and tried several pandas methods for adding items to a dataframe.
User response:
You can use a list comprehension to collect the actors from the dictionary and then construct a dataframe from the resulting list. For example:
import pandas as pd

data = {
    "1": {
        "title": "Exodus: Gods and Kings",
        "url": "https://en.wikipedia.org/wiki/Exodus: Gods and Kings",
        "year": "2014",
        "poster": "https://upload.wikimedia.org/wikipedia/en/thumb/c/cd/Exodus2014Poster.jpg/220px-Exodus2014Poster.jpg",
        "actors": [
            "Christian Bale",
            "Joel Edgerton",
            "John Turturro",
            "Aaron Paul",
            "Ben Mendelsohn",
            "Sigourney Weaver",
            "Ben Kingsley",
        ],
    }
}

# Flatten every record's "actors" list into a single one-column dataframe
df = pd.DataFrame(
    [actor for v in data.values() for actor in v["actors"]], columns=["Actors"]
)
print(df)
Prints:
             Actors
0    Christian Bale
1     Joel Edgerton
2     John Turturro
3        Aaron Paul
4    Ben Mendelsohn
5  Sigourney Weaver
6      Ben Kingsley
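The example above starts from a dictionary that is already in memory, while the question starts from a file. A minimal sketch of the same idea applied to a file might look like the following. The helper name `load_actors`, the temporary file name, and the UTF-8 encoding are assumptions made for illustration (the question used 'latin-1'), and the sample holds only the one record shown in the question.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Sample record copied from the question, trimmed to the fields used here
sample = {
    "1": {
        "title": "Exodus: Gods and Kings",
        "actors": [
            "Christian Bale",
            "Joel Edgerton",
            "John Turturro",
            "Aaron Paul",
            "Ben Mendelsohn",
            "Sigourney Weaver",
            "Ben Kingsley",
        ],
    }
}


def load_actors(path):
    """Hypothetical helper: read the JSON file and return a one-column dataframe."""
    with open(path, encoding="utf-8") as f:  # encoding is an assumption
        data = json.load(f)
    # .get() skips records that have no "actors" key instead of raising KeyError
    names = [actor for record in data.values() for actor in record.get("actors", [])]
    return pd.DataFrame(names, columns=["Actors"])


# Write the sample to a temporary file so the sketch runs on its own
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "movies_sample.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    df = load_actors(path)

print(df)
```

With the real file, the call would presumably be `load_actors("movies_metadata.json")`, adjusting the encoding if the file is not UTF-8.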
User response:
import pandas as pd

# Read in the JSON file
df = pd.read_json('txt.json')
# Each top-level JSON record becomes its own column, and each field
# ("title", "actors", ...) becomes a row.
# Select the "actors" row, then explode each list into one row per name
df.loc['actors', :].explode()
1 Christian Bale
1 Joel Edgerton
1 John Turturro
1 Aaron Paul
1 Ben Mendelsohn
1 Sigourney Weaver
1 Ben Kingsley
2 2Christian Bale
2 2Joel Edgerton
2 2John Turturro
2 2Aaron Paul
2 2Ben Mendelsohn
2 2Sigourney Weaver
2 2Ben Kingsley
Name: actors, dtype: object
PS: I expanded your JSON file to have two records in it.
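The result of `explode()` here is a Series, not a dataframe. If a dataframe is needed, one possible follow-up is sketched below. The two-record sample mirrors the expanded file described in this answer, shortened to two names per record. The column names "record" and "Actors" are assumptions chosen for illustration, and `io.StringIO` stands in for the file on disk.

```python
import io
import json

import pandas as pd

# Two-record sample mirroring the expanded file, shortened for brevity
sample = {
    "1": {"title": "Exodus: Gods and Kings",
          "actors": ["Christian Bale", "Joel Edgerton"]},
    "2": {"title": "Exodus: Gods and Kings",
          "actors": ["2Christian Bale", "2Joel Edgerton"]},
}

# StringIO stands in for the JSON file so the sketch runs on its own
df = pd.read_json(io.StringIO(json.dumps(sample)))

actors = (
    df.loc["actors"]              # one list of names per record (column)
    .explode()                    # one row per name, record key repeated in the index
    .rename_axis("record")        # name the index so it becomes a labelled column
    .reset_index(name="Actors")   # Series -> dataframe with columns record, Actors
)
print(actors)
```

Keeping the record key as a column makes it possible to tell which movie each actor came from; it can be dropped with `actors[["Actors"]]` if only the names are wanted.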