My movie data contains movie scripts from different script websites plus basic data from the IMDb website. I am trying to get the first "file_name" under "files" and the "id" under "imdb" for each movie.
This is the first movie from my data:
{
  "10thingsihateaboutyou": {
    "files": [
      {
        "name": "10 Things I Hate About You",
        "source": "imsdb",
        "file_name": "10-Things-I-Hate-About-You",
        "script_url": "https://imsdb.com/scripts/10-Things-I-Hate-About-You.html",
        "size": 215724
      },
      {
        "name": "10 Things I Hate About You",
        "source": "screenplays",
        "file_name": "10-Things-I-Hate-About-You",
        "script_url": "https://www.screenplays-online.de/screenplay.php/119",
        "size": 130951
      },
    "imdb": {
      "title": "10 Things I Hate About You",
      "release_date": 1999,
      "id": "0147800"
    }
  }
I keep getting the following error with my code below.
file_name = data[movie]["files"]["file_name"]
TypeError: list indices must be integers or slices, not str
import json

with open('clean_meta.json') as json_file:
    data = json.load(json_file)

script_files = []
id_list = []

for movie in data:
    file_name = data[movie]["files"]["file_name"]
    i_d = data[movie]["imdb"]["id"]
    scripts_files.append(file_name)
    id_list.append(i_d)

close('clean_meta.json')
Answer:
data[movie]["files"] is a list, not a dictionary, so you'll need to loop over it (or index it with an integer) to get at the dictionaries inside it.
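for movie in data:
    # "files" is a list of dicts, so loop over its entries
    for file in data[movie]["files"]:
        file_name = file["file_name"]
        script_files.append(file_name)
    # "imdb" is a dict, so it can be indexed by key directly
    i_d = data[movie]["imdb"]["id"]
    id_list.append(i_d)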
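Since the question only asks for the first "file_name" of each movie, the inner loop can also be replaced by indexing the list with [0]. This is a minimal sketch, assuming the structure shown in the question; the sample data is a trimmed copy of the question's movie (only "source" and "file_name" kept) loaded from a string instead of clean_meta.json, and skipping movies with an empty "files" list is my own assumption:

```python
import json

# Trimmed copy of the question's movie, inlined instead of reading clean_meta.json
raw = """
{
  "10thingsihateaboutyou": {
    "files": [
      {"source": "imsdb", "file_name": "10-Things-I-Hate-About-You"},
      {"source": "screenplays", "file_name": "10-Things-I-Hate-About-You"}
    ],
    "imdb": {"title": "10 Things I Hate About You", "release_date": 1999, "id": "0147800"}
  }
}
"""
data = json.loads(raw)

script_files = []
id_list = []

for movie, info in data.items():
    files = info["files"]          # a list of dicts
    if not files:                  # assumption: skip movies with no script files
        continue
    script_files.append(files[0]["file_name"])  # first entry only
    id_list.append(info["imdb"]["id"])

print(script_files)  # ['10-Things-I-Hate-About-You']
print(id_list)       # ['0147800']
```

Looping with data.items() gives you the movie key and its dictionary together, so you don't have to repeat data[movie] on every line.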
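Also, because you opened the file with a with statement, it is closed automatically when the block ends, so you don't need to close it yourself. (A bare close('clean_meta.json') would fail anyway with a NameError, since Python has no built-in close function; a file object is closed with json_file.close().)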
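A quick way to convince yourself of this is to check the file object's closed attribute inside and after the with block. This sketch writes a small hypothetical JSON file to a temporary directory as a stand-in for clean_meta.json:

```python
import json
import os
import tempfile

# Hypothetical stand-in for clean_meta.json, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), "clean_meta.json")
with open(path, "w") as out:
    json.dump({"example": {"files": [], "imdb": {"id": "0000000"}}}, out)

with open(path) as json_file:
    data = json.load(json_file)
    print(json_file.closed)  # False: still inside the with block

print(json_file.closed)      # True: closed automatically when the block exited
```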
Answer:
{
  "10thingsihateaboutyou": {
    "files": [
      {
        "name": "10 Things I Hate About You",
        "source": "imsdb",
        "file_name": "10-Things-I-Hate-About-You",
        "script_url": "https://imsdb.com/scripts/10-Things-I-Hate-About-You.html",
        "size": 215724
      },
      {
        "name": "10 Things I Hate About You",
        "source": "screenplays",
        "file_name": "10-Things-I-Hate-About-You",
        "script_url": "https://www.screenplays-online.de/screenplay.php/119",
        "size": 130951
      }],
    "imdb": {
      "title": "10 Things I Hate About You",
      "release_date": 1999,
      "id": "0147800"
    }
  }
}
You forgot to close the square bracket of the "files" list (fixed above); without it the file is not valid JSON, so json.load will raise a json.JSONDecodeError.
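You can see this by parsing a trimmed version of the snippet with the "]" left out, as in the question. This is a sketch; the exact error message depends on where the parser gives up, so it is printed rather than hard-coded:

```python
import json

# The "files" list is missing its closing "]", as in the question's snippet
broken = """
{
  "10thingsihateaboutyou": {
    "files": [
      {"source": "imsdb", "file_name": "10-Things-I-Hate-About-You"},
    "imdb": {"id": "0147800"}
  }
}
"""

error = None
try:
    json.loads(broken)
except json.JSONDecodeError as err:
    error = err  # keep a reference; "err" is unbound after the except block
    # lineno/colno point at the position where parsing failed
    print("Invalid JSON:", err.msg, "at line", err.lineno, "column", err.colno)
```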
Anyway, a nested structure like this is a bit tricky to work with:
data["10thingsihateaboutyou"] is a dict
data["10thingsihateaboutyou"]["files"] is a list
data["10thingsihateaboutyou"]["files"][0] is a dict
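If you're ever unsure which level you're at, printing type() at each step makes the structure explicit. A small sketch using a trimmed copy of the sample data:

```python
import json

# Trimmed copy of the sample movie
data = json.loads("""
{"10thingsihateaboutyou": {
   "files": [{"source": "imsdb", "file_name": "10-Things-I-Hate-About-You"}],
   "imdb": {"id": "0147800"}}}
""")

movie = data["10thingsihateaboutyou"]
print(type(movie).__name__)              # dict
print(type(movie["files"]).__name__)     # list
print(type(movie["files"][0]).__name__)  # dict
print(movie["files"][0]["file_name"])    # 10-Things-I-Hate-About-You
```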
So data["10thingsihateaboutyou"]["files"] has to be treated as a list, but your code treated it as a dictionary. A list can be indexed with an integer, like this:
print(data["10thingsihateaboutyou"]["files"][0]) # access first element of "files"
Output:
{'name': '10 Things I Hate About You',
'source': 'imsdb',
'file_name': '10-Things-I-Hate-About-You',
'script_url': 'https://imsdb.com/scripts/10-Things-I-Hate-About-You.html',
'size': 215724} # Note that it returns a dict
Or with a slice, like this:
print(data["10thingsihateaboutyou"]["files"][:]) # Access all elements in "files"
Output:
[{'name': '10 Things I Hate About You',
'source': 'imsdb',
'file_name': '10-Things-I-Hate-About-You',
'script_url': 'https://imsdb.com/scripts/10-Things-I-Hate-About-You.html',
'size': 215724},
{'name': '10 Things I Hate About You',
'source': 'screenplays',
'file_name': '10-Things-I-Hate-About-You',
'script_url': 'https://www.screenplays-online.de/screenplay.php/119',
'size': 130951}] # Note that it returns a list of dictionaries
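Putting a string where a list expects an integer or slice is exactly what produces the error in the question. A short sketch reproducing it on a trimmed "files" list, followed by the working forms:

```python
files = [
    {"source": "imsdb", "file_name": "10-Things-I-Hate-About-You"},
    {"source": "screenplays", "file_name": "10-Things-I-Hate-About-You"},
]

message = ""
try:
    files["file_name"]  # string index on a list, as in the question's code
except TypeError as err:
    message = str(err)
print(message)  # list indices must be integers or slices, not str

print(files[0]["file_name"])             # integer index first, then the dict key
print([f["source"] for f in files[:]])   # ['imsdb', 'screenplays']
```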
Beware: a dictionary can also be indexed with an integer, but only if that integer is one of its keys, like so:
my_dict = {
    1: {"name": "John", "lastname": "Doe"},
    2: {"name": "Jane", "lastname": "Doe"}
}
print(my_dict[2])
Output:
{'name': 'Jane', 'lastname': 'Doe'}
Here I accessed the key 2, not the item at position 2. With a list, getting Jane's entry would be:
my_list = [["John", "Doe"],["Jane", "Doe"]]
print(my_list[1])
Output:
['Jane', 'Doe']
Don't forget that list indexes start at 0, so the first item in a list is at index 0 and my_list[1] is the second item.
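The difference also shows up in the errors you get: an integer that isn't a key raises KeyError on a dictionary, while an index past the end raises IndexError on a list. A sketch using the two example structures above:

```python
my_dict = {
    1: {"name": "John", "lastname": "Doe"},
    2: {"name": "Jane", "lastname": "Doe"},
}
my_list = [["John", "Doe"], ["Jane", "Doe"]]

print(my_list[0])  # ['John', 'Doe']: the first item is at index 0

# my_dict has no key 0: the integer is looked up as a key, not as a position
try:
    my_dict[0]
except KeyError as err:
    print("KeyError:", err)    # KeyError: 0

# my_list has only two items (indexes 0 and 1), so index 2 is out of range
try:
    my_list[2]
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```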
Also, you didn't specify why you wanted the first file_name, so I took the liberty of selecting file_name only when source is "imsdb". Here's how I'd do this:
for movie in data:
    files = data[movie]["files"]  # list of dicts
    for entry in files:
        if entry["source"] == "imsdb":
            script_files.append(entry["file_name"])
            # append the id alongside the file name so both lists stay aligned
            id_list.append(data[movie]["imdb"]["id"])
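If a movie could list more than one "imsdb" entry, the loop above would add a file name for each of them. A variant that takes at most one imsdb entry per movie, keeping script_files and id_list paired one-to-one, can use next() with a default value. This is a sketch on a trimmed copy of the sample data; skipping movies that have no imsdb entry is an assumption:

```python
import json

# Trimmed copy of the sample movie
data = json.loads("""
{"10thingsihateaboutyou": {
   "files": [
     {"source": "imsdb", "file_name": "10-Things-I-Hate-About-You"},
     {"source": "screenplays", "file_name": "10-Things-I-Hate-About-You"}],
   "imdb": {"id": "0147800"}}}
""")

script_files = []
id_list = []

for movie, info in data.items():
    # First entry whose source is "imsdb", or None if there is none
    imsdb_entry = next(
        (entry for entry in info["files"] if entry["source"] == "imsdb"), None
    )
    if imsdb_entry is None:
        continue  # assumption: skip movies without an imsdb script
    script_files.append(imsdb_entry["file_name"])
    id_list.append(info["imdb"]["id"])

print(list(zip(script_files, id_list)))
```

next() stops the generator at the first match, so later entries for the same movie are never examined.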