Python: How to skip JSON object if library object doesn't exist


[Image: video output]

Hey, so I ran into an annoying problem while scraping YouTube channel video data using the YT Data API v3. This was working about a week ago but now it wants to be stubborn.

Everything works up to this point in the code, EXCEPT that for some strange reason the third result in the JSON it returns is a channel object instead of a video object. I can't find any reason why this started happening, but it's a problem: the code below that extracts video IDs stops after the first two results, because the channel entry's 'id' dictionary has 'channelId' instead of 'videoId', so the lookup fails.

I think my best solution is to modify this code block so it skips any JSON object that doesn't have 'videoId' in its 'id' dictionary, but I'm a little rusty on how to do this with dictionary objects in the for-loop. Can anyone help me out?

import json
import requests

limit = 5
video_Ids = []
nextPageToken = ""  # empty for the first request
for i in range(limit):
    url = f"https://www.googleapis.com/youtube/v3/search?key={api_key}&part=snippet&channelId={channel_Id}&maxResults=50&pageToken={nextPageToken}"
    data = json.loads(requests.get(url).text)
    for item in data['items']:
        video_Id = item['id']['videoId']  # KeyError when the item is a channel, not a video
        video_Ids.append(video_Id)
    nextPageToken = data['nextPageToken']

EDIT: I have included an image of the JSON output. As you can see, it has several nested dictionaries and lists.
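
In case the image doesn't come through, the items in the response look roughly like this (placeholder values, fields trimmed; only the structure matters). Note the third entry's 'id' dictionary, which has 'channelId' instead of 'videoId':

# Rough shape of data['items'] (placeholder values for illustration)
items = [
    {
        "kind": "youtube#searchResult",
        "id": {"kind": "youtube#video", "videoId": "abc123"},
        "snippet": {"title": "First video"},
    },
    {
        "kind": "youtube#searchResult",
        "id": {"kind": "youtube#video", "videoId": "def456"},
        "snippet": {"title": "Second video"},
    },
    {
        # the problem entry: the channel itself, so there is no 'videoId'
        "kind": "youtube#searchResult",
        "id": {"kind": "youtube#channel", "channelId": "UCxxxxxxxx"},
        "snippet": {"title": "Channel name"},
    },
]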

CodePudding user response:

limit = 5
video_Ids = []
nextPageToken = ""  # empty for the first request
for i in range(limit):
    url = f"https://www.googleapis.com/youtube/v3/search?key={api_key}&part=snippet&channelId={channel_Id}&maxResults=50&pageToken={nextPageToken}"
    data = json.loads(requests.get(url).text)
    for item in data['items']:
        if 'videoId' in item['id']:  # skip items (e.g. channels) that have no video ID
            video_Id = item['id']['videoId']
            video_Ids.append(video_Id)
    nextPageToken = data['nextPageToken']

Or you can use try/except:

limit = 5
video_Ids = []
nextPageToken = ""  # empty for the first request
for i in range(limit):
    url = f"https://www.googleapis.com/youtube/v3/search?key={api_key}&part=snippet&channelId={channel_Id}&maxResults=50&pageToken={nextPageToken}"
    data = json.loads(requests.get(url).text)
    for item in data['items']:
        try:
            video_Id = item['id']['videoId']
            video_Ids.append(video_Id)
        except KeyError:
            pass  # item has no 'videoId' (e.g. it's a channel); skip it
    nextPageToken = data['nextPageToken']
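
The same check can also be written with dict.get(), which returns None instead of raising KeyError when the key is missing:

limit = 5
video_Ids = []
nextPageToken = ""  # empty for the first request
for i in range(limit):
    url = f"https://www.googleapis.com/youtube/v3/search?key={api_key}&part=snippet&channelId={channel_Id}&maxResults=50&pageToken={nextPageToken}"
    data = json.loads(requests.get(url).text)
    for item in data['items']:
        # .get() returns None instead of raising when 'videoId' is missing
        video_Id = item['id'].get('videoId')
        if video_Id is not None:
            video_Ids.append(video_Id)
    nextPageToken = data['nextPageToken']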

CodePudding user response:

What about using try/except in your code?

That way, if an error occurs, the code will not exit; it prints an error message and keeps going. Note that because the try wraps the whole inner loop, the rest of that page's items are skipped and it continues with the next page.

limit = 5
video_Ids = []
nextPageToken = ""  # empty for the first request
for i in range(limit):
    url = f"https://www.googleapis.com/youtube/v3/search?key={api_key}&part=snippet&channelId={channel_Id}&maxResults=50&pageToken={nextPageToken}"
    data = json.loads(requests.get(url).text)
    try:
        for item in data['items']:
            video_Id = item['id']['videoId']
            video_Ids.append(video_Id)
    except KeyError:
        print(f'There was an error loading the page {i}')
    nextPageToken = data['nextPageToken']

CodePudding user response:

Thanks for responding, guys, but I just found a solution:

limit = 5 
video_Ids = []
nextPageToken ="" #for 0th iteration let it be null
for i in range(limit):
    url = f"https://www.googleapis.com/youtube/v3/search?key={api_key}&part=snippet&channelId={channel_Id}&maxResults=50&pageToken={nextPageToken}"
    data = json.loads(requests.get(url).text)
    for key in data['items']:
        if 'videoId' in key['id']:
            video_Id = key['id']['videoId']
            video_Ids.append(video_Id)
        # else:
        #     continue

    nextPageToken = data['nextPageToken']

So what this does is skip the channel JSON object and return 249 of the 250 results (just the video objects, as video IDs). It's not perfect, but it will work for me.
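
One caveat that applies to all of the variants above, mine included: data['nextPageToken'] will raise a KeyError on the last page of results, since the API omits that field when there are no more pages. A more defensive version (same api_key and channel_Id variables as before) could look like this:

import json
import requests

limit = 5
video_Ids = []
nextPageToken = ""  # empty for the first request
for i in range(limit):
    url = f"https://www.googleapis.com/youtube/v3/search?key={api_key}&part=snippet&channelId={channel_Id}&maxResults=50&pageToken={nextPageToken}"
    data = json.loads(requests.get(url).text)
    for item in data['items']:
        if 'videoId' in item['id']:  # keep only video results
            video_Ids.append(item['id']['videoId'])
    # the last page has no 'nextPageToken'; stop instead of raising KeyError
    nextPageToken = data.get('nextPageToken')
    if not nextPageToken:
        break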
