I have been trying to learn something about API so I did some calls to YT API. I have the following code:
# Import Libraries
import requests
import pandas as pd
import time
# KEY
API_KEY = 'MY KEY'
CHANNEL_ID = 'SOME YT CHANNEL'
def get_video_details(video_id):
# Segundo call al API
url_video_stats = 'https://www.googleapis.com/youtube/v3/videos?id=' video_id '&part=statistics&key=' API_KEY
response_video_stats = requests.get(url_video_stats).json()
# Recoleccion de views, likes, dislikes y comentarios de cada video
view_count = response_video_stats['items'][0]['statistics']['viewCount']
like_count = response_video_stats['items'][0]['statistics']['likeCount']
comment_count = response_video_stats['items'][0]['statistics']['commentCount']
return view_count, like_count, comment_count
def get_videos(df):
# Make API call
pageToken = ''
while 1: # while 1 is the same as while true. It means loop forever. The only way to stop the loop is to use a break statement.
url = 'https://www.googleapis.com/youtube/v3/search?key=' API_KEY '&channelId=' CHANNEL_ID '&part=snippet,id&order=date&maxResults=50&' pageToken
response = requests.get(url).json()
time.sleep(2) # Para evitar que se quede en el loop infinito, esperamos 1 segundo antes de hacer el siguiente llamado.
# Crear el loop para extraer todos los datos de los videos
for video in response['items']:
if video['id']['kind'] == 'youtube#video': # Para asegurarme que el objeto sea un video y no otra cosa
video_id = video['id']['videoId']
video_title = video['snippet']['title']
video_title = str(video_title).replace('&', '')
upload_date = video['snippet']['publishedAt']
upload_date = str(upload_date).split('T')[0]
view_count, like_count, comment_count = get_video_details(video_id)
'''# Guardando los datos en el DataFrame 'df' vacio que creamos antes
df = df.append({
'video_id' : video_id,
'video_title' : video_title,
'upload_date' : upload_date,
'view_count' : view_count,
'like_count' : like_count,
#'dislike_count' : dislike_count,
'comment_count' : comment_count
},
ignore_index=True
)'''
# Guardamos los datos en el df usando pd.concat:
df = pd.concat([df, pd.DataFrame({
'video_id' : video_id,
'video_title' : video_title,
'upload_date' : upload_date,
'view_count' : view_count,
'like_count' : like_count,
'comment_count' : comment_count
},
index=[0]
)])
try:
if response['nextPageToken'] != None: # Si hay una pagina siguiente, entonces seguimos con el loop
pageToken = "pageToken=" response['nextPageToken']
except:
break
return df
# Main df
df = pd.DataFrame(columns=['video_id', 'video_title', 'upload_date', 'view_count', 'like_count', 'comment_count'])
df = get_videos(df)
And if I try, for example, this channel: UCaY_-ksFSQtTGk0y1HA_3YQ I only get 322 videos in the DF, nevertheless the channel has 1000 videos
{'kind': 'youtube#searchListResponse',
'etag': 'JkC3s6SSNCamNNEIDoC_IcYw9dY',
'nextPageToken': 'CDIQAA',
'regionCode': 'AR',
'pageInfo': {'totalResults': 1063, 'resultsPerPage': 50},
'items': [{ ... }]}
I was doing some calls from others channels with less videos and I get all of them, but if the channel has, I don't know, 500 videos, the built functions don't work very well like you can see...
Any idea? What am I doing wrong?
Thanks all!
CodePudding user response:
Note: Search results are constrained to a maximum of 500 videos if your request specifies a value for the channelId parameter and sets the type parameter value to video
Source: Search: list#channelId
AFAIK Search: list
endpoint is bugged and always limits its results number to 500.
To consume 100 times less quota, in addition to having a working algorithm consider listing uploads
playlist videos (the playlist id is the channel id with a UU
as a prefix instead of UC
) with PlaylistItems: list.
See here for more details.
CodePudding user response:
I checked the channel you mentioned (i.e. UCaY_-ksFSQtTGk0y1HA_3YQ
) and I found:
- Its uploaded videos - (i.e uploads playlist) contains 727 videos.
- The total videos in all its playlists (excluding the uploads playlist) contains in total: 625 videos1.
Checking your code, my suggestions are:
- Use playlist:list endpoint for get all the playlist a given channel has.
- Use playlistitems:list endpoint for get all videos inside a playlist - unsing the id from the playlist:list response.
- Use again the playlistitems:list endpoint, but, this time, passing the uploads playlist id of the channel. You need to change the value from UCaY_-ksFSQtTGk0y1HA_3YQ` to UUaY_-ksFSQtTGk0y1HA_3YQ.
The sum of both videos found on all their playlists and their uploaded videos are 1352 videos1 - which I assume is the number you're referring to in your question:
Quote:
And if I try, for example, this channel: UCaY_-ksFSQtTGk0y1HA_3YQ I only get 322 videos in the DF, nevertheless the channel has 1000 videos
1 Videos up to date (16/08/2022). This number might change in the future = due to new uploads/changes in their playlists, etc.