I built code that extracts data from YouTube for a search query, and now I need to convert the output into a pandas DataFrame so that I can later export it as .csv. But I'm stuck: my pd.DataFrame returns only one row of the parsed data instead of the full set. Please help!
Expected: pandas gives me back as many rows as the maxResults I'm searching for.
Actual: pandas gives me back only one row of info from the parsed data, no matter how much data was found.
Scraping code:
api_key = "***"
from googleapiclient.discovery import build
from pprint import PrettyPrinter
from google.colab import files

youtube = build('youtube', 'v3', developerKey=api_key)
print(type(youtube))
pp = PrettyPrinter()

nextPageToken = ''
for x in range(1):
#while True:
    request = youtube.search().list(
        q='star wars',
        part='id,snippet',
        maxResults=3,
        order="viewCount",
        pageToken=nextPageToken,
        type='video')
    print(type(request))
    res = request.execute()
    pp.pprint(res)
    if 'nextPageToken' in res:
        nextPageToken = res['nextPageToken']
    # else:
    #     break

ids = [item['id']['videoId'] for item in res['items']]
results = youtube.videos().list(id=ids, part='snippet').execute()
for result in results.get('items', []):
    print(result['id'])
    print(result['snippet']['channelTitle'])
    print(result['snippet']['title'])
    print(result['snippet']['description'])
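Separately from the DataFrame problem: res (and everything derived from it) is reassigned on every pass of the outer loop, so once you loop over more than one page, only the last page's videos are kept. A minimal sketch of accumulating video IDs across pages, assuming a hypothetical fetch_page callable that takes a page token and returns one search.list response dict:

```python
from typing import Callable, Dict, List


def collect_video_ids(fetch_page: Callable[[str], Dict], max_pages: int = 5) -> List[str]:
    """Accumulate videoIds across search result pages.

    fetch_page is a hypothetical callable: it takes a page token
    ('' for the first page) and returns one search.list response dict.
    """
    ids: List[str] = []
    token = ''
    for _ in range(max_pages):
        res = fetch_page(token)
        # extend (rather than reassign) so IDs from earlier pages are kept
        ids.extend(item['id']['videoId'] for item in res.get('items', []))
        token = res.get('nextPageToken', '')
        if not token:
            break  # no nextPageToken means this was the last page
    return ids
```

With the real client, fetch_page could be something like lambda tok: youtube.search().list(q='star wars', part='id,snippet', maxResults=3, order='viewCount', type='video', pageToken=tok).execute(). The max_pages cap is there so a loop that keeps receiving tokens still terminates.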
Pandas code:
import pandas as pd

data = {'Channel Title': [result['snippet']['channelTitle']],
        'Title': [result['snippet']['title']],
        'Description': [result['snippet']['description']]
        }
df = pd.DataFrame(data,
                  columns=['Channel Title', 'Title', 'Description'])
#df3 = pd.concat([df], ignore_index = True)
#df3.reset_index()
df.head()
#print(df3)
Answer:
If I understand correctly (IIUC), this:
data = {'Channel Title': [result['snippet']['channelTitle']],
'Title': [result['snippet']['title']],
'Description': [result['snippet']['description']]
}
Should be:
data = {'Channel Title': [result['snippet']['channelTitle'] for result in results['items']],
'Title': [result['snippet']['title'] for result in results['items']],
'Description': [result['snippet']['description'] for result in results['items']]
}
Otherwise you're only using result from the last iteration of your for-loop, so each list holds a single value and the DataFrame ends up with one row.
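An equivalent way to write the same fix is to build one dict per video (one dict becomes one row) and hand the list to pd.DataFrame, then export with to_csv. A minimal sketch, assuming a hypothetical sample results dict shaped like a videos.list response with only the snippet fields used above; the output filename is also just an example:

```python
import pandas as pd

# Hypothetical sample shaped like a videos.list response (only the fields used above).
results = {
    'items': [
        {'id': 'vid1', 'snippet': {'channelTitle': 'Chan A', 'title': 'Title 1', 'description': 'Desc 1'}},
        {'id': 'vid2', 'snippet': {'channelTitle': 'Chan B', 'title': 'Title 2', 'description': 'Desc 2'}},
        {'id': 'vid3', 'snippet': {'channelTitle': 'Chan C', 'title': 'Title 3', 'description': 'Desc 3'}},
    ]
}

# One dict per item, so every video becomes its own row.
rows = [
    {
        'Channel Title': item['snippet']['channelTitle'],
        'Title': item['snippet']['title'],
        'Description': item['snippet']['description'],
    }
    for item in results.get('items', [])
]

df = pd.DataFrame(rows, columns=['Channel Title', 'Title', 'Description'])

# index=False leaves out the 0..n-1 row index column in the CSV.
df.to_csv('youtube_results.csv', index=False)
```

Building rows keeps each video's fields together, which makes it harder for the columns to drift out of sync than three separate list comprehensions. In Colab, google.colab.files.download('youtube_results.csv') should then let you save the file locally.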