How to fix number of returned rows in Pandas?

Time:08-29

I built code that extracts data from YouTube by search query, and now I need to convert the output into a pandas DataFrame so that I can later export it as .csv. But I am stuck on an issue: my pd.DataFrame returns only the first row of the parsed data instead of the full set. Please help!

Expected: pandas gives me back the same number of rows as the maxResults I am searching for.
Actual: pandas gives me back only the first row of the parsed data, no matter how much data was found.

Scraping code:

api_key = "***"

from googleapiclient.discovery import build
from pprint import PrettyPrinter
from google.colab import files

youtube = build('youtube','v3',developerKey = api_key)

print(type(youtube))
pp = PrettyPrinter()
nextPageToken = ''

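# Fetch a single page for now; to page through all results, use the
# `while True:` line instead and uncomment the `else: break` below.
for x in range(1):
#while True: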
    request = youtube.search().list(
        q='star wars',
        part='id,snippet',
        maxResults=3,
        order="viewCount",
        pageToken=nextPageToken,
        type='video')
    
    print(type(request))
    res = request.execute()
    pp.pprint(res) 
    
    if 'nextPageToken' in res:
        nextPageToken = res['nextPageToken']

#    else:
#        break
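The commented-out `while True:` loop suggests the intent to page through results. One thing to watch: `res` is overwritten on every page, so an `ids` list built after the loop only sees the final page. A minimal sketch, assuming a hypothetical `fetch_page` helper with fake data standing in for the real `youtube.search().list(..., pageToken=...).execute()` call, accumulates IDs from every page instead:

```python
# Hypothetical stand-in for youtube.search().list(..., pageToken=token).execute():
# three fake pages chained through 'nextPageToken' like the real response.
FAKE_PAGES = {
    "": {"items": [{"id": {"videoId": "a"}}, {"id": {"videoId": "b"}}], "nextPageToken": "p2"},
    "p2": {"items": [{"id": {"videoId": "c"}}], "nextPageToken": "p3"},
    "p3": {"items": [{"id": {"videoId": "d"}}]},  # last page: no nextPageToken
}


def fetch_page(page_token):
    """Hypothetical helper returning one page of search results."""
    return FAKE_PAGES[page_token]


def collect_video_ids(max_pages=None):
    """Follow nextPageToken until it is absent (or max_pages is reached),
    accumulating video IDs from every page rather than only the last one."""
    video_ids = []
    page_token = ""
    pages_fetched = 0
    while True:
        res = fetch_page(page_token)
        video_ids.extend(item["id"]["videoId"] for item in res.get("items", []))
        pages_fetched += 1
        if "nextPageToken" not in res:
            break  # no more pages
        if max_pages is not None and pages_fetched >= max_pages:
            break  # caller asked for a limited number of pages
        page_token = res["nextPageToken"]
    return video_ids


print(collect_video_ids())             # ['a', 'b', 'c', 'd']
print(collect_video_ids(max_pages=1))  # ['a', 'b']
```

Passing `max_pages=1` mirrors the `for x in range(1):` version; omitting it mirrors the `while True:` version.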

ids = [item['id']['videoId'] for item in res['items']]
results = youtube.videos().list(id=ids, part='snippet').execute()
for result in results.get('items', []):
    print(result['id'])
    print(result['snippet']['channelTitle'])
    print(result['snippet']['title'])
    print(result['snippet']['description'])

Pandas Code:

import pandas as pd

data = {'Channel Title': [result['snippet']['channelTitle']],
        'Title': [result['snippet']['title']],
        'Description': [result['snippet']['description']]
       }


df = pd.DataFrame(data, columns=['Channel Title', 'Title', 'Description'])
#df3 = pd.concat([df], ignore_index = True)
#df3.reset_index()


df.head()
#print(df3)

Answer:

If I understand correctly (IIUC):

This:

data = {'Channel Title': [result['snippet']['channelTitle']],
        'Title': [result['snippet']['title']],
        'Description': [result['snippet']['description']]
       }

Should be:

data = {'Channel Title': [result['snippet']['channelTitle'] for result in results['items']],
        'Title': [result['snippet']['title'] for result in results['items']],
        'Description': [result['snippet']['description'] for result in results['items']]
       }

Otherwise you're only using result from the last iteration of your for-loop: each column is a one-element list, so the DataFrame has exactly one row.
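A minimal offline sketch of the difference, assuming a mocked `results` dict (hypothetical data shaped like the `videos().list` response, with only the keys used here). It also shows an alternative to the three comprehensions: build one dict per video, pass the list of dicts to `pd.DataFrame`, then export with `to_csv` as the question intends:

```python
import pandas as pd

# Hypothetical stand-in for youtube.videos().list(...).execute();
# only the keys read below are mocked.
results = {
    "items": [
        {"id": "vid1", "snippet": {"channelTitle": "Channel A", "title": "Title 1", "description": "Desc 1"}},
        {"id": "vid2", "snippet": {"channelTitle": "Channel B", "title": "Title 2", "description": "Desc 2"}},
        {"id": "vid3", "snippet": {"channelTitle": "Channel C", "title": "Title 3", "description": "Desc 3"}},
    ]
}
columns = ["Channel Title", "Title", "Description"]

# Buggy pattern: this loop mirrors the print loop in the question.
# Afterwards `result` still refers to the LAST item only.
for result in results.get("items", []):
    pass
buggy_df = pd.DataFrame(
    {
        "Channel Title": [result["snippet"]["channelTitle"]],
        "Title": [result["snippet"]["title"]],
        "Description": [result["snippet"]["description"]],
    },
    columns=columns,
)

# Fixed pattern: one dict per video, so one DataFrame row per video.
rows = [
    {
        "Channel Title": item["snippet"]["channelTitle"],
        "Title": item["snippet"]["title"],
        "Description": item["snippet"]["description"],
    }
    for item in results.get("items", [])
]
df = pd.DataFrame(rows, columns=columns)

# With no path argument, to_csv returns the CSV text as a string;
# pass a filename such as "videos.csv" to write a file instead.
csv_text = df.to_csv(index=False)

print(buggy_df.shape)             # (1, 3)
print(df.shape)                   # (3, 3)
print(buggy_df["Title"].iloc[0])  # Title 3
```

Building rows as dicts keeps each video's fields together, which avoids the three columns drifting out of sync if a field is later filtered or skipped.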
