How do I get a normal list with strings instead of generator objects when I perform a googlesearch

Time:02-26

Hi, I am trying to get the first URL of a Google search for each query in a list. For the sake of simplicity, I am going to use the same code as a similar question from 2 years ago.

from googlesearch import search

list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]
results = []

for query in list_of_queries:
    results.append(search(query, tld="co.in", num=1, stop=1, pause=2))

print (results)

Now this returns a list of generator objects. A solution was found to print out the list of results by adding

for result in results:
    print(list(result))

However, I want the results list to be a list of strings so that I can scrape the URLs for data. One solution I found was to add

results_str = []
for result in results:
    results_str.append(list(result))

When I print results_str I get this as an output:

[['https://www.geeksforgeeks.org/'], ['https://stackoverflow.com/'], ['https://github.com/']]

As one can see, I cannot use results_str directly as a list of URLs to scrape because of the extra brackets around each URL. I thought I could work around this by removing the brackets, following this answer, and thus adding

results_str_new = [s.replace('[' and ']', '') for s in results_str]

But this simply results in an AttributeError:

AttributeError: 'list' object has no attribute 'replace'

Either way, even if I did get it to work, it seems unnecessarily complicated to do all this just to convert a list of generator objects into strings that I can use as URLs to scrape, so I was wondering if there are any alternatives. I know that one option is to use Selenium, but I would rather not, because I don't want the hassle of a Chrome instance opening whenever I run my script.

Thanks in advance

CodePudding user response:

You are getting back a list of lists of strings. To flatten it, you can use a list comprehension like this

results_str = [url for result in results for url in result]

or you can change from append to extend if you don't want to go with a list comprehension. extend adds each item of the iterable to the list, whereas append inserts the iterable itself as a single element.

results_str = []
for result in results:
    results_str.extend(result)
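
To see the difference without making any network requests, here is a minimal sketch. Note that fake_search is a hypothetical stand-in for the real search call (it is not part of googlesearch), and the example.invalid URLs are made up for illustration.

```python
def fake_search(query):
    """Hypothetical stand-in for a search call: a generator yielding one URL string."""
    yield f"https://{query.lower()}.example.invalid/"


list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]

nested = []  # built with append: one inner list per query
flat = []    # built with extend: URL strings added directly

for query in list_of_queries:
    # list(...) consumes the generator; append stores that list as ONE element
    nested.append(list(fake_search(query)))
    # extend consumes the generator itself and adds each yielded string
    flat.extend(fake_search(query))

print(nested)
print(flat)
```

Since extend accepts any iterable, the generator can be passed to it directly, with no intermediate list(...) call.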

CodePudding user response:

Looks like you may be using a different version of googlesearch. I'm using googlesearch-python 1.1.0 so the call parameters are different. However, this should help:

from googlesearch import search

list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]
results = []

for query in list_of_queries:
    results.extend([r for r in search(query, 1, 'en')])

print(results)

Output:

['https://www.youtube.com/c/GeeksforGeeksVideos/videos', 'https://stackoverflow.com/', 'https://stackoverflow.blog/', 'https://github.com/']

Which, as you can see, is a simple list of strings (URLs in this case).
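
Note that the output above has two URLs for one of the queries, so the number of results per query may vary. If you want exactly one URL per query, one possible approach is to take only the first item from each generator with next(). This is just a sketch: first_or_none and fake_search are hypothetical helpers (fake_search stands in for the real search call so the example runs offline).

```python
def first_or_none(iterable):
    """Return the first item of any iterable, or None if it yields nothing."""
    return next(iter(iterable), None)


def fake_search(query):
    # Hypothetical stand-in for a search call; yields zero or more URL strings
    hits = {
        "stackoverflow": ["https://stackoverflow.com/", "https://stackoverflow.blog/"],
        "GitHub": ["https://github.com/"],
    }
    yield from hits.get(query, [])


list_of_queries = ["stackoverflow", "GitHub", "no-such-query"]

# One entry per query: the first URL, or None when the search came back empty
first_urls = [first_or_none(fake_search(query)) for query in list_of_queries]

print(first_urls)
```

The None default avoids a StopIteration error when a query returns nothing; filter those entries out before scraping.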
