Hi, I am trying to get the first URL of a Google search for each query in a list. For the sake of simplicity, I am going to use the same code as a similar question asked two years ago.
from googlesearch import search
list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]
results = []
for query in list_of_queries:
    results.append(search(query, tld="co.in", num=1, stop=1, pause=2))
print(results)
Now this returns a list of generator objects. A solution was found to print out the list of results by adding
for result in results:
    print(list(result))
However I want the results list to be in the form of a list of strings in order to web scrape the urls for data. One solution I found was to add
results_str = []
for result in results:
    results_str.append(list(result))
When I print results_str I get this as an output:
[['https://www.geeksforgeeks.org/'], ['https://stackoverflow.com/'], ['https://github.com/']]
As one can see I cannot even use results_str directly as a list of urls to webscrape because of the additional brackets around each url. I thought I could work around it by removing the brackets by following this answer and thus adding
results_str_new = [s.replace('[' and ']', '') for s in results_str]
But this simply results in an AttributeError
AttributeError: 'list' object has no attribute 'replace'
Either way, even if I did get it to work, it seems unnecessarily complicated to do all this just to convert a list of generator objects into strings that I can use as URLs to scrape, so I was wondering if there are any alternatives. I know that one option is to use Selenium, but I would rather avoid it because I don't want the hassle of a Chrome instance opening whenever I run my script.
Thanks in advance
CodePudding user response:
You are getting back a list of lists of strings. To change that, you can use a list comprehension like this:
results_str = [url for result in results for url in result]
Alternatively, if you don't want to use a list comprehension, you can change from append to extend. extend adds each item of the inner list to the outer list, whereas append inserts the inner list itself as a single element.
results_str = []
for result in results:
    results_str.extend(result)
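To make the difference between append and extend concrete without hitting Google, here is a minimal sketch. It assumes a hypothetical generator, fake_search, that stands in for googlesearch.search and yields made-up example URLs; only the list handling is the point.

```python
def fake_search(query):
    # Hypothetical stand-in for googlesearch.search: a generator that
    # lazily yields URL strings. The URLs are invented for illustration.
    yield f"https://{query.lower()}.example/"


queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]

nested = []
flat = []
for query in queries:
    # append stores the whole inner list as one element -> list of lists
    nested.append(list(fake_search(query)))
    # extend consumes the generator and adds each URL directly -> flat list
    flat.extend(fake_search(query))

print(nested)
print(flat)
```

Note that extend accepts the generator directly, so there is no need to wrap it in list() first.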
CodePudding user response:
Looks like you may be using a different version of googlesearch. I'm using googlesearch-python 1.1.0 so the call parameters are different. However, this should help:
from googlesearch import search
list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]
results = []
for query in list_of_queries:
    results.extend([r for r in search(query, 1, 'en')])
print(results)
Output:
['https://www.youtube.com/c/GeeksforGeeksVideos/videos', 'https://stackoverflow.com/', 'https://stackoverflow.blog/', 'https://github.com/']
As you can see, this is a simple list of strings (URLs in this case).
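The output above contains two URLs for one of the queries, so if you need exactly one URL per query, it may help to take only the first item from each generator with next. The sketch below assumes a hypothetical fake_search generator in place of the real search call, and first_url is an illustrative helper name, not part of any library.

```python
def fake_search(query):
    # Hypothetical stand-in for a search call that may yield several URLs.
    yield f"https://{query.lower()}.example/"
    yield f"https://blog.{query.lower()}.example/"


def first_url(results):
    # Return the first item of any iterable of URLs, or None if it is empty.
    return next(iter(results), None)


queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]
first_urls = [first_url(fake_search(query)) for query in queries]

print(first_urls)
```

Passing None as the default to next avoids a StopIteration error when a query returns no results; you would then need to filter out the None entries before scraping.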