I am trying to open the top 5 search results in Google, but my code is not opening the top results. Instead, it opens 5 tabs: Google, Google web results, Google Images, Google News, and Google Books. My code is below:
import requests, sys, webbrowser, bs4
res = requests.get('https://google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
linkElems = soup.select(r'a')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('https://google.com' + linkElems[i].get('href'))
Please help. I want the code to open the top five search results and not images or books.
Answer:
As mentioned, select your elements more specifically, but try to avoid dynamic class names; use CSS selectors instead, for example:
soup.select('a:has(>h3)')
Example
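import requests
from bs4 import BeautifulSoup
# The User-Agent header and CONSENT cookie are meant to get the normal results page
soup = BeautifulSoup(requests.get('https://google.com/search?q=test',
                                  headers={'User-Agent': 'Mozilla/5.0'},
                                  cookies={'CONSENT': 'YES+'}).text, 'html.parser')
# Result links are <a> elements that directly wrap the <h3> title
print([a.get('href') for a in soup.select('a:has(>h3)')[:5]])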
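Combining that selector with the loop from the question, a minimal sketch might look like the following. It assumes the `a:has(> h3)` selector still matches Google's organic results, and it also assumes that some layouts wrap the target as `/url?q=<target>&...`, so it unwraps that form when present; neither is guaranteed, since Google's markup changes often. The function names `top_result_links` and `open_top_results` are my own.

```python
import webbrowser
from urllib.parse import parse_qs, urljoin, urlparse

from bs4 import BeautifulSoup


def top_result_links(html, n=5, base='https://www.google.com'):
    """Return up to n result URLs from a Google results page.

    Assumes organic results are <a> elements that directly wrap an <h3> title.
    """
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for a in soup.select('a:has(> h3)'):
        if len(links) >= n:
            break
        href = a.get('href')
        if not href:
            continue
        # Assumption: some layouts wrap the target as /url?q=<target>&sa=...
        parsed = urlparse(href)
        target = parse_qs(parsed.query).get('q')
        if parsed.path == '/url' and target:
            href = target[0]
        links.append(urljoin(base, href))  # turn relative hrefs into absolute URLs
    return links


def open_top_results(query, n=5):
    """Fetch the results page for `query` and open the top n results in the browser."""
    import requests  # third-party; imported here so the parser above works without it

    res = requests.get('https://www.google.com/search',
                       params={'q': query},  # requests URL-encodes the query
                       headers={'User-Agent': 'Mozilla/5.0'},
                       cookies={'CONSENT': 'YES+'})
    res.raise_for_status()
    for url in top_result_links(res.text, n):
        webbrowser.open(url)
```

From the command line you would call something like `open_top_results(' '.join(sys.argv[1:]))`. Passing the query through `params=` instead of string concatenation lets `requests` handle spaces and special characters.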
Answer:
Try to find a stable class, id, or other attribute on the results page that identifies the results, and filter by it. Once you have confirmed that the matches are the results you want, you can take the top five.
Finding such an attribute takes a little inspection of the results page. The class shown in the screenshot below seems to do it, but at the very least you need to make sure that this class name is not used anywhere on the page before the search results.
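One way to run that check is to compare how often the class appears on the whole page with how often it appears inside the results container, and only then take the first five. This is a sketch under assumptions: the class name (`g` in the usage below) and the container selector `#search` are hypothetical placeholders, not the class from the screenshot, so confirm both in your browser's inspector. The check is slightly stricter than "not before the results", since it rejects the class if it appears anywhere outside the container.

```python
from bs4 import BeautifulSoup


def top_by_class(html, cls, n=5, results_selector='#search'):
    """Return the first n elements with class `cls` inside the results container.

    `results_selector` is an assumed container; raises ValueError if it is
    missing or if `cls` is also used outside it (e.g. before the results).
    """
    soup = BeautifulSoup(html, 'html.parser')
    container = soup.select_one(results_selector)
    if container is None:
        raise ValueError(f'no element matches {results_selector!r}')
    everywhere = soup.select('.' + cls)
    inside = container.select('.' + cls)
    # If the class also appears outside the results, the counts differ
    if len(everywhere) != len(inside):
        raise ValueError(f'class {cls!r} is also used outside the results')
    return inside[:n]
```

For example, `[d.a['href'] for d in top_by_class(html, 'g')]` would collect the first link inside each matching block.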
Also, I think Google places some limits on automated access to its regular search page, and Google strongly advises against crawling it, recommending its provided APIs instead. That option is worth considering too.
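As one example of the API route, Google's Custom Search JSON API returns results as JSON, with each hit's URL in `items[].link`. The sketch below uses only the standard library and assumes you already have an API key and a Programmable Search Engine ID (`cx`); the helper names are my own, and the cap of 10 on `num` reflects my understanding of the API's limit per request, so check the current reference before relying on it.

```python
import json
import urllib.parse
import urllib.request

CSE_ENDPOINT = 'https://www.googleapis.com/customsearch/v1'


def cse_url(query, api_key, engine_id, n=5):
    """Build a Custom Search JSON API request URL (num is capped at 10 per request)."""
    params = {'key': api_key, 'cx': engine_id, 'q': query, 'num': min(n, 10)}
    return CSE_ENDPOINT + '?' + urllib.parse.urlencode(params)


def links_from_response(data, n=5):
    """Extract result URLs from the decoded JSON; 'items' is absent when nothing matched."""
    return [item['link'] for item in data.get('items', [])][:n]


def search(query, api_key, engine_id, n=5):
    """Query the API and return up to n result URLs (requires network access and a valid key)."""
    with urllib.request.urlopen(cse_url(query, api_key, engine_id, n)) as resp:
        return links_from_response(json.load(resp), n)
```

The returned links can be passed to `webbrowser.open` exactly as in the question's loop, without any HTML parsing.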