Two of three scripts based on bs4 do not work, but theoretically should

Time:08-29

I prepared three little scripts that should theoretically do the same thing, but two of them do not work properly. I'm not sure what could be wrong. I used PyCharm, and the packages were installed inside the project, not globally with pip.

The first script doesn't give me any results, just "Process finished with exit code 0".

import requests
import bs4

text = "Python"
url = 'https://google.com/search?q=' + text
request_result = requests.get(url)

soup = bs4.BeautifulSoup(request_result.text, "html.parser")
heading_object = soup.find_all('h3')

for info in heading_object:
    print(info.getText())

The second script behaves the same as the one above: only "Process finished with exit code 0".

import requests
import bs4
from urllib.parse import quote_plus

result = 'Python'
query = quote_plus(result)
link = f"https://www.google.com/search?q={query}"

request_result = requests.get(link)
soup = bs4.BeautifulSoup(request_result.text, "html.parser")

for p in soup.find_all('h3'):
    print(p.text)

The third script works fine; I get results from the Google search.

import requests
import bs4


url = "https://www.google.com/search"
params = {"q": "Python"}
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}
soup = bs4.BeautifulSoup(requests.get(url, params=params, headers=headers).content, "html.parser")

for a in soup.select("a:has(h3)"):
    print(a["href"])

Can someone please explain what is wrong with the scripts that did not work? I am asking because theoretically they should work (they are based on a tutorial). Maybe there is a better way than the above to scrape Google results?

CodePudding user response:

I feel like I'm stating the obvious, but the main difference between your scripts is that the working one sends a browser's User-Agent header. For instance, your first script with headers:

import requests
import bs4

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}
text = "Python"
url = 'https://google.com/search?q=' + text
request_result = requests.get(url, headers=headers)

soup = bs4.BeautifulSoup(request_result.text, "html.parser")
heading_object = soup.find_all('h3')

for info in heading_object:
    print(info.getText())

Results:

Welcome to Python.org
Downloads
Python For Beginners
[...]

Headers are how the browser presents itself when knocking on the server's door: the server can choose to accept or deny the request.
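To see what the server is told when no headers are set, one can inspect the request that requests would send, without actually sending it. This is only a minimal offline sketch: the variable names are illustrative, and the exact version in the default string depends on the installed requests release.

```python
import requests

# User-Agent that requests sends when the caller sets no headers.
default_ua = requests.utils.default_user_agent()
print(default_ua)  # something like "python-requests/<version>"

# Same browser-like header as in the working script.
browser_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}

url = "https://www.google.com/search"
params = {"q": "Python"}

# prepare_request() builds the outgoing request (URL and headers)
# but does not touch the network.
session = requests.Session()
plain = session.prepare_request(requests.Request("GET", url, params=params))
spoofed = session.prepare_request(
    requests.Request("GET", url, params=params, headers=browser_headers)
)

print(plain.url)                       # same URL in both cases
print(plain.headers["User-Agent"])     # the library default
print(spoofed.headers["User-Agent"])   # the browser-like string
```

Both prepared requests target the same URL; only the User-Agent differs, which fits the behaviour described above.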

CodePudding user response:

It won't work because heading_object is an empty list: no h3 elements are found in the response, so the loop never prints anything.

So I changed it to h2 and then h1 to show that the parsing itself works:

heading_object = soup.find_all('h1')

This is the code:

import requests
import bs4

text = "Python"
url = 'https://google.com/search?q=' + text
request_result = requests.get(url)

soup = bs4.BeautifulSoup(request_result.text, "html.parser")
heading_object = soup.find_all('h1')
print(heading_object)

for info in heading_object:
    print(info.getText())

This is the output of the code:

[<h1>Before you continue to Google</h1>]
Before you continue to Google
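A quick way to tell a results page from an interstitial like this one is to summarize the headings before looping over them. The sketch below assumes a hypothetical helper, summarize_headings, and two hand-written HTML snippets standing in for the real responses; the actual Google markup is more complex and may change.

```python
import bs4


def summarize_headings(html):
    """Return the h1 texts and the number of h3 elements in an HTML string."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    return {
        "h1": [h.get_text(strip=True) for h in soup.find_all("h1")],
        "h3_count": len(soup.find_all("h3")),
    }


# Hypothetical stand-ins for the two kinds of response (not real Google HTML).
consent_page = "<html><body><h1>Before you continue to Google</h1></body></html>"
results_page = (
    "<html><body>"
    "<a href='https://www.python.org/'><h3>Welcome to Python.org</h3></a>"
    "<h3>Downloads</h3>"
    "</body></html>"
)

print(summarize_headings(consent_page))
print(summarize_headings(results_page))
```

Passing request_result.text from the scripts above to such a helper would show at a glance whether the response holds any h3 headings at all.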