How do I get a header with a python web scraper?

Time:10-30

So I am having an issue with a web scraper I am making for a website I'm developing. The main issue is that when I try to get the header for a product, which is in an h1 tag, it keeps printing this: <h1 >CHERRY MX SILENT RED(10pcs)</h1>. I just want the Cherry Mx Silent Red part, not all of the other stuff. Here is the code for my web scraper:

import requests
from bs4 import BeautifulSoup

URL = 'https://kbdfans.com/collections/cherry-switches/products/cherry-mx-silent-red'

headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36'}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

title = soup.find('h1', {'class' : 'product-detail__title small-title'})

print(title)

Answer:

Try this:

title.get_text()

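Your title variable is not a string; it is a bs4.element.Tag object. Printing a Tag prints the whole HTML element, tags included, while get_text() returns only the text inside it.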
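To see the difference without hitting the live site, here is a minimal offline sketch. The HTML string is a hand-written stand-in for the product page (an assumption, not the real kbdfans.com markup); the find() call is the one from the question:

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the product page markup (assumption, not the live HTML)
html = '<h1 class="product-detail__title small-title">CHERRY MX SILENT RED(10pcs)</h1>'
soup = BeautifulSoup(html, 'html.parser')

# Same lookup as in the question: exact match on the full class attribute
title = soup.find('h1', {'class': 'product-detail__title small-title'})

print(type(title))       # <class 'bs4.element.Tag'>
print(title)             # the whole element, tags included
print(title.get_text())  # CHERRY MX SILENT RED(10pcs)
```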

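From the BeautifulSoup documentation:

The find_all() method looks through a tag’s descendants and retrieves all descendants that match your filters.

find(), which your code uses, works the same way but returns only the first matching Tag (or None if nothing matches) instead of a list.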
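A small sketch of the find() / find_all() difference, using a hypothetical snippet with two headings (not taken from the real page). It also shows the None case, which is what usually causes "'NoneType' object has no attribute 'get_text'" errors when a selector stops matching:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two product titles (assumption, for illustration only)
html = '''
<h1 class="small-title">First product</h1>
<h1 class="small-title">Second product</h1>
'''
soup = BeautifulSoup(html, 'html.parser')

all_titles = soup.find_all('h1')  # list-like ResultSet of every matching Tag
first_title = soup.find('h1')     # only the first matching Tag
missing = soup.find('h2')         # None, because there is no <h2>

print([t.get_text() for t in all_titles])  # ['First product', 'Second product']
print(first_title.get_text())              # First product
print(missing)                             # None

# Check for None before calling .get_text(), otherwise an AttributeError is raised
if missing is not None:
    print(missing.get_text())
```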

For your title variable, refer to the bs4.element.Tag documentation. If in doubt, you can always print all of the object's available attributes and methods like this:

print(dir(title))
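The raw dir() output is long because it includes many double-underscore and private names. A minimal sketch, assuming a small hand-written <h1> rather than the real page, that keeps only the public names:

```python
from bs4 import BeautifulSoup

# Hand-written heading (assumption, not the live page)
html = '<h1 class="small-title">CHERRY MX SILENT RED(10pcs)</h1>'
title = BeautifulSoup(html, 'html.parser').find('h1')

# Drop names starting with "_" so only the public attributes/methods remain
public_names = [name for name in dir(title) if not name.startswith('_')]
print(public_names)
```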

CodePudding user response:

To get the text from your <h1>, just use .text. Use .get_text() when you need to pass arguments, for example to strip whitespace or to join the text fragments with a separator (e.g. title.get_text(separator=',', strip=True)).

print(title.text)

or

print(title.get_text())
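The strip and separator arguments matter most when the heading's text is split across nested tags. The markup below is a hypothetical example (not the real product page) chosen to show the effect:

```python
from bs4 import BeautifulSoup

# Hypothetical heading split across nested tags and padded with whitespace
html = '<h1>\n  CHERRY MX <span>SILENT RED</span>\n  <small>(10pcs)</small>\n</h1>'
title = BeautifulSoup(html, 'html.parser').find('h1')

print(repr(title.text))                           # raw text, newlines and spaces kept
print(title.get_text(strip=True))                 # CHERRY MXSILENT RED(10pcs)
print(title.get_text(separator=' ', strip=True))  # CHERRY MX SILENT RED (10pcs)
```

With strip=True alone, each text fragment is trimmed and the fragments are glued together with no separator, which is why "MX" and "SILENT" run together; adding separator=' ' puts a space back between fragments.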

Example

import requests
from bs4 import BeautifulSoup

URL = 'https://kbdfans.com/collections/cherry-switches/products/cherry-mx-silent-red'

headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36'}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

title = soup.find('h1', {'class' : 'product-detail__title small-title'})

print(title.text)

Output

CHERRY MX SILENT RED(10pcs)
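If you also want to drop the "(10pcs)" pack-size suffix and get just the switch name, one option is a regular expression on the extracted text. This is a sketch assuming the suffix is always a parenthesised group at the end of the title; strip_pack_size is a hypothetical helper, and other products on the site may be formatted differently:

```python
import re


def strip_pack_size(title_text: str) -> str:
    """Hypothetical helper: remove a trailing '(...)' suffix such as '(10pcs)'."""
    # \s*\([^)]*\)\s*$ matches optional spaces, a parenthesised group, then end of string
    return re.sub(r'\s*\([^)]*\)\s*$', '', title_text).strip()


name = strip_pack_size('CHERRY MX SILENT RED(10pcs)')
print(name)          # CHERRY MX SILENT RED
print(name.title())  # Cherry Mx Silent Red
```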