I'm new to Stack Overflow. I'm writing a Python script and have run into a problem I can't solve: I need to create a variable holding the product's price. So far I've used web scraping to collect the price, a decimal amount in €.
import bs4, requests
link = "https://hookpod.shop/products/hookpod-screw-adapter"
response = requests.get(link)
response.raise_for_status()
soup = bs4.BeatifulSoup(response.text, 'html.parster')
span_price = soup.find('span', class_='product__price')
The output I get (printing span_price) is:
<span class="product__price" data-product-price=""> €10.00 </span>
I need to extract the amount (€10.00) and convert it to an int. Can anybody help me with this? I really need it.
Answer:
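Converting the span_price text to an int will solve it. Because int() cannot parse a string with a decimal point such as "10.00", go through float() first, something like:
int_span_price = int(float(span_price.text.replace('€', '')))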
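To see why the float() step matters: int() raises ValueError on "10.00", while float() accepts it (and ignores surrounding whitespace). A minimal stdlib-only sketch, assuming the span text is exactly ' €10.00 ' as printed in the question; the helper name price_text_to_int is hypothetical, and int() truncates any cents.

```python
def price_text_to_int(text: str) -> int:
    """Convert a scraped price string such as ' €10.00 ' to an int (cents are truncated)."""
    number = text.replace('€', '')  # drop the currency symbol; float() tolerates the spaces
    return int(float(number))       # int('10.00') would raise ValueError, so parse as float first


span_text = ' €10.00 '  # assumed value of span_price.text from the question

try:
    int(span_text.replace('€', ''))
except ValueError as exc:
    print('int() alone fails:', exc)

print(price_text_to_int(span_text))  # 10
```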
Answer:
The find method returns a Tag object, and you can access its string via the text attribute. Then remove the surrounding whitespace with strip, and the currency symbol with, for example, a slice. Finally, cast the result to float and then to int.
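from bs4 import BeautifulSoup

html = '<span class="product__price" data-product-price=""> €10.00 </span>'
span_price = BeautifulSoup(html, 'lxml')  # you can change the parser, e.g. 'html.parser'
# strip() removes the surrounding spaces, [1:] drops the leading €,
# float() parses "10.00" and int() truncates it to 10
span_price_value = int(float(span_price.text.strip()[1:]))
print(span_price_value)  # 10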
Remarks:
- I used a different parser (lxml), but it makes no difference; just be sure to change it if you haven't installed lxml.
- If you don't use strip, be careful with the slice: the leading space puts the € at index 1 instead of 0, so [1:] would no longer remove it.
Answer:
I recommend using price-parser (https://pypi.org/project/price-parser/).
To install it, run pip install price-parser
>>> from price_parser import Price
>>> price = Price.fromstring("22,90 €")
>>> price
Price(amount=Decimal('22.90'), currency='€')
>>> price.amount # numeric price amount
Decimal('22.90')
>>> price.currency # currency symbol, as appears in the string
'€'
>>> price.amount_text # price amount, as appears in the string
'22,90'
>>> price.amount_float # price amount as float, not Decimal
22.9
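price.amount is a Decimal, so getting the int the question asks for takes one more step. A minimal sketch using only the standard decimal module, assuming you already have an amount like Decimal('22.90') (built by hand here instead of via price_parser): int() truncates, quantize() with ROUND_HALF_UP rounds to the nearest whole euro, and multiplying by 100 gives an exact integer number of cents if you need to keep the decimals.

```python
from decimal import Decimal, ROUND_HALF_UP

amount = Decimal('22.90')  # stands in for price.amount from price_parser

truncated = int(amount)  # 22: int() drops the fractional part
rounded = int(amount.quantize(Decimal('1'), rounding=ROUND_HALF_UP))  # 23: nearest whole euro
cents = int(amount * 100)  # 2290: exact, since Decimal arithmetic avoids float error

print(truncated, rounded, cents)  # 22 23 2290
```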
Answer:
Use Beautiful Soup's tag navigation to locate that element and call get_text() (also available as getText()) on it to pull the text out. You could also parse the result of the soup.find call you already made.
Answer:
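There were a couple of typos in your code (BeatifulSoup should be BeautifulSoup, and 'html.parster' should be 'html.parser'), so I am writing out the full code. Use a regex to extract the digits from the euro price you already have.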
import re

import requests
from bs4 import BeautifulSoup

link = "https://hookpod.shop/products/hookpod-screw-adapter"
response = requests.get(link)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
span_price = soup.find('span', class_='product__price')

# \d+ matches the first run of digits, i.e. the whole-euro part ("10" in "€10.00")
result = re.search(r'\d+', span_price.text)
result_int = int(result.group())
print(result_int)
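The pattern \d+ only captures the whole-euro part (10 from €10.00). If you want to keep the cents, a slightly wider pattern can capture the decimal part as well. This is a sketch assuming the price uses either a dot or a comma as the decimal separator and no thousands separator; the helper name parse_euro is hypothetical.

```python
import re
from decimal import Decimal


def parse_euro(text):
    """Return the first price found in text as a Decimal, or None if there is no number."""
    # digits, optionally followed by "." or "," and more digits
    match = re.search(r'\d+(?:[.,]\d+)?', text)
    if match is None:
        return None
    return Decimal(match.group().replace(',', '.'))  # normalise a decimal comma to a dot


price = parse_euro(' €10.00 ')
print(price)       # 10.00
print(int(price))  # 10
```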