I'm trying to learn web scraping in Python. I followed Tim's tutorial until I ran into a problem at minute 13: [YouTube](https://www.youtube.com/watch?v=gRLHr664tXA&ab_channel=TechWithTim)
You can copy-paste the code to check out the problem. I used a small merch website since Amazon blocked access.
from bs4 import BeautifulSoup
import requests
#Add amazon site to access prices
URL = "https://do-something.de/products/just-do-something-t-shirt"
#get HTML file from URL
Web_Data = requests.get(URL)
#Parse (make correct syntax for the)document
HTML_File = BeautifulSoup(Web_Data.text, "html.parser")
#Make a variable for finding all requested strings
###PROBLEM: I want to find the substring, not the exact string,
# which is in this case "25,00€". I only want it to check the "€" sign
prices = BeautifulSoup.find_all(HTML_File, text='€')
#Print file
print(prices)
CodePudding user response:
I found a related question that comes fairly close to yours.
Here's a working example adapted to your context:
from bs4 import BeautifulSoup
import re
text = """
<body>
<div>10€</div>
<div>1€</div>
<div>2.4 €</div>
</body>"""
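soup = BeautifulSoup(text, "html.parser")
# Match text nodes containing digits, an optional ",digits" part, an optional space, then "€"
result = soup.find_all(text=re.compile(r"\d+(,\d+)? ?€"))
print(result)
I hope it helps. The difference is quite small: I just replaced your '€' with re.compile(r"\d+(,\d+)? ?€").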
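Once the matching strings are found, they usually need to be turned into numbers. The following is only a minimal sketch of one way to do that with the standard library. The names `PRICE_RE` and `parse_price` are hypothetical, and the pattern assumes prices without thousands separators (a value such as "1.000,00€" would not be parsed correctly).

```python
import re
from decimal import Decimal

# Assumed price format: digits, an optional decimal part after "," or ".",
# an optional space, then the euro sign. Thousands separators are not handled.
PRICE_RE = re.compile(r"(\d+)(?:[.,](\d+))? ?€")


def parse_price(text):
    """Return the first euro amount found in text as a Decimal, or None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    whole = match.group(1)
    fraction = match.group(2) or "0"
    return Decimal(f"{whole}.{fraction}")


print(parse_price("25,00€"))         # 25.00
print(parse_price("2.4 €"))          # 2.4
print(parse_price("no price here"))  # None
```

Using `Decimal` instead of `float` avoids rounding surprises when the prices are later added or compared.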
CodePudding user response:
One solution is to use a custom lambda function that checks whether the string ends with €:
import requests
from bs4 import BeautifulSoup
URL = "https://do-something.de/products/just-do-something-t-shirt"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")
product_meta = soup.find(class_="ProductMeta")
prices = product_meta.find_all(text=lambda t: t.endswith("€"))
print(prices)
Prints:
['25,00€']
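If the euro sign may appear anywhere in the string rather than only at the end, the lambda can be swapped for a small named predicate. This is just a sketch: `contains_euro` is a hypothetical helper, and it is exercised on plain strings here so that it runs without a network request.

```python
def contains_euro(text):
    """Return True when the given text contains a euro sign anywhere."""
    # Guard against None in case the filter is called for a node without text.
    return text is not None and "€" in text


samples = ["25,00€", "Kostenloser Versand ab 100€ Bestellwert", "Add to cart", None]
print([contains_euro(s) for s in samples])  # [True, True, False, False]
```

Assuming the same `product_meta` object as above, it could be passed as `product_meta.find_all(string=contains_euro)`, since Beautiful Soup accepts a function as a string filter.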
CodePudding user response:
from bs4 import BeautifulSoup
import requests
import re
#Add amazon site to access prices
URL = "https://do-something.de/products/just-do-something-t-shirt"
#get HTML file from URL
Web_Data = requests.get(URL)
#Parse (make correct syntax for the)document
HTML_File = BeautifulSoup(Web_Data.text, "html.parser")
#Make a variable for finding all requested strings
###PROBLEM: I want to find the substring, not the exact string,
# which is in this case "25,00€". I only want it to check the "€" sign
# prices = BeautifulSoup.find_all(HTML_File, text='€') ## replace with
prices = HTML_File.find_all(string=re.compile("€"))
for price in prices:
    print(price.text.strip())
    print('___________')
This returns:
___________
___________
100,00€
___________
___________
25,00€
___________
Kostenloser Versand ab 100€ Bestellwert
___________
XXS - 25,00€
___________
XS - 25,00€
___________
S - 25,00€
___________
M - 25,00€
___________
L - 25,00€
___________
XL - 25,00€
___________
XXL - 25,00€
___________
XXXL - 25,00€
___________
Standardversand: 4,90€
___________
Kostenloser Versand ab einem Bestellwert von 100€
___________
13,00€
___________
16,50€
___________
12,00€
___________
16,00€
___________
14,00€
___________
18,00€
___________
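As the output shows, matching on "€" alone also picks up shipping notices and size labels. If only the bare prices are wanted, the strings can be filtered afterwards. The sketch below is one possible approach, not the only one: `BARE_PRICE` and `only_bare_prices` are hypothetical names, and the pattern assumes the "digits, comma, two decimals, €" format seen in the output above.

```python
import re

# Assumed format for a bare price: digits, a comma, two decimals, then "€".
BARE_PRICE = re.compile(r"\d+,\d{2}€")


def only_bare_prices(strings):
    """Keep only the strings that consist of a single price and nothing else."""
    stripped = (s.strip() for s in strings)
    return [s for s in stripped if BARE_PRICE.fullmatch(s)]


# A few strings taken from the output above.
sample = [
    "",
    "100,00€",
    "25,00€",
    "Kostenloser Versand ab 100€ Bestellwert",
    "XXS - 25,00€",
    "Standardversand: 4,90€",
    "13,00€",
]
print(only_bare_prices(sample))  # ['100,00€', '25,00€', '13,00€']
```

`fullmatch` is used instead of `search` so that strings with extra text around the price are rejected.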