Beautifulsoup: find_all returns empty list and needs an exact matching string

Time:07-22

I'm trying to learn web scraping in Python. I followed Tim's tutorial on YouTube until I ran into a problem at minute 13: https://www.youtube.com/watch?v=gRLHr664tXA&ab_channel=TechWithTim

You can copy-paste the code to check out the problem. I used a small merch website since Amazon blocked access.

from bs4 import BeautifulSoup
import requests


# Product page to scrape (a small merch site, since Amazon blocked access)
URL = "https://do-something.de/products/just-do-something-t-shirt"

#get HTML file from URL
Web_Data = requests.get(URL)

# Parse the HTML document
HTML_File = BeautifulSoup(Web_Data.text, "html.parser")  

#Make a variable for finding all requested strings

###PROBLEM: I want to find the substring, not the exact string, 
# which is in this case "25,00€". I only want it to check the "€" sign
prices = BeautifulSoup.find_all(HTML_File, text='€') 

#Print file
print(prices)    

CodePudding user response:

I found a closely related question.

Here's a working example for your context:

from bs4 import BeautifulSoup
import re

text = """
<body>
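    <div>10€</div>
    <div>1€</div>
    <div>2.4 €</div>
</body>"""

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(text, "html.parser")
# A compiled pattern only has to match part of a string; a plain string must match it exactly
result = soup.find_all(string=re.compile(r"\d+(,\d+)? ?€"))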
print(result)

I hope it helps. The difference is small: a plain string has to equal the whole text, while a compiled pattern only has to match part of it. So I just replaced your
'€'
with
re.compile(r"\d+(,\d+)? ?€")
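To make the difference concrete, here is a small offline sketch. The markup is made up for illustration, and it assumes the intended pattern is \d+(,\d+)? ?€ (digits, an optional comma part, an optional space, then the euro sign). It runs the exact-string search and the pattern search side by side:

```python
import re
from bs4 import BeautifulSoup

# Made-up markup standing in for the real product page.
html = """
<body>
    <div>10€</div>
    <div>1€</div>
    <div>2,40 €</div>
    <div>no price here</div>
</body>"""

soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the whole text node, so nothing matches.
exact = soup.find_all(string="€")

# A compiled pattern is applied as a search, so a partial match is enough.
partial = soup.find_all(string=re.compile(r"\d+(,\d+)? ?€"))

print(exact)    # []
print(partial)  # ['10€', '1€', '2,40 €']
```

The same call shape should work on the parsed product page in place of the made-up markup.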

CodePudding user response:

One solution is to use a custom lambda function that checks whether the string ends with "€":

import requests
from bs4 import BeautifulSoup


URL = "https://do-something.de/products/just-do-something-t-shirt"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

product_meta = soup.find(class_="ProductMeta")

prices = product_meta.find_all(text=lambda t: t.endswith("€"))
print(prices)

Prints:

['25,00€']
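As a rough offline sketch of the same idea, the function passed to find_all can be any test on the string, so "ends with" and "contains" are both possible. The markup below is invented, and only the ProductMeta class name is taken from the code above:

```python
from bs4 import BeautifulSoup

# Invented markup; only the "ProductMeta" class name comes from the answer above.
html = """
<div class="ProductMeta">
    <h1>Just Do Something T-Shirt</h1>
    <span>25,00€</span>
    <p>Kostenloser Versand ab 100€ Bestellwert</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
product_meta = soup.find(class_="ProductMeta")

# Strings that end with the euro sign (ignoring surrounding whitespace).
ends_with_euro = product_meta.find_all(string=lambda t: t.strip().endswith("€"))

# Strings that contain the euro sign anywhere.
contains_euro = product_meta.find_all(string=lambda t: "€" in t)

print(ends_with_euro)  # ['25,00€']
print(contains_euro)   # ['25,00€', 'Kostenloser Versand ab 100€ Bestellwert']
```

The strip() call is a precaution for pages where the price is surrounded by whitespace or newlines.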

CodePudding user response:

from bs4 import BeautifulSoup
import requests
import re

# Product page to scrape (a small merch site, since Amazon blocked access)
URL = "https://do-something.de/products/just-do-something-t-shirt"

#get HTML file from URL
Web_Data = requests.get(URL)

# Parse the HTML document
HTML_File = BeautifulSoup(Web_Data.text, "html.parser")  

#Make a variable for finding all requested strings

###PROBLEM: I want to find the substring, not the exact string, 
# which is in this case "25,00€". I only want it to check the "€" sign
# prices = BeautifulSoup.find_all(HTML_File, text='€') ## replace with 
prices = HTML_File.find_all(string=re.compile("€"))
for price in prices:
    print(price.text.strip())
    print('___________')

This returns:

___________

___________
100,00€
___________

___________
25,00€
___________
Kostenloser Versand ab 100€ Bestellwert
___________
XXS - 25,00€
___________
XS - 25,00€
___________
S - 25,00€
___________
M - 25,00€
___________
L - 25,00€
___________
XL - 25,00€
___________
XXL - 25,00€
___________
XXXL - 25,00€
___________
Standardversand: 4,90€
___________
Kostenloser Versand ab einem Bestellwert von 100€
___________
13,00€
___________
16,50€
___________
12,00€
___________
16,00€
___________
14,00€
___________
18,00€
___________
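The list above mixes bare prices with shipping notes and size labels. If only the bare prices are wanted, one possible follow-up is to keep the strings that consist of nothing but a price. This is a sketch only: the names PRICE_ONLY and parse_price are made up, the sample strings are copied from the output above, and it assumes prices always look like "25,00€" with a two-digit comma part:

```python
import re
from decimal import Decimal

# A few strings copied from the output above.
found = [
    "100,00€",
    "25,00€",
    "Kostenloser Versand ab 100€ Bestellwert",
    "XXS - 25,00€",
    "Standardversand: 4,90€",
    "13,00€",
]

# Hypothetical pattern: the whole string must be a price such as "25,00€".
PRICE_ONLY = re.compile(r"\s*(\d+),(\d{2})\s*€\s*")


def parse_price(text):
    """Return the price as a Decimal, or None if text is not a bare price."""
    match = PRICE_ONLY.fullmatch(text)
    if match is None:
        return None
    return Decimal(f"{match.group(1)}.{match.group(2)}")


bare_prices = [p for p in map(parse_price, found) if p is not None]
print(bare_prices)  # [Decimal('100.00'), Decimal('25.00'), Decimal('13.00')]
```

Using fullmatch instead of search is what drops entries like "XXS - 25,00€", where the price is only part of the string.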