Python BS4 find_all replaces text inside the tag with <!--empty-->

Time:11-06

I am trying to scrape mortgage rates from https://www.td.com/ca/en/personal-banking/products/mortgages/mortgage-rates/

When I use find_all to get the value from a cell in a specific table, the returned value is "<!--empty-->" instead of the text within that cell.

The actual html for that cell is:

<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="S">2.54%</span>

The result which is returned is:

<span class="h2" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="S"><!--empty--></span>

Instead of the 2.54% rate text, I get <!--empty-->. Am I missing something here? Full code below:

import requests
from bs4 import BeautifulSoup

html_text = requests.get("https://www.td.com/ca/en/personal-banking/products/mortgages/mortgage-rates/").text

soup = BeautifulSoup(html_text, "html.parser")

# Get the table
table = soup.find("div", class_="td-rates-table rates-bg-row1").table

rows = table.tbody.find_all("tr")

for row in rows:
    for rate in row.find_all("td"):
        print(rate)

I appreciate all responses! Thanks a lot!

CodePudding user response:

Use selenium. Install the necessary dependencies (selenium, beautifulsoup4, webdriver-manager) and run the script.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

url = 'https://www.td.com/ca/en/personal-banking/products/mortgages/mortgage-rates/'
driver.get(url)
# Parse the DOM after the browser has executed the page's JavaScript
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

# Get the table
table = soup.find("div", class_="td-rates-table rates-bg-row1").table

rows = table.tbody.find_all("tr")

for row in rows:
    for rate in row.find_all("td"):
        print(rate.text)

CodePudding user response:

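@Omer, your code works as written; the problem is that the page is dynamic. If you disable JavaScript in your browser, you will notice that the page content disappears, which means the rates are filled in by JavaScript after the initial HTML loads. requests only fetches that initial HTML, so the span still holds the <!--empty--> placeholder. That's why I use selenium together with bs4, and now it works.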
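A quick way to check this without a browser is to look at what the static markup actually holds. The sketch below is only an illustration: it parses the span from the question (hard-coded here as `static_html`, a name chosen for this example) and checks whether its only child is an HTML comment, which would mean the rate text has not been rendered yet.

```python
from bs4 import BeautifulSoup, Comment

# Static markup as returned by requests (copied from the question).
static_html = (
    '<span class="h2" code="a.reslrates.MTGF036C" high-ratio="false" '
    'resl-rate="" type="S"><!--empty--></span>'
)

span = BeautifulSoup(static_html, "html.parser").span

# If the only child is a Comment node, the cell is still an unrendered
# placeholder and there is no rate text to extract from this HTML.
children = list(span.children)
is_placeholder = len(children) == 1 and isinstance(children[0], Comment)

print(is_placeholder)
```

If the check reports a placeholder, no choice of bs4 selector will recover the value from that response; the HTML has to come from something that runs the page's JavaScript.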

Code:

from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to a local chromedriver binary; Selenium 4 expects it wrapped in a Service
driver = webdriver.Chrome(service=Service('chromedriver.exe'))
url = "https://www.td.com/ca/en/personal-banking/products/mortgages/mortgage-rates/"
driver.get(url)
# Give the page's JavaScript time to fill in the rates
time.sleep(8)

soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

# Get the table
table = soup.find("div", class_="td-rates-table rates-bg-row1").table

rows = table.tbody.find_all("tr")

for row in rows:
    for rate in row.find_all("td"):
        print(rate)

Output:

<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="S">2.54%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="A">2.574%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF060C" high-ratio="false" resl-rate="" type="S">2.74%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF060C" high-ratio="false" resl-rate="" type="A">2.761%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF060C" high-ratio="true" resl-rate="" type="S">2.64%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF060C" high-ratio="true" resl-rate="" type="A">2.661%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGV060C" high-ratio="false" resl-rate="" type="S">1.55%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGV060C" high-ratio="false" resl-rate="" type="A">1.571%</span>
</div>
</td>
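
Once the rendered HTML is available, the individual values can be pulled out of each span instead of printing the whole cell. The following is a minimal sketch, assuming the markup looks like the output above; `rendered_html` and the dictionary keys are names made up for this example, and the attribute names (`code`, `type`, `high-ratio`) are simply read from the spans as shown, without assuming what they mean.

```python
from bs4 import BeautifulSoup

# Two cells copied from the rendered output above.
rendered_html = """
<td><div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="S">2.54%</span>
</div></td>
<td><div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="A">2.574%</span>
</div></td>
"""

soup = BeautifulSoup(rendered_html, "html.parser")

rates = []
for span in soup.find_all("span", class_="h2"):
    rates.append({
        "code": span["code"],
        "type": span["type"],
        "high_ratio": span["high-ratio"] == "true",
        # "2.54%" -> 2.54
        "percent": float(span.get_text(strip=True).rstrip("%")),
    })

for rate in rates:
    print(rate)
```

In the real script, `soup` would be built from `driver.page_source` as in the answers above, rather than from a hard-coded string.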