I am trying to scrape mortgage rates from https://www.td.com/ca/en/personal-banking/products/mortgages/mortgage-rates/
When I use find_all to get the value of a cell in a specific table, the returned value is "<!--empty-->" instead of the text within that cell.
The actual html for that cell is:
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="S">2.54%</span>
The result which is returned is:
<span class="h2" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="S"><!--empty--></span>
Instead of the 2.54% rate text, I get <!--empty-->. Am I missing something here? Full code below:
import requests
from bs4 import BeautifulSoup

html_text = requests.get("https://www.td.com/ca/en/personal-banking/products/mortgages/mortgage-rates/").text
soup = BeautifulSoup(html_text, "html.parser")
# Get the table
table = soup.find("div", class_="td-rates-table rates-bg-row1").table
rows = table.tbody.find_all("tr")
for row in rows:
    for rate in row.find_all("td"):
        print(rate)
I appreciate all responses! Thanks a lot!
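A minimal offline sketch of why the cell looks empty, using the two span snippets quoted in the question: BeautifulSoup stores <!--empty--> as a Comment node, and get_text() skips comments, so the unrendered span yields no text at all. The helper names read_rate and is_placeholder are illustrative only, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, Comment

# Snippets copied from the question: what requests receives vs. what the browser shows
STATIC = '<span class="h2" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="S"><!--empty--></span>'
RENDERED = '<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="S">2.54%</span>'


def read_rate(html):
    """Return the text of the first <span>, or None if it has no visible text."""
    span = BeautifulSoup(html, "html.parser").span
    # get_text() ignores Comment nodes, so a placeholder-only span gives ""
    text = span.get_text(strip=True)
    return text or None


def is_placeholder(html):
    """True if the first <span> contains an HTML comment (e.g. <!--empty-->)."""
    span = BeautifulSoup(html, "html.parser").span
    return any(isinstance(child, Comment) for child in span.contents)


print(read_rate(STATIC), is_placeholder(STATIC))
print(read_rate(RENDERED), is_placeholder(RENDERED))
```

In other words, the scraper is working; the static HTML simply does not contain the number yet.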
Answer:
Use Selenium to render the page first. Install the necessary dependencies (selenium, beautifulsoup4, webdriver-manager) and run the script:
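from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

url = "https://www.td.com/ca/en/personal-banking/products/mortgages/mortgage-rates/"
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
try:
    driver.get(url)
    # Wait until the first rate span actually contains a "%" value
    WebDriverWait(driver, 20).until(
        EC.text_to_be_present_in_element((By.CSS_SELECTOR, "div.td-rates-table span[resl-rate]"), "%")
    )
    soup = BeautifulSoup(driver.page_source, "html.parser")
finally:
    driver.quit()

# Get the table
table = soup.find("div", class_="td-rates-table rates-bg-row1").table
rows = table.tbody.find_all("tr")
for row in rows:
    for rate in row.find_all("td"):
        print(rate.text.strip())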
Answer:
@Omer, your code works fine; the problem is that the page is dynamic. The rates are filled in by JavaScript after the page loads, so requests only receives the <!--empty--> placeholders. If you disable JavaScript in your browser, you will see the rates disappear. That's why I use Selenium with bs4, and now it works.
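Instead of guessing a fixed delay, one option is to poll until every rate cell has text. The sketch below is a plain predicate over HTML, assuming (from the snippets in this thread) that each rate span carries a resl-rate attribute; rates_rendered is a hypothetical helper name. It can be tested offline and then passed to Selenium's WebDriverWait.

```python
from bs4 import BeautifulSoup


def rates_rendered(page_source):
    """True once every <span> with a resl-rate attribute has visible text."""
    soup = BeautifulSoup(page_source, "html.parser")
    # attrs={"resl-rate": True} matches the attribute even when its value is ""
    spans = soup.find_all("span", attrs={"resl-rate": True})
    return bool(spans) and all(s.get_text(strip=True) for s in spans)


STATIC_PAGE = '<div><span resl-rate="" type="S"><!--empty--></span></div>'
RENDERED_PAGE = '<div><span resl-rate="" type="S">2.54%</span><span resl-rate="" type="A">2.574%</span></div>'

print(rates_rendered(STATIC_PAGE), rates_rendered(RENDERED_PAGE))

# Possible use with Selenium (not executed here):
#   from selenium.webdriver.support.ui import WebDriverWait
#   WebDriverWait(driver, 20).until(lambda d: rates_rendered(d.page_source))
```

Checking all spans rather than the first one avoids parsing a half-rendered table.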
Code:
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to a chromedriver binary matching your installed Chrome version
driver = webdriver.Chrome(service=Service("chromedriver.exe"))
url = "https://www.td.com/ca/en/personal-banking/products/mortgages/mortgage-rates/"
driver.get(url)
time.sleep(8)  # fixed wait so the JavaScript can fill in the rates
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()
# Get the table
table = soup.find("div", class_="td-rates-table rates-bg-row1").table
rows = table.tbody.find_all("tr")
for row in rows:
    for rate in row.find_all("td"):
        print(rate)
Output:
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="S">2.54%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="A">2.574%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF060C" high-ratio="false" resl-rate="" type="S">2.74%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF060C" high-ratio="false" resl-rate="" type="A">2.761%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF060C" high-ratio="true" resl-rate="" type="S">2.64%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF060C" high-ratio="true" resl-rate="" type="A">2.661%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGV060C" high-ratio="false" resl-rate="" type="S">1.55%</span>
</div>
</td>
<td>
<div class="rte">
<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGV060C" high-ratio="false" resl-rate="" type="A">1.571%</span>
</div>
</td>
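Once the rendered HTML is available, the span attributes shown in the output above (code, type, high-ratio) can be kept alongside each rate instead of printing raw cells. This is a sketch only: the record field names are my own, and the meaning of the type values ("S", "A") is not stated in this thread, so they are stored as-is.

```python
from bs4 import BeautifulSoup

# Two cells copied from the rendered output above
SAMPLE = """
<td><div class="rte"><span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="S">2.54%</span></div></td>
<td><div class="rte"><span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="A">2.574%</span></div></td>
"""


def extract_rates(page_source):
    """Collect code, type, high-ratio flag and numeric rate from each rate <span>."""
    soup = BeautifulSoup(page_source, "html.parser")
    records = []
    for span in soup.find_all("span", attrs={"resl-rate": True}):
        text = span.get_text(strip=True)
        if not text.endswith("%"):
            continue  # skip spans still holding the <!--empty--> placeholder
        records.append({
            "code": span["code"],
            "type": span["type"],
            "high_ratio": span["high-ratio"] == "true",
            "rate": float(text.rstrip("%")),
        })
    return records


for record in extract_rates(SAMPLE):
    print(record)
```

Feeding driver.page_source from either answer into extract_rates gives the same data as the printed cells, but in a form that is easy to load into a list or DataFrame.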