I have a scraper that I've been using for about a year without issue. It finds a specific element by searching for its text. Until now, find(text="Cost of gas per GJ") would return the whole tag; now it only returns the text.
import requests
from bs4 import BeautifulSoup
rate_2_response = requests.get("https://www.fortisbc.com/accounts-billing/billing-rates/natural-gas-rates/business-rates#tab-0")
# print(rate_2_response.text)
rate_2_soup = BeautifulSoup(rate_2_response.text, "html.parser")
rate_2_table = rate_2_soup.find("body")
print(rate_2_table)
cost_of_gas_gj = rate_2_table.find(text="Cost of gas per GJ") ## problematic line
print(cost_of_gas_gj)
rate_2_table returns a whole long list of elements (which contains what I need), so there's no problem there. But find() seems to not parse it correctly.
I need cost_of_gas_gj to return <td width="70%">Cost of gas per GJ</td>, not just the inner text. The website has not changed.
CodePudding user response:
Try this:
import requests
from bs4 import BeautifulSoup
url = "https://www.fortisbc.com/accounts-billing/billing-rates/natural-gas-rates/business-rates#tab-0"
rate_2_soup = BeautifulSoup(requests.get(url).text, "html.parser")
cost_of_gas_gj = rate_2_soup.find("td", text="Cost of gas per GJ")
print(cost_of_gas_gj)
Output:
<td width="70%">Cost of gas per GJ</td>
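To see the difference without hitting the network, here is a minimal sketch that runs against a small inline HTML snippet. The snippet is an assumption, modeled on the tag quoted in the question rather than copied from the real page, and the second cell holds a placeholder instead of a real rate. It uses the string keyword, which is the newer name Beautiful Soup (4.4.0 and later) gives to the argument called text above.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical stand-in for the relevant part of the page (not the real markup).
html = """
<table>
  <tr><td width="70%">Cost of gas per GJ</td><td>placeholder</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Searching by string alone matches the text node, not the tag around it.
text_only = soup.find(string="Cost of gas per GJ")

# Adding a tag name makes find() return the <td> whose string matches.
cell = soup.find("td", string="Cost of gas per GJ")

print(type(text_only).__name__)  # NavigableString
print(cell)                      # <td width="70%">Cost of gas per GJ</td>
```

With only a string filter, find() hands back the matching text node; once a tag name is given, the string filter is applied to each candidate tag's .string and the tag itself is returned.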
CodePudding user response:
If you don't know which tag contains the specified text, use .parent:
...
cost_of_gas_gj = rate_2_table.find(text="Cost of gas per GJ").parent
# Output: <td width="70%">Cost of gas per GJ</td>
print(cost_of_gas_gj)
...
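One caveat with .parent: find() returns None when nothing matches, so chaining .parent directly raises AttributeError in that case. A plain string is also compared exactly, so stray whitespace around the text is enough to cause a miss. The sketch below guards against both. The inline HTML, with its padded cell text, is a made-up example and not the real page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: note the spaces around the label text.
html = '<table><tr><td width="70%"> Cost of gas per GJ </td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# An exact string match fails here because of the surrounding whitespace.
exact = soup.find(string="Cost of gas per GJ")

# A compiled regex is matched with search(), so the padding does not matter.
label = soup.find(string=re.compile(r"Cost of gas per GJ"))

# Only follow .parent if the text node was actually found.
label_cell = label.parent if label is not None else None

print(exact)            # None
print(label_cell.name)  # td
```

Whether the regex is worth it depends on how stable the page's markup is; the None check is cheap either way.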