To start, Python is the first language I am learning.
I am scraping a website for rent prices across my city and I am using BeautifulSoup
to get the price data, but I am unable to get the value of this tag.
Here is the tag:
<p><strong >Monthly Rent: </strong>2,450 </p>
Here is my code:
text = soup.find_all("div", {"class": "plan-group rent"})
for item in text:
    rent = item.find_all("p")
    for price in rent:
        print(price.string)
I also tried:
text = soup.find_all("div", {"class": "plan-group rent"})
for item in text:
    rent = item.find_all("p")
    for price in rent:
        items = price.find_all("strong")
        for strong in items:
            print(strong.string)
That works to print out "Monthly Rent:", but I don't understand why I can't get the actual price. The code above shows me that the "Monthly Rent:" label is in the strong tag, which I took to mean that the p tag contains only the price, which is what I want.
CodePudding user response:
As mentioned by @kyrony, there are two children in your <p>. Because you select the <strong>, you will only get one of the texts.
You could use different approaches. With stripped_strings:
list(soup.p.stripped_strings)[-1]
or with contents:
soup.p.contents[-1]
or with the recursive argument:
soup.p.find(text=True, recursive=False)
Example
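from bs4 import BeautifulSoup
html = '''<p><strong >Monthly Rent: </strong>2,450 </p>'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, "html.parser")
# The last child of <p> is the text node after </strong>
print(soup.p.contents[-1].strip())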
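Applied to the loop from the question, the stripped_strings approach might look like the sketch below. The surrounding div markup is an assumption inferred from the class names in the question, so the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <div> wrapper is assumed from the class names
# used in the question, not taken from the real page.
html = '''
<div class="plan-group rent">
  <p><strong>Monthly Rent: </strong>2,450 </p>
</div>
'''
soup = BeautifulSoup(html, "html.parser")

prices = []
for group in soup.find_all("div", {"class": "plan-group rent"}):
    for p in group.find_all("p"):
        # stripped_strings yields every text fragment inside <p>;
        # the last one is the text that follows </strong>.
        strings = list(p.stripped_strings)
        if strings:
            prices.append(strings[-1])

print(prices)
```

Collecting the values in a list rather than printing them inside the loop makes it easier to clean them up later, for example by removing the comma and converting to int.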
CodePudding user response:
Technically, your content has two children:
<p><strong >Monthly Rent: </strong>2,450 </p>
A strong tag:
<strong >Monthly Rent: </strong>
and a string:
2,450
The .string attribute in Beautiful Soup only returns a value when the tag has exactly one child, so for this <p> it returns None. To get the second string, you can use the stripped_strings generator.
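A small sketch of that behaviour, using the tag from the question (the variable names here are only for illustration):

```python
from bs4 import BeautifulSoup

html = '<p><strong>Monthly Rent: </strong>2,450 </p>'
soup = BeautifulSoup(html, "html.parser")

# <p> has two children (a <strong> tag and a text node), so .string is None.
p_string = soup.p.string

# <strong> has exactly one child, a text node, so .string returns it.
strong_string = soup.p.strong.string

# stripped_strings walks every text node under <p>, trimming whitespace.
parts = list(soup.p.stripped_strings)

print(p_string)
print(strong_string)
print(parts)
```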