HTML code:
<span data-testid="post_timestamp" data-click-id="timestamp" style="color: rgb(129, 131, 132);">8 months ago</span>
I want to extract "8 months ago". The code I am using does not return any results.
data.find_all('span', attrs={'data-testid': True,'data-click-id' : True,'color':True})
CodePudding user response:
There are several approaches. Select the element by an exact attribute value:
soup.select_one('[data-testid="post_timestamp"]').text
By contained text, if "ago" is always present:
soup.select_one('span:-soup-contains("ago")').text
By the presence of "color" in the style attribute:
soup.select_one('span[style*="color"]').text
Example
from bs4 import BeautifulSoup
html='''
<span data-testid="post_timestamp" data-click-id="timestamp" style="color: rgb(129, 131, 132);">8 months ago</span>
'''
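# name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')
print(soup.select_one('[data-testid="post_timestamp"]').text)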
#soup.select_one('span:-soup-contains("ago")').text
#soup.select_one('span[style*="color"]').text
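One thing to keep in mind with these selectors: select_one returns None when nothing matches, so calling .text directly raises an AttributeError on pages where the timestamp is missing. A minimal sketch of a guard follows; the helper name first_text and the selector "no_such_id" are made up for illustration.

```python
from bs4 import BeautifulSoup

html = '''
<span data-testid="post_timestamp" data-click-id="timestamp" style="color: rgb(129, 131, 132);">8 months ago</span>
'''

soup = BeautifulSoup(html, "html.parser")


def first_text(soup, selector):
    """Return the text of the first element matching selector, or None.

    Hypothetical helper, not part of BeautifulSoup.
    """
    element = soup.select_one(selector)
    return element.text if element is not None else None


by_attribute = first_text(soup, '[data-testid="post_timestamp"]')
by_style = first_text(soup, 'span[style*="color"]')
missing = first_text(soup, '[data-testid="no_such_id"]')  # selector that matches nothing

print(by_attribute)
print(by_style)
print(missing)
```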
CodePudding user response:
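Currently, you are only asking for an element that has these attributes, with any value, which can match more elements than you'd like. Here it matches nothing at all, because the span has no attribute named color; the color is part of the value of the style attribute. You can instead ask for an element whose attribute has a specific value.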
The data-testid="post_timestamp" attribute could come in handy to identify the span, as it seems to mark the timestamp of the post.
data.find_all('span', attrs={'data-testid': 'post_timestamp'})
Now you are telling BeautifulSoup to return a <span> with data-testid="post_timestamp", which should match the element you want.
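To make the difference concrete, here is a small sketch that runs the query from the question next to the presence-only and exact-value variants, using the span from the question as the whole document. The variable names are only for illustration.

```python
from bs4 import BeautifulSoup

html = '<span data-testid="post_timestamp" data-click-id="timestamp" style="color: rgb(129, 131, 132);">8 months ago</span>'
data = BeautifulSoup(html, "html.parser")

# Query from the question: 'color': True requires an attribute named "color",
# which the span does not have (the color lives inside the style value).
original = data.find_all('span', attrs={'data-testid': True, 'data-click-id': True, 'color': True})

# True only checks that the attribute is present, whatever its value.
presence_only = data.find_all('span', attrs={'data-testid': True, 'data-click-id': True})

# A string requires the attribute to have exactly that value.
exact_value = data.find_all('span', attrs={'data-testid': 'post_timestamp'})

print(len(original))
print(len(presence_only))
print(exact_value[0].text)
```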
If you want a single element instead of a list, you can use data.find instead of data.find_all. data.find returns only the first matching element, or None if nothing matches.
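A short sketch of the difference, assuming a page with two timestamp spans. The second span and the "no_such_id" value are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two matching spans
html = '''
<span data-testid="post_timestamp">8 months ago</span>
<span data-testid="post_timestamp">2 days ago</span>
'''
data = BeautifulSoup(html, "html.parser")

# find_all returns every match in document order
all_matches = data.find_all("span", {"data-testid": "post_timestamp"})

# find returns the first match only
first_match = data.find("span", {"data-testid": "post_timestamp"})

# find returns None when nothing matches, so check before using the result
no_match = data.find("span", {"data-testid": "no_such_id"})

print([element.text for element in all_matches])
print(first_match.text)
print(no_match)
```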
You can get the text of the element by accessing its text property.
timestamp_element = data.find("span", {"data-testid": "post_timestamp"})
text = timestamp_element.text
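On real pages the span may contain extra whitespace or nested tags, in which case text returns them as-is. A sketch of cleaning that up with get_text follows; the markup is a made-up variation of the span from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same span, but with line breaks and a nested <b> tag
html = '<span data-testid="post_timestamp">\n  <b>8</b> months ago\n</span>'
data = BeautifulSoup(html, "html.parser")

timestamp_element = data.find("span", {"data-testid": "post_timestamp"})

# .text concatenates all nested strings, including surrounding whitespace
raw_text = timestamp_element.text

# get_text with strip=True trims each string and drops empty ones;
# the first argument is the separator placed between the remaining pieces
clean_text = timestamp_element.get_text(" ", strip=True)

print(repr(raw_text))
print(repr(clean_text))
```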