I have this HTML snippet:
<div class="chatlog__message-primary">
<div ><span data-user-id="568352418596587458" title="discorduser#1234">Discord User</span> <span ><a href="#chatlog__message-container-854963254185698547">16-Jan-22 12:33 PM</a></span></div>
<div class="chatlog__attachment"><a href="imageurl here"> <img alt="Image attachment" loading="lazy" src="imageurl here" title="Image: image title.jpg (2.12 MB)"/> </a></div>
</div>
and when I run this Python code:
from bs4 import BeautifulSoup, Tag
html = open("test.html", encoding='utf-8', buffering=100000).read()
soup = BeautifulSoup(html, 'lxml')
allMessages = soup.find_all('div', class_="chatlog__message-primary")
discordId = soup.find('span', {'data-user-id':'568352418596587458'})
for message in allMessages:
    if discordId in message:
        print(message)
it does not print anything. But I can do
for message in allMessages:
    print(discordId)
and it prints the span with all its elements. I can't get it to filter. I can also do
for div in soup.find_all('div', class_='chatlog__attachment'):
    print(div.a['href'])
but then I lose the ability to filter based on data-user-id.
CodePudding user response:
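You cannot use the in operator here to check whether the element is present. On a tag, in only tests the tag's direct children, and the span is nested one level deeper inside the message div, so the test never matches. It should look more like: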
for message in allMessages:
    if message.find('span', {'data-user-id':'568352418596587458'}):
        print(message)
        print(message.a['href'])
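To see why the in test fails, here is a minimal sketch. It assumes a trimmed-down copy of the snippet with the class attribute on the outer div (the markup is illustrative, not the full export) and uses the built-in html.parser. On a bs4 tag, in only looks at the tag's direct children, while find searches all descendants:

```python
from bs4 import BeautifulSoup

# Trimmed-down, illustrative copy of the message markup.
html = '''
<div class="chatlog__message-primary">
<div><span data-user-id="568352418596587458">Discord User</span></div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
message = soup.find('div', class_='chatlog__message-primary')
span = soup.find('span', {'data-user-id': '568352418596587458'})

# "in" checks message.contents, i.e. direct children only.
# The span sits one level deeper, inside the inner div.
in_direct_children = span in message

# find() walks every descendant, so it reaches the nested span.
found_by_find = message.find('span', {'data-user-id': '568352418596587458'}) is not None

# The span's direct parent is the inner div, so "in" works there.
in_parent = span in span.parent

print(in_direct_children, found_by_find, in_parent)
```

This should print False True True: the span is a descendant of the message div, but not one of its direct children.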
An alternative would be to use a CSS selector with :has():
for e in soup.select('div.chatlog__message-primary:has([data-user-id="568352418596587458"])'):
    print(e.a['href'])
Example
from bs4 import BeautifulSoup
html = '''
<div class="chatlog__message-primary">
<div ><span data-user-id="568352418596587458" title="discorduser#1234">Discord User</span> <span ><a href="#chatlog__message-container-854963254185698547">16-Jan-22 12:33 PM</a></span></div>
<div class="chatlog__attachment"><a href="imageurl here"> <img alt="Image attachment" loading="lazy" src="imageurl here" title="Image: image title.jpg (2.12 MB)"/> </a></div>
</div>
'''
soup = BeautifulSoup(html, 'lxml')
allMessages = soup.find_all('div', class_="chatlog__message-primary")
for e in soup.select('div.chatlog__message-primary:has([data-user-id="568352418596587458"])'):
    print(e.a['href'])
Output
#chatlog__message-container-854963254185698547
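Note that e.a is the first a tag inside the message, which is the timestamp link in the header. That is why the output is an anchor rather than the image URL. If the goal is the attachment links for one user, a sketch along these lines may help. It assumes the attachment div carries the class chatlog__attachment, as in the question's second loop. The helper name attachment_links, the second message, and the example.invalid URLs are made up for illustration:

```python
from bs4 import BeautifulSoup

# Illustrative markup: two messages from different users, one attachment each.
html = '''
<div class="chatlog__message-primary">
<div><span data-user-id="568352418596587458">Discord User</span> <span><a href="#chatlog__message-container-854963254185698547">16-Jan-22 12:33 PM</a></span></div>
<div class="chatlog__attachment"><a href="https://example.invalid/image.jpg"><img alt="Image attachment" src="https://example.invalid/image.jpg"/></a></div>
</div>
<div class="chatlog__message-primary">
<div><span data-user-id="111">Someone Else</span></div>
<div class="chatlog__attachment"><a href="https://example.invalid/other.jpg"></a></div>
</div>
'''

def attachment_links(soup, user_id):
    """Return attachment hrefs from messages written by user_id (hypothetical helper)."""
    # :has() keeps only messages containing the user's span; the rest of the
    # selector descends into the attachment div instead of the header.
    selector = (
        f'div.chatlog__message-primary:has([data-user-id="{user_id}"]) '
        'div.chatlog__attachment a[href]'
    )
    return [a['href'] for a in soup.select(selector)]

soup = BeautifulSoup(html, 'html.parser')
links = attachment_links(soup, '568352418596587458')
print(links)
```

Only the first message matches the user id, so the list should contain just that message's attachment link.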