How to get text from HTML attributes

Time:06-09

I tried to parse a page to get some elements as text, but I can't find how to get text from a select element.

For example, the HTML below contains data-initial-rating="4" and, inside data-vote-content, title="Members who rated this thread" with the text "12 Votes", but I can't extract either of them.

<select name="rating"  data-xf-init="rating" data-initial-rating="4" data-rating-href="/threads/isis-the-fall-v1-02-tjord.117157/br-rate" data-readonly="false" data-deselectable="false" data-show-selected="true" data-widget- data-vote-content="<div data-href=&quot;/threads/game-mod-v-1-02/br-user-rated&quot; data-xf-click=&quot;overlay&quot; data-xf-init=&quot;tooltip&quot; title=&quot;Members who rated this thread&quot;>12 Votes</div>" style="display: none;">
                <option value="">&nbsp;</option>
<option value="1">Terrible</option>
<option value="2">Poor</option>
<option value="3">Average</option>
<option value="4">Good</option>
<option value="5">Excellent</option>

            </select>

What I tried:

import requests
import lxml.html


response = requests.get('https://somewebsite.com')  # placeholder URL; requests needs the scheme
tree = lxml.html.fromstring(response.text)
# full XPath
messy_rating_and_votes = tree.xpath('/html/body/div[2]/div/div[3]/div/div[1]/div/div/div[3]/div/div[2]/div/div/select')
print(messy_rating_and_votes)  # prints an empty list, so I can't use .text or .text_content()

So I guess I'm selecting the wrong element or using the wrong method, but almost 2 hours of googling hasn't helped.

CodePudding user response:

This example uses BeautifulSoup4

import requests
from bs4 import BeautifulSoup

response = requests.get("https://somewebsite.com")  # placeholder URL; requests needs the scheme
soup = BeautifulSoup(response.content, 'html5lib')  # requires pip install html5lib

for option in soup.find_all('option'):
    print(f"value: {option['value']} text: {option.text}")
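The loop above reads the option elements, while the values asked about sit in attributes of the select element itself. A minimal sketch of reading them with BeautifulSoup4 follows. It assumes a trimmed copy of the question's markup in place of a live request and the built-in "html.parser" backend; SAMPLE_HTML and extract_rating are illustrative names, not part of the original post.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <select> from the question (assumption: the real page
# serves the same attributes in its static HTML).
SAMPLE_HTML = '''
<div>
<select name="rating" data-xf-init="rating" data-initial-rating="4" data-vote-content="<div data-xf-init=&quot;tooltip&quot; title=&quot;Members who rated this thread&quot;>12 Votes</div>" style="display: none;">
<option value="">&nbsp;</option>
<option value="4">Good</option>
</select>
</div>
'''


def extract_rating(html_text):
    """Return the rating, tooltip title and vote text, or None if no select is found."""
    soup = BeautifulSoup(html_text, "html.parser")
    # "name" is also find()'s own first parameter, so pass it through attrs.
    select = soup.find("select", attrs={"name": "rating"})
    if select is None:
        return None
    # Attributes are read like dictionary keys.
    rating = select["data-initial-rating"]
    # data-vote-content holds an HTML fragment as a string; parse it again.
    vote_div = BeautifulSoup(select["data-vote-content"], "html.parser").find("div")
    return {
        "rating": rating,
        "title": vote_div["title"],
        "votes": vote_div.get_text(strip=True),
    }


result = extract_rating(SAMPLE_HTML)
print(result)
```

If find() returns None on the real page, the select is probably not in the HTML that requests receives (for example, because it is added by JavaScript), which would also explain an empty XPath result.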

CodePudding user response:

We can't judge the correctness of your XPath because you didn't include the full document. You might have made a small error anywhere in that path, e.g. div[3] where it should have been div[2].

You might try a simpler path using the descendant axis (with its syntactic shortcut //) instead of the default child axis. This would let you skip over much of the document's messy structure, e.g.

//select[@name='rating']

or

//select[@name='rating'][@data-xf-init='rating']

... or however specific you need to be to identify that particular select element.
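As a rough sketch of how such a path could be used with lxml to reach the attribute values from the question: the snippet below assumes a trimmed copy of the question's markup rather than a live page, and SAMPLE_HTML and extract_rating are illustrative names only.

```python
import lxml.html

# Trimmed copy of the <select> from the question (assumption: the real page
# serves the same attributes in its static HTML).
SAMPLE_HTML = '''
<div>
<select name="rating" data-xf-init="rating" data-initial-rating="4" data-vote-content="<div data-xf-init=&quot;tooltip&quot; title=&quot;Members who rated this thread&quot;>12 Votes</div>" style="display: none;">
<option value="">&nbsp;</option>
<option value="4">Good</option>
</select>
</div>
'''


def extract_rating(html_text):
    """Return the rating, tooltip title and vote text, or None if no select matches."""
    tree = lxml.html.fromstring(html_text)
    # The descendant axis finds the select wherever it sits in the tree.
    selects = tree.xpath("//select[@name='rating']")
    if not selects:
        return None
    select = selects[0]
    # Attribute values come from .get(), not from .text or .text_content().
    rating = select.get("data-initial-rating")
    # data-vote-content holds an HTML fragment as a string; parse it separately.
    vote_div = lxml.html.fromstring(select.get("data-vote-content"))
    return {
        "rating": rating,
        "title": vote_div.get("title"),
        "votes": vote_div.text_content().strip(),
    }


result = extract_rating(SAMPLE_HTML)
print(result)
```

An XPath such as //select[@name='rating']/@data-initial-rating would return the attribute value directly as a list of strings, which may be enough when only the rating is needed.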
