html_element() in rvest: matching element by font size

Time:06-22

MWE:

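library(rvest)

html <- minimal_html('
    <p id="name1"><font size=5>Here is size 5 font </font></p>
    <p id="name2" ><font size=3>And here is size 3 font </font></p>
   ')

html %>% html_elements('#name1')   # select by id
html %>% html_elements('.second')  # select by class (no element in this sample has it)
html %>% html_elements('font')     # select by tag
html %>% html_elements('#5')       # failed attempt: "#" targets id, not size
html %>% html_elements('.5')       # failed attempt: "." targets class, not size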

My goal is to extract all elements with the attribute size=5. I know the easy way to do this when the attribute is "id" or "class" (as shown above), but I can't find any way to do it for the attribute "size". (I tried with both html_elements and html_nodes.) Is there a way to do this in the rvest package?

CodePudding user response:

Not sure how to do this with the CSS selectors if that's required, but here's some XPath that does the trick:

html %>% html_elements(xpath = '//font[@size=5]')

Output:

{xml_nodeset (1)}
[1] <font size="5">Here is size 5 font </font>

Or, for truly all elements with a size attribute of 5 (not just fonts):

html %>% html_elements(xpath = '//*[@size=5]')
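
One detail worth knowing about the XPath above: in XPath 1.0, @size=5 compares the attribute as a number, while @size="5" compares it as a string. A minimal sketch on the question's sample document, assuming rvest 1.0 or later (for minimal_html, html_elements and html_text2); the variable names by_number and by_string are only illustrative:

```r
library(rvest)

# Same sample document as in the question
html <- minimal_html('
  <p id="name1"><font size=5>Here is size 5 font </font></p>
  <p id="name2"><font size=3>And here is size 3 font </font></p>
')

# Numeric comparison: the attribute value is converted to a number first
by_number <- html %>% html_elements(xpath = '//font[@size=5]')

# String comparison: the attribute value must be exactly "5"
by_string <- html %>% html_elements(xpath = '//font[@size="5"]')

# Pull out the text and the attribute value of the matches
html_text2(by_number)
html_attr(by_string, "size")
```

On this sample both forms should select the same single node. They can differ on messier pages, since a value such as "5.0" would equal 5 as a number but not "5" as a string.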

CodePudding user response:

This can also be done with CSS attribute selectors. To match only font elements with size 5:

html %>% html_elements('font[size="5"]')

Here, font is the type (tag) selector and [size="5"] is the attribute selector, which matches elements whose size attribute has the value 5.

To match any element with size 5:

html %>% html_elements('[size="5"]')

With no type selector, matching is done on the size attribute and its value only, regardless of the tag.
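
To see the selectors side by side, here is a small sketch run against the question's sample document. It assumes rvest 1.0 or later; the [size] presence selector is standard CSS that was not part of the original answer, and the variable names are only illustrative:

```r
library(rvest)

# Same sample document as in the question
html <- minimal_html('
  <p id="name1"><font size=5>Here is size 5 font </font></p>
  <p id="name2"><font size=3>And here is size 3 font </font></p>
')

# font elements whose size attribute equals "5"
font_size5 <- html %>% html_elements('font[size="5"]')

# any element whose size attribute equals "5"
any_size5 <- html %>% html_elements('[size="5"]')

# any element that has a size attribute at all, whatever its value
has_size <- html %>% html_elements('[size]')

length(font_size5)
length(any_size5)
length(has_size)
```

On this sample, the first two selectors should each match one node and the presence selector should match both font elements.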
