MWE:
library(rvest)
html <- minimal_html('
<p id="name1"><font size=5>Here is size 5 font </font></p>
<p id="name2" ><font size=3>And here is size 3 font </font></p>
')
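html %>% html_elements('#name1')   # by id: works
html %>% html_elements('.second')  # by class: works in general (this MWE has no such class)
html %>% html_elements('font')     # by tag: works
html %>% html_elements('#5')       # my attempt at size=5: does not work, "#" means id
html %>% html_elements('.5')       # my attempt at size=5: does not work, "." means class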
My goal is to extract all elements that have the attribute size=5. I know the easy way to do this when the attribute is id or class (as shown above), but I can't find any way to do it for the size attribute. (I tried both html_elements and html_nodes.) Is there a way to do this with the rvest package?
CodePudding user response:
Not sure how to do this with the CSS selectors if that's required, but here's some XPath that does the trick:
html %>% html_elements(xpath = '//font[@size=5]')
Output:
{xml_nodeset (1)}
[1] <font size="5">Here is size 5 font </font>
Or, for truly all elements with a size attribute of 5 (not just fonts):
html %>% html_elements(xpath = '//*[@size=5]')
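To check what that XPath actually matched, here is a minimal sketch that reuses the MWE from the question and inspects the result. It assumes rvest 1.0.0 or later (for html_elements); the variable name size5 is just for illustration.

```r
library(rvest)

html <- minimal_html('
<p id="name1"><font size=5>Here is size 5 font </font></p>
<p id="name2"><font size=3>And here is size 3 font </font></p>
')

# Select every element whose size attribute equals 5, whatever its tag
size5 <- html %>% html_elements(xpath = '//*[@size=5]')

# Inspect the match: tag name, attribute value, and text content
html_name(size5)
html_attr(size5, "size")
html_text(size5)
```

html_name and html_attr make it easy to confirm that only the size 5 font element was picked up, not the size 3 one.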
CodePudding user response:
CSS selectors can do this too, using an attribute selector. To match font elements with size 5:
html %>% html_elements('font[size="5"]')
In the above, font is the type (tag) selector and [size="5"] is the attribute = value selector.
And, to match any element with size 5:
html %>% html_elements('[size="5"]')
In the above, there is no type selector in the selector list, so matching is on the size attribute and its value only, whatever the element's tag.
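If you need this for more than one attribute, the attribute selector can be built from strings. The following is only a sketch: elements_with_attr is a hypothetical helper, not part of rvest, and it assumes the value contains no double quotes.

```r
library(rvest)

html <- minimal_html('
<p id="name1"><font size=5>Here is size 5 font </font></p>
<p id="name2"><font size=3>And here is size 3 font </font></p>
')

# Hypothetical helper (not an rvest function): select all elements whose
# attribute `attr` equals `value`, using a CSS attribute selector.
# Assumes `value` contains no double quotes.
elements_with_attr <- function(doc, attr, value) {
  selector <- sprintf('[%s="%s"]', attr, value)
  html_elements(doc, selector)
}

css_any   <- elements_with_attr(html, "size", "5")
css_font  <- html %>% html_elements('font[size="5"]')
xpath_any <- html %>% html_elements(xpath = '//*[@size=5]')

# On this MWE all three approaches should select the same single node
identical(html_text(css_any), html_text(xpath_any))
length(css_font)
```

The same helper also covers the id case, e.g. elements_with_attr(html, "id", "name1"), since id is just another attribute as far as the attribute selector is concerned.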