I have an XML file of the following type:
<timeseries><mrid>1</mrid><businesstype>A96</businesstype><flowdirection.direction>A02</flowdirection.direction><quantity_measure_unit.name>MAW</quantity_measure_unit.name>
and I use the following code, which returns {xml_nodeset (0)}. But when I search for "businesstype" instead, I get the results from that node. I think the problem is the "." inside the node name. How can this be solved using the same function as in my code?
library(rvest)
xml %>% html_elements("flowdirection.direction") %>% html_text()
xml %>% html_nodes("flowdirection.direction")
CodePudding user response:
The default selector type for html_elements() (and html_nodes()) is a CSS selector. In CSS selectors, the . has a special meaning: it indicates the HTML "class" of a node, so "flowdirection.direction" is read as a flowdirection element with class "direction". If you want to use CSS selectors, you have to escape the . with a backslash (doubled inside an R string). For example:
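library(rvest)
# Sample document from the question, with the closing tag and quote restored
xml <- xml2::read_xml("<timeseries><mrid>1</mrid><businesstype>A96</businesstype><flowdirection.direction>A02</flowdirection.direction></timeseries>")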
xml %>%
html_elements("flowdirection\\.direction")
Or you could use XPath selectors, where a . inside an element name needs no escaping:
xml %>%
html_elements(xpath="flowdirection.direction")
This might make more sense because you seem to be working with raw XML data rather than HTML nodes.
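As a minimal sketch of the XPath route, the example below pulls the text out of both dotted element names from the question. The closing tags in the sample string are an assumption, since the snippet in the question is cut off, and the variable names doc, direction, and unit are only illustrative. The ".//" prefix searches descendants at any depth, which may help if the real file nests these elements more deeply than the sample does.

```r
library(rvest)

# Sample document built from the question's fragment.
# Assumption: the closing tags are added here because the original is truncated.
doc <- xml2::read_xml(
  "<timeseries><mrid>1</mrid><businesstype>A96</businesstype><flowdirection.direction>A02</flowdirection.direction><quantity_measure_unit.name>MAW</quantity_measure_unit.name></timeseries>"
)

# ".//" matches descendants of the current node at any depth;
# the dot in the element name is just part of the name in XPath.
direction <- doc %>%
  html_elements(xpath = ".//flowdirection.direction") %>%
  html_text()

unit <- doc %>%
  html_elements(xpath = ".//quantity_measure_unit.name") %>%
  html_text()

print(direction)
print(unit)
```

The same pattern should work for any other element whose name contains a dot, without any escaping.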