How to parse USER_DEFINED XML data with R

Time:11-30

I have an XML file with USER_DEFINED parameters that I'm trying to parse out. Here is an example of the XML document.

         <userDefinedParameters>
           <USER_DEFINED parameter="P1">LEFT</USER_DEFINED>
           <USER_DEFINED parameter="P2">RIGHT</USER_DEFINED>
           <USER_DEFINED parameter="P3">1234</USER_DEFINED>
           <USER_DEFINED parameter="P4">5678</USER_DEFINED>
         </userDefinedParameters>
       </data>
     </segment>
   </body>
</head>

I am able to parse out all data from this file using the XML package and xpathApply. However, I can't pull out the USER_DEFINED parameter values this way.

Since the XML contains several records, I'd like to get all the P1s, all the P2s, and so on, the same way I get the other fields with xpathApply. The documentation states that every USER_DEFINED entry consists of a 'parameter' and a 'value', so I think I need to pull them as c('parameter', 'value'), but I don't know how to do this with the XML package.

I have looked at this SO page; it helped a lot but doesn't answer this question.

Thanks for any/all help.

UPDATED for desired output and how I'm trying to get the data. Note, the below code doesn't work as desired.

My current xpathApply call returns every USER_DEFINED row in the userDefinedParameters section. If I change it to xpathApply(data, "//USER_DEFINED", xmlValue), I get all the values but with no link to the parameter name. I need something like xpathApply(data, "//USER_DEFINED/P1", xmlValue) but, obviously, this doesn't work.

library(XML)
fileName <- "./file.xml"
data     <- xmlParse(fileName)
xml_data <- xmlToList(data)
p1 <- xpathApply(data, "//USER_DEFINED")
p2 <- xpathApply(data, "//USER_DEFINED")

# View(p1)
#     "P1"
#     LEFT
#     LEFT
#    RIGHT

# View(p2)
#     "P2"
#    RIGHT
#    RIGHT
#     LEFT
# ...

CodePudding user response:

Using the xml2 library, you can read the value of each node's parameter attribute with xml_attr() and the node's text with xml_text().

Something like this:

library(xml2)

x <- read_xml('<userDefinedParameters>
       <USER_DEFINED parameter="P1">LEFT</USER_DEFINED>
       <USER_DEFINED parameter="P2">right</USER_DEFINED>
       <USER_DEFINED parameter="P3">1234</USER_DEFINED>
       <USER_DEFINED parameter="P4">5678</USER_DEFINED>
     </userDefinedParameters>')

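# Find the nodes once, then read the text and the attribute from the same
# node set (this also avoids %>%, which library(xml2) alone does not attach).
nodes <- xml_find_all(x, "//USER_DEFINED")

dataset <- data.frame(user_defined = xml_text(nodes),
                      parameter    = xml_attr(nodes, "parameter"))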

Result in dataset:

  user_defined parameter
1         LEFT        P1
2        right        P2
3         1234        P3
4         5678        P4
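Since the question mentions several records, the same xml2 calls can also be applied per record to get one column per parameter. The following is only a sketch: the `<records>` wrapper element, the sample values, and the names `records`, `rows` and `wide` are made up for illustration, and it assumes every record carries the same parameters in the same order.

```r
library(xml2)

# Hypothetical input with two records; the <records> wrapper is an assumption.
x <- read_xml('<records>
  <userDefinedParameters>
    <USER_DEFINED parameter="P1">LEFT</USER_DEFINED>
    <USER_DEFINED parameter="P2">RIGHT</USER_DEFINED>
  </userDefinedParameters>
  <userDefinedParameters>
    <USER_DEFINED parameter="P1">RIGHT</USER_DEFINED>
    <USER_DEFINED parameter="P2">LEFT</USER_DEFINED>
  </userDefinedParameters>
</records>')

# One node per record.
records <- xml_find_all(x, "//userDefinedParameters")

# For each record, build a character vector of values named by parameter.
rows <- lapply(records, function(rec) {
  nodes <- xml_find_all(rec, "./USER_DEFINED")
  setNames(xml_text(nodes), xml_attr(nodes, "parameter"))
})

# Stack the per-record vectors into a data frame: one row per record,
# one column per parameter (assumes identical parameters in every record).
wide <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
```

With this layout, wide$P1 holds all the P1 values and wide$P2 all the P2 values, in record order.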

CodePudding user response:

If you would like to stick with the XML package, you can read the parameter attribute of each node inside sapply:

text <-' <head> <body> <segment>
 <data>
 <userDefinedParameters>
           <USER_DEFINED parameter="P1">LEFT</USER_DEFINED>
           <USER_DEFINED parameter="P2">right</USER_DEFINED>
           <USER_DEFINED parameter="P3">1234</USER_DEFINED>
           <USER_DEFINED parameter="P4">5678</USER_DEFINED>
         </userDefinedParameters>
       </data>
     </segment>
   </body>
</head>'

library(XML)
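doc <- xmlRoot(xmlParse(text, asText = TRUE))
# Select every USER_DEFINED node under userDefinedParameters.
nodes <- xpathApply(doc, ".//userDefinedParameters/USER_DEFINED")
# Read the "parameter" attribute and the text content of each node.
attributes <- sapply(nodes, xmlGetAttr, "parameter")
values <- sapply(nodes, xmlValue)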

data.frame(attributes, values)
#   attributes values
# 1         P1   LEFT
# 2         P2  right
# 3         P3   1234
# 4         P4   5678
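
To pull only one parameter across all records, which is what the question's "//USER_DEFINED/P1" attempt was after, an XPath attribute predicate may be enough. This is a minimal sketch with the XML package; the `<records>` wrapper, the sample values, and the variable names are assumptions for illustration.

```r
library(XML)

# Hypothetical input with two records; the <records> wrapper is an assumption.
text <- '<records>
  <userDefinedParameters>
    <USER_DEFINED parameter="P1">LEFT</USER_DEFINED>
    <USER_DEFINED parameter="P2">RIGHT</USER_DEFINED>
  </userDefinedParameters>
  <userDefinedParameters>
    <USER_DEFINED parameter="P1">RIGHT</USER_DEFINED>
    <USER_DEFINED parameter="P2">LEFT</USER_DEFINED>
  </userDefinedParameters>
</records>'

doc <- xmlParse(text, asText = TRUE)

# [@parameter='P1'] keeps only the nodes whose parameter attribute equals "P1".
p1 <- xpathSApply(doc, "//USER_DEFINED[@parameter='P1']", xmlValue)
p2 <- xpathSApply(doc, "//USER_DEFINED[@parameter='P2']", xmlValue)
```

Here p1 is the character vector of all P1 values in document order, and p2 the same for P2.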