R proper way to parse xml

Time:08-31

I have an XML response containing Header and Body nodes. How can I access the value of the $Envelope$Body$checkVatResponse$valid node?

For some reason I can't even find the Body node using xml_find_all:

library(httr)
library(dplyr)
library(rvest)
library(xml2)

body = r'[<?xml version="1.0" encoding="UTF-8" standalone="no"?>
             <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" >
             <soapenv:Header/>
             <soapenv:Body>
             <urn:checkVat  xmlns:urn="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
             <urn:countryCode>NL</urn:countryCode>
             <urn:vatNumber>800938495B01</urn:vatNumber>
             </urn:checkVat>
             </soapenv:Body>
             </soapenv:Envelope>]'

r <- POST("http://ec.europa.eu/taxation_customs/vies/services/checkVatTestService", body = body)
stop_for_status(r)
content(r) %>% xml_find_all('//Body')
content(r) %>% xml2::as_list()
res <- content(r) 

xml_children(res) %>% xml_name()
# [1] "Header" "Body"  
xml_find_all(res,'.//Body')
# {xml_nodeset (0)}

CodePudding user response:

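When working with XML data, you need to be mindful of the namespaces used in the file. Nodes that belong to a namespace must be prefixed with the matching namespace prefix in the XPath expression. An unprefixed name such as //Body only matches elements that are in no namespace, which is why your query returns an empty node set. To extract the valid node you can use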

content(r) %>% xml_find_all('//env:Body/ns2:checkVatResponse/ns2:valid')
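
Note that xml_find_all returns a node set, not the value itself. As a minimal sketch of getting the text out, the example below parses a hand-written stand-in for the response instead of calling the service. The sample XML and its "true" value are assumptions for illustration only; it reuses the env and ns2 prefixes from the XPath above.

```r
library(xml2)

# Hypothetical stand-in for content(r): written by hand for illustration,
# using the env/ns2 prefixes from the XPath expression in this answer.
sample_xml <- '<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Header/>
  <env:Body>
    <ns2:checkVatResponse xmlns:ns2="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
      <ns2:countryCode>NL</ns2:countryCode>
      <ns2:vatNumber>800938495B01</ns2:vatNumber>
      <ns2:valid>true</ns2:valid>
    </ns2:checkVatResponse>
  </env:Body>
</env:Envelope>'
res <- read_xml(sample_xml)

# xml_find_first() returns a single node; xml_text() extracts its text content.
valid_node <- xml_find_first(res, "//env:Body/ns2:checkVatResponse/ns2:valid")
valid_text <- xml_text(valid_node)

# Turn the text into a logical value for use in later R code.
is_valid <- identical(valid_text, "true")
```

With a real response you would replace res with content(r) and keep the rest unchanged, provided the service uses the same prefixes.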

To see all the namespaces used by the file you can run

content(r) %>% xml_ns()
# env <-> http://schemas.xmlsoap.org/soap/envelope/
# ns2 <-> urn:ec.europa.eu:taxud:vies:services:checkVat:types
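
Prefixes are chosen by the server and may differ between responses, so an XPath that hard-codes them can stop matching. One possible workaround, sketched here under assumptions, is to match on the local element name (and optionally the namespace URI) with the standard XPath functions local-name() and namespace-uri(). The sample XML below is hypothetical and deliberately uses different prefixes (soap and vat) to show that the query does not depend on them.

```r
library(xml2)

# Hypothetical response with different prefixes than the ones shown above.
sample_xml <- '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <vat:checkVatResponse xmlns:vat="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
      <vat:valid>true</vat:valid>
    </vat:checkVatResponse>
  </soap:Body>
</soap:Envelope>'
res <- read_xml(sample_xml)

# Prefixes declared in this particular document.
prefixes <- names(xml_ns(res))

# An unprefixed name only matches elements in no namespace, so nothing is found.
n_plain <- length(xml_find_all(res, "//valid"))

# Match by local name, ignoring whichever prefix the server picked.
by_name <- xml_text(xml_find_first(res, "//*[local-name()='valid']"))

# Stricter variant: also require the expected namespace URI.
vat_uri <- "urn:ec.europa.eu:taxud:vies:services:checkVat:types"
xpath_strict <- sprintf("//*[local-name()='valid' and namespace-uri()='%s']", vat_uri)
by_uri <- xml_text(xml_find_first(res, xpath_strict))
```

The prefixed form is shorter and easier to read. The local-name() form trades that for independence from the prefixes in any given response.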