Are there alternative (other than XML library) ways to parse XML file in R?

Time:06-07

For various reasons I cannot use the XML package, but my current code depends on it. Is it possible to rewrite the lines below the # XML package comment so that they give the same results without that package? Unfortunately, I cannot add a reproducible example using dput() on the XML, because the output shows nothing that can be copied and pasted here.

I found a link that shows how the xml2 package can be used as an alternative, but it does not cover all of the functions I need.


library(xml2)

# read url
url <- "https://transparency.entsoe.eu/api?securityToken=xxxx&documentType=A82&BusinessType=A96&controlArea_Domain=10YFI-1--------U&periodStart=202206020000&periodEnd=202206040000"
myXMLfile <- read_xml(url)

# find subset for timeseries
myXMLts_up <- xml_child(myXMLfile, search = 13, ns = xml_ns(myXMLfile)) 

# find subset for position/quantity data
myXMLpts_up <- xml_child(myXMLts_up, search = 7, ns = xml_ns(myXMLts_up))


# XML package
library(XML)

# my_data is assumed to hold the XML response as text (hence asText = TRUE)
myXML <- xmlTreeParse(my_data, asText = TRUE, useInternalNodes = TRUE)

myXML <- xmlRoot(myXML)

# convert to dataframe
myXMLdf <- xmlToDataFrame(myXML)

Answer:

Here is a solution with the rvest package. The package title is:

"Wrappers around the 'xml2' and 'httr' packages to make it easy to download, then manipulate, HTML and XML."

The code reads the document, selects the position and quantity elements, and combines their values into a data frame.

suppressPackageStartupMessages({
  library(rvest)
  library(dplyr)
})

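# Same request as in the question, but without the securityToken parameter;
# add your own token to the query string.
url <- "https://transparency.entsoe.eu/api?documentType=A82&BusinessType=A96&controlArea_Domain=10YFI-1--------U&periodStart=202206020000&periodEnd=202206040000"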

myXMLfile <- read_html(url)

position <- myXMLfile %>%
  html_elements("position") %>%
  html_text() %>%
  as.integer

quantity <- myXMLfile %>%
  html_elements("quantity") %>%
  html_text() %>%
  as.integer

myXMLdf <- data.frame(position, quantity)
head(myXMLdf)
#>   position quantity
#> 1        1        0
#> 2        2        0
#> 3        3        0
#> 4        4       80
#> 5        5       80
#> 6        6       80

Created on 2022-06-06 by the reprex package (v2.0.1)
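
If rvest is not an option either, roughly the same result can probably be obtained with xml2 alone. The following is only a minimal sketch: it does not call the API, and the inline document (the Doc root, its urn:example:doc namespace, and the Point/position/quantity nesting) is a hypothetical stand-in for the real response, so the element names may need adjusting.

```r
library(xml2)

# Hypothetical stand-in for the API response: a small namespaced document.
# The element names and nesting are assumptions, not the real schema.
xml_string <- '<Doc xmlns="urn:example:doc">
  <TimeSeries>
    <Period>
      <Point><position>1</position><quantity>0</quantity></Point>
      <Point><position>2</position><quantity>80</quantity></Point>
    </Period>
  </TimeSeries>
</Doc>'

doc <- read_xml(xml_string)

# Remove the default namespace (modifies doc in place) so that
# plain element names can be used in the XPath expressions below.
xml_ns_strip(doc)

# One node per <Point>; xml_find_first() then returns exactly one
# child (or a missing node) per point, keeping the columns aligned.
points <- xml_find_all(doc, ".//Point")

myXMLdf <- data.frame(
  position = as.integer(xml_text(xml_find_first(points, "./position"))),
  quantity = as.integer(xml_text(xml_find_first(points, "./quantity")))
)

print(myXMLdf)
```

Looking up the children per Point, instead of collecting all position and all quantity nodes separately, means a point with a missing child gives NA rather than shifting the rows.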

Tags: r, xml