I am looking for an efficient way to extract all the time series behind an XML query. My code is:
library(xml2)
# URL of the data provider
url.iscb <- "http://www.sedlabanki.is/xmltimeseries/"
# The data frame to store all the time series
iscb.rates <- data.frame()
# Dates defining the time range
d.all <- as.Date("1990-01-01")
d.now <- Sys.Date()
# XML
u <- paste0(url.iscb,"Default.aspx?DagsFra=",d.all,"T00:00:00&DagsTil=",
d.now,"T23:59:59&GroupID=1&Type=xml")
# Obtaining the data from the web site...
f <- xml2::read_xml(u)
doc <- xml2::as_list(f)
So far, I cannot extract all the time series that are in f. The variable doc seems to store just one time series.
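For reference, a quick way to check how many series the document actually contains is to count the TimeSeries nodes directly (a small diagnostic sketch; the element name TimeSeries comes from the feed's XML):
# Count the TimeSeries elements in the parsed document
length(xml2::xml_find_all(f, ".//TimeSeries"))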
CodePudding user response:
Try this:
library(xml2)
library(magrittr)
# URL of the data provider
url.iscb <- "http://www.sedlabanki.is/xmltimeseries/"
# Dates defining the time range
d.all <- as.Date("1990-01-01")
d.now <- Sys.Date()
# XML
u <- paste0(url.iscb,"Default.aspx?DagsFra=",d.all,"T00:00:00&DagsTil=",
d.now,"T23:59:59&GroupID=1&Type=xml")
# Obtaining the data from the web site...
f <- xml2::read_xml(u)
# Find the TimeSeries nodes
timeseries <- xml_find_all(f, ".//TimeSeries")
timeseriesID <- timeseries %>% xml_attr("ID")
# Optional: the names of the series
# timeseries %>% xml_find_all(".//Name") %>% xml_text()
# Now step through each time series and extract its data
dfs <- lapply(seq_along(timeseries), function(index) {
  currentNode <- timeseries[index]
  # Find all of the Entry nodes for this series
  entries <- xml_find_all(currentNode, ".//Entry")
  # Extract the Date and Value from each Entry
  dates <- xml_find_first(entries, ".//Date") %>% xml_text()
  values <- xml_find_first(entries, ".//Value") %>% xml_double()
  # One data frame per series, tagged with its ID
  data.frame(ID = timeseriesID[index], Date = dates, Value = values)
})
# dfs is a list of data frames; combine them into one
iscb.rates <- dplyr::bind_rows(dfs)
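If you also want the dates as proper Date values rather than character strings, a small post-processing step works; this is just a sketch and assumes the feed returns ISO-style timestamps (i.e. a YYYY-MM-DD prefix):
# Convert the ISO-style timestamp strings to Date (assumes a YYYY-MM-DD prefix)
iscb.rates$Date <- as.Date(substr(iscb.rates$Date, 1, 10))
# Values are already numeric via xml_double(); inspect the result
str(iscb.rates)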