I am looking for an efficient way to extract all the time series behind an XML query. My code is:
library(xml2)
# URL of the data provider
url.iscb <- "http://www.sedlabanki.is/xmltimeseries/"
# The data frame to store all the time series
iscb.rates <- data.frame()
# Dates defining the time range
d.all <- as.Date("1990-01-01")
d.now <- Sys.Date()
# XML
u <- paste0(url.iscb,"Default.aspx?DagsFra=",d.all,"T00:00:00&DagsTil=",
d.now,"T23:59:59&GroupID=1&Type=xml")
# Obtaining the data from the web site...
f <- xml2::read_xml(u)
doc <- xml2::as_list(f)
So far, I cannot extract all the time series that are in f. The variable doc seems to store just one time series.
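For reference, a quick way to check how many series the document actually contains is to count the TimeSeries nodes directly (a small diagnostic sketch; the element name TimeSeries comes from the feed's XML):
# Count the TimeSeries elements in the parsed document
length(xml2::xml_find_all(f, ".//TimeSeries"))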
CodePudding user response:
Try this:
library(xml2)
library(magrittr)
# URL of the data provider
url.iscb <- "http://www.sedlabanki.is/xmltimeseries/"
# Dates defining the time range
d.all <- as.Date("1990-01-01")
d.now <- Sys.Date()
# XML
u <- paste0(url.iscb,"Default.aspx?DagsFra=",d.all,"T00:00:00&DagsTil=",
d.now,"T23:59:59&GroupID=1&Type=xml")
# Obtaining the data from the web site...
f <- xml2::read_xml(u)
# Find the TimeSeries nodes
timeseries <- xml_find_all(f, ".//TimeSeries")
timeseriesID <- timeseries %>% xml_attr("ID")
# Optional: the names of the series
# timeseries %>% xml_find_all(".//Name") %>% xml_text()
# Now step through each time series and extract its data
dfs <- lapply(seq_along(timeseries), function(index) {
  currentNode <- timeseries[index]
  # Find all of the Entry nodes for this series
  entries <- xml_find_all(currentNode, ".//Entry")
  # Extract the Date and Value from each Entry
  dates <- xml_find_first(entries, ".//Date") %>% xml_text()
  values <- xml_find_first(entries, ".//Value") %>% xml_double()
  # One data frame per series, tagged with its ID
  data.frame(ID = timeseriesID[index], Date = dates, Value = values)
})
# dfs is a list of data frames; combine them into one
iscb.rates <- dplyr::bind_rows(dfs)
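If you also want the dates as proper Date values rather than character strings, a small post-processing step works; this is just a sketch and assumes the feed returns ISO-style timestamps (i.e. a YYYY-MM-DD prefix):
# Convert the ISO-style timestamp strings to Date (assumes a YYYY-MM-DD prefix)
iscb.rates$Date <- as.Date(substr(iscb.rates$Date, 1, 10))
# Values are already numeric via xml_double(); inspect the result
str(iscb.rates)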