Any idea what's the best way to get this XML file into a data frame in R? Any data frame format is fine: tbl, data.table, ...
The XML file is available under this link.
I tried the following code, but it doesn't work:
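library(XML)   # xmlParse(), xmlToList()
library(xml2)  # xml_structure() comes from xml2, not from XML
result <- xmlParse(file = "kursliste_2022.xml")
xml_result <- xmlToList(result)
xml_structure(xml_result)  # xml_structure() expects an xml2 document/node, but xmlToList() returns a plain list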
Answer:
Try this:
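library(XML)     # xmlParse(), getNodeSet(), xmlToDataFrame()
library(tibble)  # as_tibble()
ur_xml <- "your_xml_path"  # path to your XML file
# Parse with XML directly; an xml2 document from read_xml() is not an XML-package object
doc <- xmlParse(ur_xml)
# One row per node matched by the XPath, one column per child element
df <- xmlToDataFrame(nodes = getNodeSet(doc, "//your_node_of_interest"))
tb <- as_tibble(df)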
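Here is a minimal, self-contained sketch of the same xmlToDataFrame() approach, run on a small inline XML string since the real file isn't reproduced here. The items/item structure and its values are hypothetical. As far as I know, xmlToDataFrame() builds the columns from the child elements of each matched node, so values stored only in attributes will not show up this way.

```r
library(XML)

# Hypothetical sample data: the values live in child elements, not attributes
txt <- "<items>
  <item><code>A</code><label>alpha</label></item>
  <item><code>B</code><label>beta</label></item>
</items>"
doc <- xmlParse(txt, asText = TRUE)

# Every node matched by the XPath becomes one row;
# the names of its child elements become the column names
df <- xmlToDataFrame(nodes = getNodeSet(doc, "//item"))
df
```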
Answer:
There is no unique one-to-one mapping between a data frame and an XML document. A data frame is a matrix-like structure with only two dimensions, whereas an XML document is a tree of arbitrary depth, closer to an n-dimensional array.
As a result, there are multiple ways of doing what you ask, so your question is opinion-based, which is normally not allowed here.
However, to give you an idea of how to reduce the dimensions: you can repeat a parent node's values down a column of the data frame. For example, for the first three nodes of your file:
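library(xml2)
result <- read_xml("kursliste_2022.xml")
# Take the first three children of the root node (\(i) lambdas need R >= 4.1)
y <- lapply(1:3, \(i) xml_child(result, i))
# For each node: its own attributes as a one-row data frame, recycled by cbind()
# over a matrix holding one row of attributes per child; then stack with rbind()
Reduce(rbind, lapply(y, \(x) cbind(data.frame(t(xml_attrs(x))), t(sapply(xml_children(x), xml_attrs)))))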
#      id canton lang                   name
# 1  2587     AG   de                 Aargau
# 2  2587     AG   en                 Aargau
# 3  2587     AG   fr                Argovie
# 4  2587     AG   it                Argovia
# 5  2588     AI   de       Appenzell I. Rh.
# 6  2588     AI   en Appenzell Inner-Rhodes
# 7  2588     AI   fr     Appenzell Rh.-Int.
# 8  2588     AI   it     Appenzello Interno
# 9  2589     AR   de       Appenzell A. Rh.
# 10 2589     AR   en Appenzell Outer-Rhodes
# 11 2589     AR   fr     Appenzell Rh.-Ext.
# 12 2589     AR   it     Appenzello Esterno
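A sketch of the same idea extended to every child of the root rather than only the first three. It assumes, based only on the output above, that each child of the root carries id and canton attributes and each grandchild carries lang and name attributes; the element names in the inline XML are hypothetical stand-ins for the real file.

```r
library(xml2)

# Hypothetical stand-in for kursliste_2022.xml: element names are made up,
# attribute names match the columns of the output above
txt <- '<root>
  <canton id="2587" canton="AG">
    <label lang="de" name="Aargau"/>
    <label lang="fr" name="Argovie"/>
  </canton>
  <canton id="2588" canton="AI">
    <label lang="de" name="Appenzell I. Rh."/>
  </canton>
</root>'
doc <- read_xml(txt)

# One small data frame per child of the root: the parent's attributes are
# length 1 and get recycled over the rows built from its children
rows <- lapply(xml_children(doc), function(parent) {
  kids <- xml_children(parent)
  data.frame(
    id     = xml_attr(parent, "id"),
    canton = xml_attr(parent, "canton"),
    lang   = xml_attr(kids, "lang"),
    name   = xml_attr(kids, "name")
  )
})

# Stack all pieces in a single rbind call
df <- do.call(rbind, rows)
df
```

Naming the attributes explicitly with xml_attr() yields NA for a missing attribute instead of the ragged list that sapply(..., xml_attrs) returns when children have different attributes, and do.call(rbind, ...) binds all pieces in one call instead of growing the result step by step as Reduce(rbind, ...) does.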
Now, please edit your question to make clear what exactly you are asking.