Convert XML file into data.frame/table in R

Time:11-08

Any idea what the best way is to get this XML file into a data frame format in R? Any format is fine: tibble, data.table, ...

The XML file is available at this link:

https://www.ictax.admin.ch/extern/api/download/2619327/ea428026a27f772d57efbbcdc56bff62/kursliste_2022.zip

I tried the following code, but it doesn't work:

result <- xmlParse(file = "kursliste_2022.xml")

xml_result <- xmlToList(result)

xml_structure(xml_result)

CodePudding user response:

Try this:

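library(tibble)
library(XML)

# Path to the unzipped XML file
ur_xml <- "your_xml_path"
# xmlParse() expects a file path or XML text, so parse the file directly
# instead of passing it an xml2 document from read_xml()
doc <- xmlParse(ur_xml)
# Replace the XPath with the repeating node that should become one row
df <- xmlToDataFrame(nodes = getNodeSet(doc, "//your_node_of_interest"))

tb <- as_tibble(df)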
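
To make the XPath placeholder concrete, here is a minimal sketch on a tiny inline document. The element names (`items`, `item`, `isin`, `price`) and the values are hypothetical and are not taken from the kursliste file. It assumes each row node holds its values as simple child elements, which is the shape `xmlToDataFrame()` is designed for; values stored as attributes are not picked up this way.

```r
library(XML)

# Hypothetical input: every <item> becomes one row, every child one column
txt <- "<items>
  <item><isin>CH0000000001</isin><price>10.5</price></item>
  <item><isin>CH0000000002</isin><price>20.0</price></item>
</items>"

# asText = TRUE tells xmlParse() that txt is XML content, not a file name
doc <- xmlParse(txt, asText = TRUE)

# Select the repeating nodes and flatten them into a data frame
df <- xmlToDataFrame(nodes = getNodeSet(doc, "//item"),
                     stringsAsFactors = FALSE)
print(df)
```

All columns come back as character, so numeric columns such as `price` would still need `as.numeric()`.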

CodePudding user response:

A data frame and an XML document do not have a unique one-to-one mapping. A data frame is a matrix-like structure with only two dimensions, whereas an XML document is more similar to an n-dimensional array.
As it turns out, there are multiple ways of doing what you ask, so your question is opinion-based, which is normally not allowed here.

However, to give you an idea of how you would reduce the dimensions, you can repeat the parent's values in the columns of your data frame, once for each child. For example, with respect to your first three nodes:

library('xml2')
result <- read_xml("kursliste_2022.xml")
y <- lapply(1:3, \(i) xml_child(result, i))
Reduce(rbind, lapply(y, \(x) cbind(data.frame(t(xml_attrs(x))), t(sapply(xml_children(x), xml_attrs)))))
#      id canton lang                   name
# 1  2587     AG   de                 Aargau
# 2  2587     AG   en                 Aargau
# 3  2587     AG   fr                Argovie
# 4  2587     AG   it                Argovia
# 5  2588     AI   de       Appenzell I. Rh.
# 6  2588     AI   en Appenzell Inner-Rhodes
# 7  2588     AI   fr     Appenzell Rh.-Int.
# 8  2588     AI   it     Appenzello Interno
# 9  2589     AR   de       Appenzell A. Rh.
# 10 2589     AR   en Appenzell Outer-Rhodes
# 11 2589     AR   fr     Appenzell Rh.-Ext.
# 12 2589     AR   it     Appenzello Esterno
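
The same idea can be tried without downloading the file. The sketch below uses a small inline document whose element names (`list`, `canton`, `cantonName`) are guesses modeled on the output above, not the real kursliste schema. It names the attributes explicitly instead of binding whole attribute sets, which may be easier to adapt to other nodes.

```r
library(xml2)

# Hypothetical miniature document: parents carry id/canton,
# children carry lang/name
doc <- read_xml('<list>
  <canton id="1" canton="AG">
    <cantonName lang="de" name="Aargau"/>
    <cantonName lang="fr" name="Argovie"/>
  </canton>
  <canton id="2" canton="AI">
    <cantonName lang="de" name="Appenzell I. Rh."/>
  </canton>
</list>')

parents <- xml_children(doc)

# One small data frame per parent; the length-one parent attributes are
# recycled so they repeat on every child row
rows <- lapply(parents, function(p) {
  kids <- xml_children(p)
  data.frame(
    id     = xml_attr(p, "id"),
    canton = xml_attr(p, "canton"),
    lang   = xml_attr(kids, "lang"),
    name   = xml_attr(kids, "name"),
    stringsAsFactors = FALSE
  )
})

flat <- do.call(rbind, rows)
print(flat)
```

A parent without children would need separate handling, because `data.frame()` cannot recycle against a zero-length column.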

Now, please edit your question so that it is clear what you are asking.
