I have the following XML which I am trying to convert into a data frame in R or a table within SQL:
<Root>
<Product>
<Test_ID value="1" />
<Effective_Date value="2022-01-01" />
<Membership value="Yes" />
<Request>
<Request_ID value="1" />
<Request_type value="Simple" />
</Request>
<Request>
<Request_ID value="2" />
<Request_type value="Complex" />
</Request>
</Product>
</Root>
I've tried using XMLtoDataFrame ("xml" library) but this just creates a table without any values. Would anyone be able to help with this please? Open to a SQL solution.
Edit: I'm looking for my table to be in the following format table
CodePudding user response:
Here's an effort.
Assumptions:
Root
is always there, and always just oneProduct
can contain zero or more, so we'll iterate smartly and combine- I do not assume that
$Product$Requests
are always perfectly aligned; it can be simplified (hard-coded fewer lines of code) if this is a guaranteed format.
Some helper functions:
func1 <- function(z) if (is.null(names(z))) attr(z, "value") else lapply(z, func1)
merge.list <- function(A, B) {
# normalize lengths, just in case, since I think you have more than one $Product
A <- lapply(A, `length<-`, max(lengths(A)))
B <- lapply(B, `length<-`, max(lengths(B)))
BnotA <- setdiff(names(B), names(A))
AnotB <- setdiff(names(A), names(B))
inboth <- intersect(names(A), names(B))
A[BnotA] <- replicate(length(BnotA), rep(NA, max(lengths(A))), simplify = FALSE)
A[AnotB] <- lapply(A[AnotB], function(z) c(z, rep(NA, max(lengths(B)))))
A[inboth] <- Map(c, A[inboth], B[inboth])
A
}
Code:
# library(xml2) # read_xml, as_list
L <- xml2::as_list(xml2::read_xml('
<Root>
<Product>
<Test_ID value="1" />
<Effective_Date value="2022-01-01" />
<Membership value="Yes" />
<Request>
<Request_ID value="1" />
<Request_type value="Simple" />
</Request>
<Request>
<Request_ID value="2" />
<Request_type value="Complex" />
</Request>
</Product>
</Root>'))
out <- do.call(rbind.data.frame, lapply(func1(L$Root), function(pr) {
as.data.frame(lapply(split(pr, names(pr)), function(Y) Reduce(merge.list, Y)))
}))
out
# Effective_Date Membership Request.Request_ID Request.Request_type Test_ID
# Product.1 2022-01-01 Yes 1 Simple 1
# Product.2 2022-01-01 Yes 2 Complex 1
(It's a "clean" data.frame
:)
str(out)
# 'data.frame': 2 obs. of 5 variables:
# $ Effective_Date : chr "2022-01-01" "2022-01-01"
# $ Membership : chr "Yes" "Yes"
# $ Request.Request_ID : chr "1" "2"
# $ Request.Request_type: chr "Simple" "Complex"
# $ Test_ID : chr "1" "1"