Consider the following string:
x = "a (b, c), d [e, f (g, h)], i (j, k (l (m, n [o, p])))"
My goal is to take this string and turn it into the following list:
list("a" = list("b", "c"),
"d" = list("e", "f" = list("g", "h")),
"i" = list("j", "k" = list("l" = list("m", "n" = list("o", "p")))))
$a
$a[[1]]
[1] "b"
$a[[2]]
[1] "c"
$d
$d[[1]]
[1] "e"
$d$f
$d$f[[1]]
[1] "g"
$d$f[[2]]
[1] "h"
$i
$i[[1]]
[1] "j"
$i$k
$i$k$l
$i$k$l[[1]]
[1] "m"
$i$k$l$n
$i$k$l$n[[1]]
[1] "o"
$i$k$l$n[[2]]
[1] "p"
The first step is to separate the top-level components:
x = "a (b, c), d [e, f (g, h)], i (j, k (l (m, n [o, p])))"
str_split_quotes = function(s) {
  # Split on commas that directly follow a closing ")" or "]", then drop spaces
  o = el(strsplit(s, split = "(?<=\\)|\\]),", perl = TRUE))
  lapply(o, function(z) gsub(pattern = " ", "", z))
}
str_unparse_level = function(s) {
  check_parsed = function(s) {
    grepl("\\)|\\]", s)
  }
  parse = function(s) {
    if (check_parsed(s)) {
      substring_name = substr(s, 1, 1)
      substring_content = substr(s, 3, nchar(s) - 1)
      # Split only on commas whose next parenthesis is not a closing ")"
      substring_content_split = el(strsplit(substring_content, ",(?![^()]*+\\))", perl = TRUE))
      o = list(substring_content_split)
      names(o) = substring_name
      return(o)
    } else {
      return(s)
    }
  }
  lapply(s, parse)
}
str_unparse_level(str_split_quotes(x))
[[1]]
[[1]]$a
[1] "b" "c"
[[2]]
[[2]]$d
[1] "e" "f(g,h)"
[[3]]
[[3]]$i
[1] "j" "k(l(m,n[o,p]))"
Intuitively, this calls for a recursive function, since the parentheses and brackets can nest to a variable depth, so that the list I describe above can be built. It is not clear to me how to go about this, given that I rarely use recursion.
Answer:
Here's a parser that uses a bit of recursion to parse the string:
parse_list <- function(x) {
  opens <- c("(", "[")
  closes <- c(")", "]")
  # One token per non-space character
  tokens <- Filter(\(x) x != " ", strsplit(x, "")[[1]])
  # Only attach names if at least one element is actually named
  maybenames <- function(values, names) if (any(names != "")) setNames(values, names) else values
  inner <- function(start = 1) {
    i <- start
    names <- character()
    values <- list()
    while (i <= length(tokens)) {
      if (i < length(tokens) - 1 && tokens[i + 1] %in% opens) {
        # A label followed by an opener: recurse into the nested group
        name <- tokens[i]
        i <- i + 1
        opener <- tokens[i]
        i <- i + 1
        result <- inner(i)
        names <- c(names, name)
        values <- c(values, list(result$value))
        i <- result$resume
        closer <- tokens[i]
      } else if (tokens[i] == ",") {
        # next item
      } else if (tokens[i] %in% closes) {
        # End of this group: hand back the values and where to resume
        return(list(value = maybenames(values, names), resume = i))
      } else {
        names <- c(names, "")
        values <- c(values, list(tokens[i]))
      }
      i <- i + 1
    }
    return(maybenames(values, names))
  }
  inner()
}
You can call it like this:
x = "a (b, c), d [e, f (g, h)], i (j, k (l (m, n [o, p])))"
parse_list(x)
This makes the strong assumption that all of your labels are single characters. If that's not the case, you'll need to adjust the tokens variable to do whatever splitting is required. There's also no real validation or error checking in place, so you may want to add that as well.
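If the labels can be longer than one character, one way to build the tokens vector is with a regular expression instead of a character-by-character split. This is only a minimal sketch, assuming labels never contain spaces, commas, parentheses, or brackets. The names tokenize and check_balanced are hypothetical helpers, not part of the parser above.

```r
# Hypothetical helper: split into delimiters and multi-character labels.
# Assumes labels contain no whitespace, commas, parentheses, or brackets.
tokenize <- function(x) {
  pattern <- "[\\[\\](),]|[^\\[\\](),\\s]+"
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

# Hypothetical helper: TRUE if every opener is closed by the matching closer.
check_balanced <- function(tokens) {
  pairs <- c("(" = ")", "[" = "]")
  stack <- character()
  for (tok in tokens) {
    if (tok %in% names(pairs)) {
      # Remember which closer we expect for this opener
      stack <- c(stack, pairs[[tok]])
    } else if (tok %in% pairs) {
      if (length(stack) == 0 || stack[length(stack)] != tok) return(FALSE)
      stack <- stack[-length(stack)]
    }
  }
  length(stack) == 0
}

tokens <- tokenize("alpha (beta, c), delta [e1, f (g, h)]")
print(tokens)
print(check_balanced(tokens))
```

The result of tokenize(x) could replace the strsplit() call that builds tokens inside parse_list, and check_balanced() could be called first to fail early on malformed input.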
Answer:
Another option is to use eval(parse()) after turning each ( and [ into =list( and quoting the labels:
out <- eval(parse(text = gsub('(\\w+)', '"\\1"',
    paste0("list(", gsub("\\(", "=list(", chartr("[]", "()", x)),
    ")"))))
Checking against the OP's expected output:
> op_out <- list("a" = list("b", "c"),
"d" = list("e", "f" = list("g", "h")),
"i" = list("j", "k" = list("l" = list("m", "n" = list("o", "p")))))
> all.equal(out, op_out)
[1] TRUE
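Since eval(parse()) runs whatever code the string turns into, it may be worth checking the input before evaluating it. Here is a sketch, assuming labels consist only of word characters. The wrapper name parse_via_eval is hypothetical, and the steps inside are the same substitutions as in the one-liner from this answer, spelled out one at a time.

```r
# Hypothetical wrapper around the eval(parse()) approach with an input check.
parse_via_eval <- function(x) {
  # Assumption: only word characters, whitespace, commas, and ()[] are allowed
  if (!grepl("^[\\w\\s,()\\[\\]]*$", x, perl = TRUE)) {
    stop("unexpected characters in input")
  }
  s <- chartr("[]", "()", x)         # treat brackets like parentheses
  s <- gsub("\\(", "=list(", s)      # "a (" becomes "a =list("
  s <- paste0("list(", s, ")")       # wrap the top level in list()
  # Quote every word; this also quotes "list", which R still accepts
  # as a function name in call position
  s <- gsub("(\\w+)", "\"\\1\"", s)
  eval(parse(text = s))
}

x <- "a (b, c), d [e, f (g, h)], i (j, k (l (m, n [o, p])))"
str(parse_via_eval(x))
```

The character whitelist is deliberately strict, so labels with punctuation would be rejected rather than evaluated.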