Turning a string with brackets and parentheses into a nested list

Time:12-10

Consider the following string:

x = "a (b, c), d [e, f (g, h)], i (j, k (l (m, n [o, p])))"

My goal is to take this string and turn it into the following list:

list("a" = list("b", "c"),
     "d" = list("e", "f" = list("g", "h")),
     "i" = list("j", "k" = list("l" = list("m", "n" = list("o", "p")))))

$a
$a[[1]]
[1] "b"

$a[[2]]
[1] "c"


$d
$d[[1]]
[1] "e"

$d$f
$d$f[[1]]
[1] "g"

$d$f[[2]]
[1] "h"



$i
$i[[1]]
[1] "j"

$i$k
$i$k$l
$i$k$l[[1]]
[1] "m"

$i$k$l$n
$i$k$l$n[[1]]
[1] "o"

$i$k$l$n[[2]]
[1] "p"

The first step is to separate the top-level components and unpack one level of nesting:

x = "a (b, c), d [e, f (g, h)], i (j, k (l (m, n [o, p])))"

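# Split the string at each comma that follows a closing bracket, then strip spaces.
str_split_quotes = function(s) {
  o = el(strsplit(s, split = "(?<=\\)|\\]),", perl = TRUE))
  lapply(o, function(z) gsub(pattern = " ", "", z))
}

# Unpack one level of nesting: "a(b,c)" becomes list(a = c("b", "c")).
str_unparse_level = function(s) {

  # TRUE if the piece contains a closing bracket, i.e. it has bracketed content.
  check_parsed = function(s) {
    grepl("\\)|\\]", s)
  }

  parse = function(s)  {
    if (check_parsed(s)) {
      substring_name    = substr(s, 1, 1)             # single-character label
      substring_content = substr(s, 3, nchar(s) - 1)  # text between the outer brackets
      # Split on commas unless a ")" follows before any other parenthesis.
      substring_content_split = el(strsplit(substring_content, ",(?![^()]*+\\))", perl = TRUE))
      o = list(substring_content_split)
      names(o) = substring_name
      return(o)
    } else {
      return(s)
    }
  }

  lapply(s, parse)
}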

str_unparse_level(str_split_quotes(x))
[[1]]
[[1]]$a
[1] "b" "c"


[[2]]
[[2]]$d
[1] "e"      "f(g,h)"


[[3]]
[[3]]$i
[1] "j"              "k(l(m,n[o,p]))"

Because the parentheses and brackets can be nested to an arbitrary depth, I assume the remaining pieces (such as "f(g,h)") need to be handled by a recursive function to build the list shown above. I rarely use recursion, so it is not clear to me how to go about this.

CodePudding user response:

Here's a parser that uses a bit of recursion to parse the string:


parse_list <- function(x) {
  opens <- c("(", "[")
  closes  <- c(")", "]")
  # One token per character, with spaces dropped
  tokens <- Filter(\(x) x != " ", strsplit(x, "")[[1]])
  # Attach names only if at least one element has a label
  maybenames <- function(values, names) if (any(names != "")) setNames(values, names) else values
  inner <- function(start = 1) {
    i <- start
    names <- character()
    values <- list()
    while (i <= length(tokens)) {
      if (i < length(tokens) - 1 && tokens[i + 1] %in% opens) {
        # A label followed by an opening bracket: recurse into the group
        name <- tokens[i]
        i <- i + 1
        opener <- tokens[i]
        i <- i + 1
        result <- inner(i)
        names <- c(names, name)
        values <- c(values, list(result$value))
        i <- result$resume
        closer <- tokens[i]
      } else if (tokens[i] == ",") {
        # next item
      } else if (tokens[i] %in% closes) {
        # End of the current group: return it and the position to resume from
        return(list(value = maybenames(values, names), resume = i))
      } else {
        # A plain, unnamed leaf value
        names <- c(names, "")
        values <- c(values, list(tokens[i]))
      }
      i <- i + 1
    }
    return(maybenames(values, names))
  }
  inner()
}

You can call it like this:

x = "a (b, c), d [e, f (g, h)], i (j, k (l (m, n [o, p])))"
parse_list(x)

This makes the strong assumption that all your labels are single characters. If that's not the case, you'll need to change how the tokens variable is built so that it does whatever splitting is required. There's also no real validation or error checking in place (for example, opener and closer are captured but never compared), so you may want to add that as well.
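As a rough illustration of those two adjustments, here is a minimal sketch. The helper names tokenize and check_balanced are hypothetical (they are not part of the answer above), and the sketch assumes labels consist only of word characters (letters, digits, underscore). In principle you could replace the tokens assignment in parse_list with tokens <- tokenize(x), since the parser treats each token as a unit, but that has not been tested against every input.

```r
# Hypothetical helper: split the input into labels and punctuation tokens.
# Assumes labels are runs of word characters; whitespace is discarded.
tokenize <- function(x) {
  regmatches(x, gregexpr("\\w+|[\\[\\](),]", x, perl = TRUE))[[1]]
}

# Hypothetical helper: check that every opener is closed by the matching closer.
check_balanced <- function(tokens) {
  pairs <- c("(" = ")", "[" = "]")
  stack <- character()
  for (tok in tokens) {
    if (tok %in% names(pairs)) {
      # Remember which closer we expect for this opener
      stack <- c(stack, pairs[[tok]])
    } else if (tok %in% pairs) {
      # A closer must match the most recently opened bracket
      if (length(stack) == 0 || stack[length(stack)] != tok) return(FALSE)
      stack <- stack[-length(stack)]
    }
  }
  # Balanced only if nothing is left open
  length(stack) == 0
}

tokens <- tokenize("alpha (b1, c), d [e]")
print(tokens)
print(check_balanced(tokens))
print(check_balanced(tokenize("a (b, c]")))
```

Running the balance check before parsing gives a clear failure for input like "a (b, c]" instead of a silently wrong list.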

CodePudding user response:

Another option is to rewrite the string into R code, replacing each ( and [ with a call to list(, and then evaluate it with eval(parse(...)):

out <- eval(parse(text = gsub('(\\w+)', '"\\1"', 
   paste0("list(", gsub("\\(", "=list(", chartr("[]", "()", x)), 
      ")"))))

Checking against the OP's expected output:

> op_out <- list("a" = list("b", "c"),
                 "d" = list("e", "f" = list("g", "h")),
                 "i" = list("j", "k" = list("l" = list("m", "n" = list("o", "p")))))
> all.equal(out, op_out)
[1] TRUE
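
To see what the one-liner does, it may help to run the same transformations one at a time. The following is only a sketch on a shortened input; the names step1 to step4 are introduced here for illustration. Note that the last gsub also quotes the word list, which still works because R accepts a string constant in the function position of a call.

```r
x <- "a (b, c), d [e, f (g, h)]"

# 1. Turn square brackets into parentheses
step1 <- chartr("[]", "()", x)
# 2. Turn every "(" into "=list(" so each label becomes an argument name
step2 <- gsub("\\(", "=list(", step1)
# 3. Wrap everything in an outer list( ... )
step3 <- paste0("list(", step2, ")")
# 4. Quote every run of word characters (labels, values, and "list" itself)
step4 <- gsub("(\\w+)", '"\\1"', step3)

cat(step1, step2, step3, step4, sep = "\n")

# Evaluate the generated code to obtain the nested list
out <- eval(parse(text = step4))
str(out)
```

Because the string is evaluated as code, this approach is best kept to input whose contents you control.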
Tags: r