R regex that decomposes a list of arguments (from another language) in a string

Time:04-25

I'm looking for a regex that decomposes a string containing the arguments of a function written in another language into a list of the form argName = value.

An example of my argument string is:

args <- "arg1, arg2 = {{space}}, arg3 = TRUE, arg4 = {{plot, datG1, arg1 = TRUE}}, arg5 = ga, arg6 = {{bla bla {{plot, datG1, arg1 = TRUE}}}}"

where arg1 is an argument without a value (by convention, here, value = NA), arg2 takes the value "{{space}}", arg3 takes "TRUE", etc.

Each value should be returned as a string (or NA). The special form {{foo}} is the convention for either a function (as in {{space}}) or text possibly containing functions (as in {{bla bla {{plot, datG1, arg1 = TRUE}}}}). I already have code that identifies functions and plain text. The only thing I need is to list the arguments of each function.

So here, the regex should allow me to decompose the string args into the list

list(
  arg1 = NA,
  arg2 = "{{space}}", 
  arg3 = "TRUE", 
  arg4 = "{{plot, datG1, arg1 = TRUE}}",
  arg5 = "ga",
  arg6 = "{{bla bla {{plot, datG1, arg1 = TRUE}}}}"
)

The regex I use to identify functions is "\\{\\{((?>[^\\{\\{\\}\\}]+|(?R))*)\\}\\}"
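For reference, here is a minimal sketch of applying that recursive pattern in base R to pull out the top-level {{...}} blocks of the example string. It assumes a build of R with PCRE support (perl = TRUE), and the variable names rx_fun and blocks are only illustrative.

```r
# The argument string from the question
args <- "arg1, arg2 = {{space}}, arg3 = TRUE, arg4 = {{plot, datG1, arg1 = TRUE}}, arg5 = ga, arg6 = {{bla bla {{plot, datG1, arg1 = TRUE}}}}"

# The question's pattern. (?R) recurses into the whole pattern, so a nested
# {{...}} block is consumed as part of its enclosing block.
# Note: the class [^\\{\\{\\}\\}] is the same as [^{}] (repeats add nothing).
rx_fun <- "\\{\\{((?>[^\\{\\{\\}\\}]+|(?R))*)\\}\\}"

# perl = TRUE is required for the atomic group (?>...) and the recursion (?R)
blocks <- regmatches(args, gregexpr(rx_fun, args, perl = TRUE))[[1]]
print(blocks)
```

Only the outermost blocks are returned: the nested {{plot, ...}} inside arg6 stays part of its parent match rather than appearing as a separate element.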

Answer:

You can use

args <- "arg1, arg2 = {{space}}, arg3 = TRUE, arg4 = {{plot, datG1, arg1 = TRUE}}, arg5 = ga, arg6 = {{bla bla {{plot, datG1, arg1 = TRUE}}}}"
rx <- "(\\w+)(?:\\s*=\\s*((\\{\\{((?>(?!\\{\\{|}})(?s).|(?3))*)}})|\\w+))?"
# gregexec() needs R >= 4.1.0; perl = TRUE enables atomic groups and recursion
matches <- regmatches(args, gregexec(rx, args, perl=TRUE))
keys <- matches[[1]][2,]     # Group 1: argument names
values <- matches[[1]][3,]   # Group 2: values ("" when there is no "= value" part)
values[values==""] <- NA
names(values) <- keys

Now, values is a named character vector holding your data, and as.list(values) turns it into the list shown in the question. You may also put the data into a data frame with df <- data.frame(params=matches[[1]][2,], values=matches[[1]][3,]).
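The same steps can be wrapped into a small function that returns the list form directly. This is a minimal sketch, not part of the original answer: parse_args() is a hypothetical name, and it assumes R >= 4.1.0 for gregexec().

```r
# parse_args() is a hypothetical helper; gregexec() requires R >= 4.1.0
parse_args <- function(s) {
  # Group 1 = name, Group 2 = optional value, Group 3 = recursive {{...}} block
  rx <- "(\\w+)(?:\\s*=\\s*((\\{\\{((?>(?!\\{\\{|}})(?s).|(?3))*)}})|\\w+))?"
  m <- regmatches(s, gregexec(rx, s, perl = TRUE))[[1]]
  # No match at all: regmatches() gives no matrix, so return an empty list
  if (!is.matrix(m)) return(list())
  values <- m[3, ]
  # An argument written without "= value" has an empty Group 2; map it to NA
  values[values == ""] <- NA
  names(values) <- m[2, ]
  as.list(values)
}

args <- "arg1, arg2 = {{space}}, arg3 = TRUE, arg4 = {{plot, datG1, arg1 = TRUE}}, arg5 = ga, arg6 = {{bla bla {{plot, datG1, arg1 = TRUE}}}}"
res <- parse_args(args)
str(res)
```

Every value comes back as a character string, and arg1 is NA_character_ rather than the logical NA written in the question's target list.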

Details:

  • (\w+) - Group 1: one or more word chars (the argument name)
  • (?:\s*=\s*((\{\{((?>(?!\{\{|}})(?s).|(?3))*)}})|\w+))? - an optional sequence of
    • \s*=\s* - an = char enclosed in zero or more whitespace chars
    • ((\{\{((?>(?!\{\{|}})(?s).|(?3))*)}})|\w+) - Group 2 (the value): either
      • (\{\{((?>(?!\{\{|}})(?s).|(?3))*)}}) - Group 3 (used for recursion): {{, then zero or more repetitions of either any char that does not start a {{ or }} sequence, or the whole Group 3 pattern (a nested block), and then }}
      • | - or
      • \w+ - one or more word chars.
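To see the recursive part in isolation, the Group 3 pattern can be anchored and tested on its own. In this sketch (the names rx_block and candidates are illustrative), the pattern is renumbered: it is now the first group, so the recursion becomes (?1).

```r
# Group 3 of the answer, anchored; as the first group it recurses with (?1)
rx_block <- "^(\\{\\{((?>(?!\\{\\{|}})(?s).|(?1))*)}})$"

candidates <- c("{{space}}",                                # simple block
                "{{bla bla {{plot, datG1, arg1 = TRUE}}}}", # nested block
                "{{unbalanced}",                            # missing a closing brace
                "{{a}} {{b}}")                              # two blocks, not one
ok <- grepl(rx_block, candidates, perl = TRUE)
print(ok)
```

The negative lookahead stops the "any char" branch at every {{ or }}, so a nested block can only be consumed through the recursion, and the first unmatched }} ends the block.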