I'm looking for a regex that decomposes a string containing the arguments of a function written in another language into a list of the form argName = value.
An instance of my string of arguments is:
args <- "arg1, arg2 = {{space}}, arg3 = TRUE, arg4 = {{plot, datG1, arg1 = TRUE}}, arg5 = ga, arg6 = {{bla bla {{plot, datG1, arg1 = TRUE}}}}"
where arg1 is an argument without a value (by convention, value = NA here), arg2 takes the value "{{space}}", arg3 takes "TRUE", etc.
Each value should be returned as a string (or NA). The special form {{foo}} is the convention for either a function (as in {{space}}) or text that may contain functions (as in {{bla bla {{plot, datG1, arg1 = TRUE}}}}). I already have code that identifies functions and pure text. The only thing I need is to list the arguments of each function.
So here, the regex should let me decompose the string args into the list
list(
arg1 = NA,
arg2 = "{{space}}",
arg3 = "TRUE",
arg4 = "{{plot, datG1, arg1 = TRUE}}",
arg5 = "ga",
arg6 = "{{bla bla {{plot, datG1, arg1 = TRUE}}}}"
)
The regex I use to identify functions is "\\{\\{((?>[^\\{\\{\\}\\}]+|(?R))*)\\}\\}"
CodePudding user response:
You can use
args <- "arg1, arg2 = {{space}}, arg3 = TRUE, arg4 = {{plot, datG1, arg1 = TRUE}}, arg5 = ga, arg6 = {{bla bla {{plot, datG1, arg1 = TRUE}}}}"
rx <- "(\\w+)(?:\\s*=\\s*((\\{\\{((?>(?!\\{\\{|}})(?s).|(?3))*)}})|\\w+))?"
matches <- regmatches(args, gregexec(rx, args, perl=TRUE))
keys <- matches[[1]][2,]
values <- matches[[1]][3,]
values[values==""] <- NA
names(values) <- keys
See the regex demo. Now, values will contain your data. You may also put the data into a data frame with df <- data.frame(params=matches[[1]][2,], values=matches[[1]][3,]).
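If you need exactly the list shape shown in the question, the steps above can be wrapped in a small helper. This is only a sketch: parse_args is a hypothetical name (not part of any package), it assumes R >= 4.1 (where gregexec is available), and it returns NA_character_ rather than a logical NA for arguments without a value.

```r
# Hypothetical helper: turns an argument string into a named list,
# with NA for arguments that have no "= value" part.
parse_args <- function(args) {
  # Group 1: argument name; Group 2: optional value (a {{...}} block or a word);
  # Group 3: the {{...}} block itself, recursed into via (?3) for nested blocks.
  rx <- "(\\w+)(?:\\s*=\\s*((\\{\\{((?>(?!\\{\\{|\\}\\})(?s).|(?3))*)\\}\\})|\\w+))?"
  m <- regmatches(args, gregexec(rx, args, perl = TRUE))[[1]]
  keys <- m[2, ]    # row 2 holds Group 1 (names)
  values <- m[3, ]  # row 3 holds Group 2 (values, "" when absent)
  values[values == ""] <- NA_character_
  as.list(setNames(values, keys))
}

args <- "arg1, arg2 = {{space}}, arg3 = TRUE, arg4 = {{plot, datG1, arg1 = TRUE}}, arg5 = ga, arg6 = {{bla bla {{plot, datG1, arg1 = TRUE}}}}"
res <- parse_args(args)
str(res)
```

Note that a plain value is only matched as a run of word characters (\w+), so a value such as 1.5 or "a b" would need a wider alternative than the one used here.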
Details:
(\w+) - Group 1: one or more word chars
(?:\s*=\s*((\{\{((?>(?!\{\{|}})(?s).|(?3))*)}})|\w+))? - an optional sequence of
  \s*=\s* - a = char enclosed with zero or more whitespaces
  ((\{\{((?>(?!\{\{|}})(?s).|(?3))*)}})|\w+) - Group 2:
    (\{\{((?>(?!\{\{|}})(?s).|(?3))*)}}) - Group 3 (used for recursion): a {{, then zero or more repetitions of either any char that does not start a {{ or }} char sequence, or the Group 3 pattern, and then a }} substring
    | - or
    \w+ - one or more word chars.
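To see the recursive part in isolation, here is a minimal sketch that uses only the balanced {{...}} sub-pattern. The names brace_rx, x and found are made up for this illustration, and the sub-pattern is the only capturing group here, so the recursion is written (?1) instead of (?3).

```r
# Balanced {{...}} matcher: after "{{", repeat either one char that does not
# start "{{" or "}}", or a nested {{...}} block (recursion into Group 1),
# then require the closing "}}".
brace_rx <- "(?s)(\\{\\{(?>(?!\\{\\{|\\}\\}).|(?1))*\\}\\})"

x <- "a {{b {{c}} d}} e {{f}}"
found <- regmatches(x, gregexpr(brace_rx, x, perl = TRUE))[[1]]
print(found)
```

On this input it should return the two top-level blocks, "{{b {{c}} d}}" and "{{f}}". The nested {{c}} is consumed as part of the first block and is not reported separately.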