Place parentheses around characters separated by comma using regex in R

I'd like to add parentheses around grouped text separated by a comma using stringr. So if there is text that is separated by one or more commas, then I'd like parentheses around the text. There will always be a "=" before this type of string begins and there will either be a space or nothing (vector ends) after the string. Is there a generalized way to do this? Here's a sample problem:

Sample:

a <- data.frame(Rule = c("A=0 & B=Grp1,Grp2", "A=0 & B=Grp1,Grp3,Grp4 & C=1"))
a
                          Rule
1            A=0 & B=Grp1,Grp2
2 A=0 & B=Grp1,Grp3,Grp4 & C=1

Desired Output:

                            Rule
1            A=0 & B=(Grp1,Grp2)
2 A=0 & B=(Grp1,Grp3,Grp4) & C=1

CodePudding user response：

Here is another potential solution. I have altered the example input to show that it works with multiple "Grp's" per line:

library(stringr) 
a <- data.frame(Rule = c("A=0 & B=Grp1,Grp2",
                         "A=0 & B=Grp1,Grp3,Grp4 & C=1 & D=Grp5,Grp6"))

str_replace_all(a$Rule, "=([^, &]+,[^ $]+)", "=(\\1)")
#> [1] "A=0 & B=(Grp1,Grp2)"                           
#> [2] "A=0 & B=(Grp1,Grp3,Grp4) & C=1 & D=(Grp5,Grp6)"

Created on 2022-11-23 by the reprex package (v2.0.1)

Explanation:

regex = "=([^, &]+,[^ $]+)", replacement = "=(\\1)"

=( starting with an equals sign, capture a group

[^, &]+, with one or more characters that aren't ",", " ", or "&", followed by a comma

[^ $]+) followed by one or more characters that aren't " " or a literal "$" (inside a character class, "$" is not an end-of-line anchor; the match simply stops at the next space or at the end of the string)

=(\\1) then replace the match with an equals sign followed by the captured group wrapped in parentheses (e.g. Grp1,Grp2 becomes (Grp1,Grp2))
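To check the explanation above without stringr, here is a minimal base-R sketch. It assumes the pattern carries the "+" quantifiers the explanation describes, and it uses gsub() as a stand-in for str_replace_all(). The third input row is a hypothetical addition, included only to show that a rule with no comma-separated group is left alone.

```r
# Base-R sketch of the same substitution; gsub() stands in for str_replace_all().
rules <- c("A=0 & B=Grp1,Grp2",
           "A=0 & B=Grp1,Grp3,Grp4 & C=1",
           "A=0 & C=1")  # hypothetical row: no comma-separated group

# "=" followed by a token without comma/space/&, a comma,
# then everything up to the next space (or the end of the string)
pattern <- "=([^, &]+,[^ $]+)"

# \\1 re-inserts the captured group, now wrapped in parentheses
wrapped <- gsub(pattern, "=(\\1)", rules, perl = TRUE)
print(wrapped)
```

Because the pattern requires at least one comma after the "=", single values such as A=0 or C=1 never match and pass through unchanged.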

CodePudding user response：

This should work:

Find: (([A-Za-z\d]+,)+[A-Za-z\d]+)

Replace: ($1)

Explanation:

[A-Za-z\d]+ matches a run of one or more alphanumeric characters.

The inner group ([A-Za-z\d]+,)+ matches one or more such runs, each followed by a comma (e.g. Abcd1,Abcd2,).

The trailing [A-Za-z\d]+ then matches the closing alphanumeric run, which doesn't have a comma after it (e.g. Abcd3).

These are concatenated, and the outer group captures the whole match.

The last step is the replacement: ($1) re-inserts the captured match wrapped in parentheses.
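The Find/Replace pair above is written in generic regex syntax. As a rough sketch of how it might be carried over to R, assuming the intended pattern uses "+" quantifiers: backslashes are doubled inside the R string, and the backreference is written \\1 rather than $1. Base R's gsub() is used here so the snippet runs without extra packages.

```r
# Sketch: the generic Find/Replace pattern translated to base R.
rules <- c("A=0 & B=Grp1,Grp2",
           "A=0 & B=Grp1,Grp3,Grp4 & C=1")

# One or more "alphanumerics + comma" runs, then a final alphanumeric run
pattern <- "(([A-Za-z\\d]+,)+[A-Za-z\\d]+)"

# \\1 is the outer capture group, i.e. the whole comma-separated list
wrapped <- gsub(pattern, "(\\1)", rules, perl = TRUE)
print(wrapped)
```

Unlike the first answer, this pattern does not anchor on the "=", so it wraps any comma-separated alphanumeric list wherever it appears. Values containing other characters (an underscore, for example) would need the character class extended.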