I have strings that look like this:
problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
In between each group, there's " & ". I want to use R (either sub() or something from the stringr package) to replace every " &" with a "," when there's more than one & present. However, I don't want the final & to be changed. How would I do that so it looks like:
#Note: Only the 3rd and 4th strings should be changed
solution <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1, GROUP 2 & GROUP 3", "GROUP 1, GROUP 2, GROUP 3 & GROUP 4")
In the actual strings, there could be any number of &s, so I don't want to hard-code a limit if possible.
CodePudding user response:
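We could use a regular expression with a lookahead assertion (see "Regex lookahead, lookbehind and atomic groups"). The lookahead only lets " &" match when another & appears later in the string, so the final & is left alone.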
library(stringr)
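# Replace " &" with "," only when another "&" follows later in the string.
# The space after the "&" is not part of the match, so it is kept as-is.
str_replace_all(problem, " &(?=.*?&)", ",")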
output:
[1] "GROUP 1"
[2] "GROUP 1 & GROUP 2"
[3] "GROUP 1, GROUP 2 & GROUP 3"
[4] "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"
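The same lookahead idea can also be sketched in base R, assuming gsub() with perl = TRUE is acceptable in place of stringr. The names problem and solution are taken from the question; result is just an illustrative variable name.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
solution <- c("GROUP 1", "GROUP 1 & GROUP 2",
              "GROUP 1, GROUP 2 & GROUP 3",
              "GROUP 1, GROUP 2, GROUP 3 & GROUP 4")

# perl = TRUE is needed because lookaheads are not supported by the default regex engine.
# " &" is replaced only if at least one more "&" exists to its right.
result <- gsub(" &(?=.*&)", ",", problem, perl = TRUE)

print(identical(result, solution))
```

Because the lookahead consumes nothing, each " &" is tested independently, which is why no limit on the number of groups has to be hard-coded.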
CodePudding user response:
Another solution:
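library(stringr)
# Step 1: turn every " &" into ","; step 2: turn the last ", GROUP n" back into " & GROUP n".
str_replace_all(problem, " &", ",") %>%
  str_replace(", (GROUP [0-9]+)$", " & \\1")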
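A minimal base-R sketch of the same two-step idea, for cases where the items are not always named "GROUP n". It assumes the items themselves never contain a comma; step1 and result are illustrative names, not part of the original answer.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# Step 1: replace every " &" with "," (fixed = TRUE treats the pattern as a literal string).
step1 <- gsub(" &", ",", problem, fixed = TRUE)

# Step 2: the last comma is the one followed only by non-comma characters
# up to the end of the string; turn it back into " &".
result <- sub(",([^,]*)$", " &\\1", step1)

print(result)
```

Anchoring on "no further comma before the end" instead of on the item text avoids tying the pattern to a particular label format.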
CodePudding user response:
It could be done using Perl mode and the \G anchor.
Ensure there are two or more &'s, then match any & that has another one downstream.
(?m)(?:^(?=.*&.*&)|(?!^)\G)[^&\n]*\K&(?=.*&)
Replace with a comma (,).
https://regex101.com/r/Mtvopf/1
(?m)
(?:
     ^
     (?= .* & .* & )
  |  (?! ^ )
     \G
)
[^&\n]* \K &
(?= .* & )
CodePudding user response:
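Using strsplit: split on the "&" separators, join the pieces with ", " via toString(), then turn the last ", " back into " & ".
sapply(strsplit(problem, "\\s+&\\s+"),
       function(x) sub(", ([^,]+)$", " & \\1", toString(x)))
Output: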
[1] "GROUP 1" "GROUP 1 & GROUP 2" "GROUP 1, GROUP 2 & GROUP 3" "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"
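The split-and-rejoin approach can also be written without a second regex, by joining all but the last piece with ", " and attaching the last piece with " & ". This is only a sketch; join_groups is a hypothetical helper name, and it assumes the separator is exactly " & ".

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# Hypothetical helper: join pieces as "a, b, c & d".
join_groups <- function(parts) {
  n <- length(parts)
  # Zero or one piece: nothing to join.
  if (n <= 1) return(paste(parts, collapse = ""))
  # All but the last piece are comma-separated; the last is attached with " & ".
  paste(paste(parts[-n], collapse = ", "), parts[n], sep = " & ")
}

pieces <- strsplit(problem, " & ", fixed = TRUE)
result <- vapply(pieces, join_groups, character(1), USE.NAMES = FALSE)

print(result)
```

vapply() is used instead of sapply() so the result is guaranteed to be a character vector with one element per input string.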
CodePudding user response:
You can use
 \K&(?= .* & )
(note the leading space before \K). The pattern matches:
" \K"         Match a space, then clear the match buffer (forget what has been matched so far)
"&"           Match a literal &
"(?= .* & )"  Positive lookahead: assert a space to the right, followed later by another occurrence of " & "
For example
problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
gsub(" \\K&(?= .* & )", ",", problem, perl = TRUE)
Output
[1] "GROUP 1"
[2] "GROUP 1 & GROUP 2"
[3] "GROUP 1 , GROUP 2 & GROUP 3"
[4] "GROUP 1 , GROUP 2 , GROUP 3 & GROUP 4"
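The output above keeps a space before each comma, because \K excludes the leading space from the part that gets replaced. If the goal is the exact solution vector from the question, one possible variant (an assumption about the intent, not part of the original answer) is to drop \K so that the space is replaced together with the &:

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# With \K: only the "&" is replaced, so the space before it stays ("GROUP 1 , ...").
with_k <- gsub(" \\K&(?= .* & )", ",", problem, perl = TRUE)

# Without \K: the space and the "&" are replaced together ("GROUP 1, ...").
without_k <- gsub(" &(?= .* & )", ",", problem, perl = TRUE)

print(with_k[3])
print(without_k[3])
```

Both versions use the same lookahead, so the final & is still left untouched in either case.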
CodePudding user response:
Use
problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
library(stringr)
str_replace_all(problem, "\\s*&\\s*(?=[^&]*&)", ", ")
Results:
[1] "GROUP 1" "GROUP 1 & GROUP 2"
[3] "GROUP 1, GROUP 2 & GROUP 3" "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"
See R proof.
EXPLANATION
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  &                        '&'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^&]*                    any character except: '&' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
  )                        end of look-ahead
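Because the pattern explained above consumes the optional whitespace on both sides of the &, it also normalizes inconsistent spacing. A small base-R sketch of that behaviour, assuming gsub() with perl = TRUE in place of str_replace_all(); the messy input string is made up for illustration:

```r
# Made-up input with irregular spacing around the separators.
messy <- "A&B &  C & D"

# Every "&" (with any surrounding whitespace) that is followed by another "&"
# becomes ", "; the last "&" fails the lookahead and is left unchanged.
cleaned <- gsub("\\s*&\\s*(?=[^&]*&)", ", ", messy, perl = TRUE)

print(cleaned)
```

The last separator is not normalized by this pattern, since it is never matched; only the separators that become commas get uniform spacing.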