How to replace the first occurrences of a string only if it appears more than once in R?

Time:10-09

I have strings that look like this:

problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

In between each group, there's " & ". I want to use R (either sub() or something from the stringr package) to replace every " &" with a "," when there's more than one & present. However, I don't want the final & to be changed. How would I do that so it looks like:

#Note: Only the 3rd and 4th strings should be changed
solution <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1, GROUP 2 & GROUP 3", "GROUP 1, GROUP 2, GROUP 3 & GROUP 4")

In the actual strings, there could be any number of &s, so I don't want to hard-code a limit if possible.

CodePudding user response:

We could use a regular expression with a lookahead assertion (see "Regex lookahead, lookbehind and atomic groups").

library(stringr)
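# Replace " &" with "," only when another "&" follows later in the string.
# The space after the "&" is kept, so the replacement must not add its own.
str_replace_all(problem, " &(?=.*?&)", ",")

output:

[1] "GROUP 1"
[2] "GROUP 1 & GROUP 2"
[3] "GROUP 1, GROUP 2 & GROUP 3"
[4] "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"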
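For readers without stringr, the same lookahead idea can be sketched in base R. This is a minimal sketch, not part of the original answer: it swaps str_replace_all() for gsub() with perl = TRUE (lookaheads need the Perl-compatible engine), and it copies problem and solution from the question so the result can be checked.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
solution <- c("GROUP 1", "GROUP 1 & GROUP 2",
              "GROUP 1, GROUP 2 & GROUP 3",
              "GROUP 1, GROUP 2, GROUP 3 & GROUP 4")

# Replace " &" with "," only where another "&" appears later in the string.
# The lookahead (?=.*&) checks for that later "&" without consuming it,
# so the final "&" of each string never matches.
result <- gsub(" &(?=.*&)", ",", problem, perl = TRUE)

print(result)
print(identical(result, solution))
```

Because the lookahead fails for the last & in every string, strings with zero or one & pass through unchanged.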

CodePudding user response:

Another solution:

library(stringr)
# Turn every " &" into ",", then restore the final separator to " & "
str_replace_all(problem, " &", ",") %>%
  str_replace(", (GROUP [0-9]+)$", " & \\1")
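The two-step idea above can also be sketched in base R without naming the item text in the pattern. This is an illustrative variant, not the answerer's code: it assumes the items themselves never contain ", ", and the extra fifth string with two-digit group numbers is a made-up test case.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4",
             "GROUP 9 & GROUP 10 & GROUP 11")  # hypothetical extra case

# Step 1: turn every " &" into "," (fixed = TRUE treats the pattern literally).
all_commas <- gsub(" &", ",", problem, fixed = TRUE)

# Step 2: the greedy "(.*)" extends to the last ", " in each string,
# so only that final separator is turned back into " & ".
result <- sub("(.*), ", "\\1 & ", all_commas)

print(result)
```

Strings without any & contain no ", " after step 1, so step 2 leaves them untouched.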

CodePudding user response:

It could be done using Perl mode and the \G anchor.

Ensure there are 2 or more &'s, then match any & that has another one downstream.

(?m)(?:^(?=.*&.*&)|(?!^)\G)[^&\n]*\K&(?=.*&)

Replace with a comma (,).

https://regex101.com/r/Mtvopf/1

 (?m)
 (?:
    ^ 
    (?= .* & .* & )
  | (?! ^ )
    \G 
 )
 [^&\n]* \K &
 (?= .* & )

CodePudding user response:

Using strsplit

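 # Split on "&" with surrounding whitespace, rejoin with ", ",
 # then turn the last ", " back into " & "
 sapply(strsplit(problem, "\\s+&\\s+"),
    function(x) sub(", ([^,]+)$", " & \\1", toString(x)))

-output

[1] "GROUP 1"                             "GROUP 1 & GROUP 2"                  
[3] "GROUP 1, GROUP 2 & GROUP 3"          "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"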
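The split-and-rejoin approach can also be written without a second regex, by pasting the pieces back together directly. The following is a minimal sketch under the assumption that the separator is always exactly " & "; join_groups is a hypothetical helper name introduced only for illustration.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# Hypothetical helper: join all but the last piece with ", ",
# then attach the last piece with " & ".
join_groups <- function(parts) {
  n <- length(parts)
  # Zero or one piece: nothing to join, return it as a single string.
  if (n < 2) return(paste(parts, collapse = ""))
  paste(paste(parts[-n], collapse = ", "), parts[n], sep = " & ")
}

# fixed = TRUE splits on the literal separator " & ".
pieces <- strsplit(problem, " & ", fixed = TRUE)
result <- vapply(pieces, join_groups, character(1))

print(result)
```

Using vapply() with character(1) rather than sapply() guarantees a character vector of the same length as the input.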

CodePudding user response:

You can use

 \K&(?= .* & )

The pattern (note the leading space) matches:

  • a space followed by \K: match a space, then clear the match buffer (forget what has been matched so far)
  • & Match literally
  • (?= .* & ) Positive lookahead, assert a space to the right and another occurrence of " & " further on


For example

problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
gsub(" \\K&(?= .* & )", ",", problem, perl = TRUE)

Output

[1] "GROUP 1"                              
[2] "GROUP 1 & GROUP 2"                    
[3] "GROUP 1 , GROUP 2 & GROUP 3"          
[4] "GROUP 1 , GROUP 2 , GROUP 3 & GROUP 4"

CodePudding user response:

Use

problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
library(stringr)
str_replace_all(problem, "\\s*&\\s*(?=[^&]*&)", ", ")

Results:

[1] "GROUP 1"                             "GROUP 1 & GROUP 2"                  
[3] "GROUP 1, GROUP 2 & GROUP 3"          "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"


EXPLANATION

--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  &                        '&'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^&]*                    any character except: '&' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
  )                        end of look-ahead
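
As the breakdown above suggests, the \s* on both sides of the & makes this pattern tolerant of missing or repeated whitespace. A small sketch of that behaviour follows; it uses base R gsub() with perl = TRUE in place of str_replace_all(), and the messy vector is made-up test data, not from the question.

```r
# Made-up inputs with irregular spacing around "&"
messy <- c("A&B&C", "A  &  B & C", "A & B")

# Every "&" that is followed by another "&" is replaced, together with
# any whitespace around it, by ", ". The last "&" fails the lookahead.
result <- gsub("\\s*&\\s*(?=[^&]*&)", ", ", messy, perl = TRUE)

print(result)
```

Only the separators that are replaced get normalized; the spacing around the final & is left exactly as it was in the input.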