Replace all occurrences of word apart from the first occurrence in a string with empty space in R pr

Time:07-03

Stu_stuff <- c(" STUDENTS ARE GOOD STUDENTS, BUT THERE ARE ALSO BAD STUDENTS")
  1. str_replace_all(Stu_stuff, "STUDENTS", "")
  2. str_replace(Stu_stuff, "STUDENTS", "")

I tried the two calls above: the first replaces every occurrence, and the second replaces only the first occurrence.

I would like to keep the first occurrence and replace the rest.

Desired output:

STUDENTS ARE GOOD , BUT THERE ARE ALSO BAD

Any pointers please!

CodePudding user response:

One option is to remove the 2nd occurrence of the word twice; after the first pass, what was the third occurrence becomes the new 2nd:

Stu_stuff <- c(" STUDENTS ARE GOOD STUDENTS, BUT THERE ARE ALSO BAD STUDENTS")

n <- 2
output <- sub(paste0("((?:STUDENTS.*?){",n-1,"})STUDENTS"), "\\1", Stu_stuff, perl=TRUE)
output <- sub(paste0("((?:STUDENTS.*?){",n-1,"})STUDENTS"), "\\1", output, perl=TRUE)
output
#> [1] " STUDENTS ARE GOOD , BUT THERE ARE ALSO BAD "

Created on 2022-07-02 by the reprex package (v2.0.1)
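
The two chained sub() calls above are enough because the sample string contains exactly three occurrences. As a hedged sketch for an arbitrary number of occurrences, base R's gregexpr() and the replacement form of regmatches() can blank every match except the first. The helper name keep_first_only is hypothetical, and fixed = TRUE is an assumption that the word should be matched literally rather than as a regular expression.

```r
# Hypothetical helper: keep the first occurrence of `word`, blank out the rest.
keep_first_only <- function(x, word) {
  # Locate every literal occurrence of `word` in each element of x.
  m <- gregexpr(word, x, fixed = TRUE)
  # Replace all matches after the first with an empty string.
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    hits[-1] <- ""
    hits
  })
  x
}

Stu_stuff <- c(" STUDENTS ARE GOOD STUDENTS, BUT THERE ARE ALSO BAD STUDENTS")
output <- keep_first_only(Stu_stuff, "STUDENTS")
output
#> [1] " STUDENTS ARE GOOD , BUT THERE ARE ALSO BAD "
```

Wrapping the result in trimws() would also drop the leading and trailing spaces, which gives the desired output from the question.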

CodePudding user response:

Try this:

sub("(STUDENTS)(?!.*\\b\\1\\b).*", "", Stu_stuff, perl=TRUE)
[1] " STUDENTS ARE GOOD STUDENTS, BUT THERE ARE ALSO BAD "

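Here, we rely on the backreference \\1, which recalls the captured string STUDENTS, inside the negative look-ahead (?!.*\\b\\1\\b). The look-ahead rejects any STUDENTS that is followed by another STUDENTS later in the string, so the pattern matches only the last occurrence, and the trailing .* consumes whatever comes after it. sub then replaces that match with the empty string given as its replacement argument. Note that the middle occurrence is left in place: this removes the last occurrence, not every occurrence after the first.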
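
To see which occurrence the negative look-ahead actually selects, a quick base R check can compare the start of that match against the starts of all occurrences. This is only an illustrative sketch: the variable names all_starts and last_match are made up here, and the look-ahead is written with the literal word instead of the \\1 backreference for readability.

```r
Stu_stuff <- c(" STUDENTS ARE GOOD STUDENTS, BUT THERE ARE ALSO BAD STUDENTS")

# Start positions of every literal occurrence of the word.
all_starts <- as.integer(gregexpr("STUDENTS", Stu_stuff, fixed = TRUE)[[1]])

# Start position of the one occurrence that no later STUDENTS follows.
last_match <- as.integer(
  regexpr("STUDENTS(?!.*\\bSTUDENTS\\b)", Stu_stuff, perl = TRUE)
)

# TRUE when the look-ahead pattern picks the final occurrence.
last_match == max(all_starts)
#> [1] TRUE
```

Because only the final occurrence matches, the middle STUDENTS stays in the string, so this pattern on its own does not produce the output asked for in the question.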

Alternatively, if you prefer stringr, whose regex engine (ICU) supports look-arounds out of the box, over base R, which needs perl=TRUE for them, you can use str_replace:

str_replace(Stu_stuff, "(STUDENTS)(?!.*\\b\\1\\b).*", "")