Hello, I'm looking for R code that deletes every word in a paragraph after a specific term. For example, find "Talk:" and replace everything from there until a new paragraph. I tried a regex and spent some time on it, but I can't get it to work ("fjeaofiz" is always still present).
library(stringi)
x <- c("12 3456 789", "Talk: zpfozefpozjgzigzehgoi oezjgzogzjgoezjgo \r fjeaofiz ", "", NA, "Talk: 667")
stri_sub_all(x, stri_locate_all_regex(x, "^Talk:.*\r", omit_no_match=TRUE)) <- "***"
print(x)
My output should be:
x <- c("12 3456 789", "***", "", NA, "***")
Any help?
CodePudding user response:
If the aim is to remove anything that occurs after the string Talk, including Talk, then this should work:
sub("^Talk.*", "***", x)
[1] "12 3456 789" "***" "" NA "***"
CodePudding user response:
You need to use
stri_sub_all(x, stri_locate_all_regex(x, "(?s)^Talk:.*", omit_no_match=TRUE)) <- "***"
The point here is to remove \r (your regex matched only the part of the string up to the CR char) and to use (?s) with the .* pattern to match the rest of the whole string, because the stringi package uses the ICU regex flavor, in which . does not match line break chars (like CR and LF) by default. (?s) enables . to match line breaks.
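As a minimal sketch of the same idea, assuming the stringi package is installed, the dot-all behaviour can also be requested through stri_replace_first_regex() rather than the locate-and-assign form. The names y_inline and y_opts are only illustrative.

```r
library(stringi)

x <- c("12 3456 789", "Talk: zpfozefpozjgzigzehgoi oezjgzogzjgoezjgo \r fjeaofiz ", "", NA, "Talk: 667")

# Inline flag: (?s) lets . match CR/LF, so .* runs to the end of the string
y_inline <- stri_replace_first_regex(x, "(?s)^Talk:.*", "***")

# Same effect through the regex options object instead of the inline flag
y_opts <- stri_replace_first_regex(x, "^Talk:.*", "***",
                                   opts_regex = stri_opts_regex(dotall = TRUE))

print(y_inline)
```

Elements that do not start with Talk: (including "" and NA) should pass through unchanged, since the pattern is anchored with ^.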
Probably a simpler approach is to use
sub("^Talk:.*", "***", x)
Here, the default TRE regex library is used, and . matches line breaks by default in this regex flavor.
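A small sketch of the base R variant on the vector from the question. The perl = TRUE line is an addition that is not part of the answer above: it switches sub() to the PCRE engine, where the inline (?s) flag makes the dot-all behaviour explicit instead of relying on the TRE default. The names expected, res_tre and res_pcre are only illustrative.

```r
x <- c("12 3456 789", "Talk: zpfozefpozjgzigzehgoi oezjgzogzjgoezjgo \r fjeaofiz ", "", NA, "Talk: 667")
expected <- c("12 3456 789", "***", "", NA, "***")

# Default engine (TRE): . also matches the CR, so the whole element is replaced
res_tre <- sub("^Talk:.*", "***", x)

# PCRE engine with the inline (?s) flag, so . matches line breaks explicitly
res_pcre <- sub("(?s)^Talk:.*", "***", x, perl = TRUE)

# Compare both results against the output requested in the question
print(identical(res_tre, expected))
print(identical(res_pcre, expected))
```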