I have a data frame with two (relevant) factors, and I'd like to remove a substring equal to one factor from the value of the other factor, or leave it alone if there is no such substring. Can I do this using dplyr?
To make an MWE, suppose these factors are x and y.
library(dplyr)
df <- data.frame(x = c(rep('abc', 3)), y = c('a', 'b', 'd'))
df:
    x y
1 abc a
2 abc b
3 abc d
What I want:
    x y
1  bc a
2  ac b
3 abc d
My attempt was:
df |> transform(x = gsub(y, '', x))
However, this produces the following, incorrect result, plus a warning message:
   x y
1 bc a
2 bc b
3 bc d
Warning message:
In gsub(y, "", x) :
argument 'pattern' has length > 1 and only the first element will be used
How can I do this?
CodePudding user response:
Use str_remove instead of gsub. Unlike gsub, str_remove is vectorized over the pattern, so each value of x is matched against the y from the same row.
library(stringr)
library(dplyr)
df <- df %>%
mutate(x = str_remove(x, y))
Output:
df
    x y
1  bc a
2  ac b
3 abc d
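Two caveats may matter in practice: str_remove interprets the pattern as a regular expression, and it removes only the first match. A minimal sketch of handling both, assuming the values in y should be treated as literal text: wrapping the pattern in fixed() makes the match literal, and str_remove_all removes every occurrence. The data frame df2 and the column names first_only and all_matches are made up for illustration.

```r
library(stringr)
library(dplyr)

# Hypothetical data: "." is a regex metacharacter, and "ab" occurs twice in "abab"
df2 <- data.frame(x = c("a.c", "abc", "abab"), y = c(".", "b", "ab"))

res <- df2 %>%
  mutate(
    # fixed() makes each pattern a literal string instead of a regex
    first_only  = str_remove(x, fixed(y)),      # removes the first match only
    all_matches = str_remove_all(x, fixed(y))   # removes every match
  )
res
```

Without fixed(), the pattern "." in the first row would match any character rather than the literal dot.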
If we want to use sub/gsub, we may need rowwise, because their pattern argument is not vectorized and only its first element would be used:
df %>%
rowwise() %>%
mutate(x = sub(y, "", x)) %>%
ungroup()
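As an alternative to rowwise, the same pairing of pattern and string can be sketched with mapply inside mutate, which calls sub once per row. This is only one possible approach; the helper name remove_first is made up for illustration, and fixed = TRUE is an assumption that the values in y are literal text rather than regular expressions.

```r
library(dplyr)

df <- data.frame(x = rep("abc", 3), y = c("a", "b", "d"))

# Hypothetical helper: remove the first literal occurrence of `pat` from `s`
remove_first <- function(pat, s) sub(pat, "", s, fixed = TRUE)

out <- df %>%
  # mapply pairs y[i] with x[i]; USE.NAMES = FALSE returns a plain character vector
  mutate(x = mapply(remove_first, y, x, USE.NAMES = FALSE))
out
```

Rows where y does not occur in x, such as the third row here, are left unchanged.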