I am trying to change the text in a column by removing a word in all rows.
For example, if a row contains "second row", I want to replace it with "second". Here is an example dataset:
df <- data.frame(row_num = c(1,2,3,4,5),
text_long = c("first row", "second row", "third row", "fourth row", "fifth row"))
> df
#  row_num  text_long
#1       1  first row
#2       2 second row
#3       3  third row
#4       4 fourth row
#5       5  fifth row
How do I create a new column called "text_short" where the word "row" is removed from each of the text_long values?
I tried using the following function with mutate() from the dplyr package:
library(dplyr)
shorten_text <- function(x) {
return(unlist(strsplit(x, split=' '))[1])
}
df <- mutate(df, text_short = shorten_text(df$text_long))
But every row of text_short contains the same value:
> df
#  row_num  text_long text_short
#1       1  first row      first
#2       2 second row      first
#3       3  third row      first
#4       4 fourth row      first
#5       5  fifth row      first
CodePudding user response:
df %>%
mutate(text_short = stringr::str_remove(text_long, " row"))
or base R:
df$text_short = gsub(" row", "", df$text_long)
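As for why the original attempt repeats "first": strsplit() returns a list with one element per input string, unlist() flattens that list into a single vector, and [1] then picks only the very first piece, which mutate() recycles down the column. A minimal sketch of a vectorised version of the question's helper follows, assuming the goal is to keep the first word of each string; the name shorten_text_vec is hypothetical.

```r
df <- data.frame(row_num = c(1, 2, 3, 4, 5),
                 text_long = c("first row", "second row", "third row",
                               "fourth row", "fifth row"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: strsplit() yields one character vector per input string,
# and vapply() takes the first piece of each, so there is one result per row.
shorten_text_vec <- function(x) {
  vapply(strsplit(x, split = " ", fixed = TRUE),
         function(parts) parts[1],
         character(1))
}

df$text_short <- shorten_text_vec(df$text_long)
```

The same helper can be passed to mutate() as text_short = shorten_text_vec(text_long), since it now returns a vector as long as its input.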
CodePudding user response:
Here's another approach, valid only if the first word is the one you want to keep:
> df %>%
    mutate(text_short = sub("(\\w+).*", "\\1", text_long))
  row_num  text_long text_short
1       1  first row      first
2       2 second row     second
3       3  third row      third
4       4 fourth row     fourth
5       5  fifth row      fifth
Or directly in base R:
> df$text_short <- sub("(\\w+).*", "\\1", df$text_long)
> df
  row_num  text_long text_short
1       1  first row      first
2       2 second row     second
3       3  third row      third
4       4 fourth row     fourth
5       5  fifth row      fifth
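To illustrate the caveat about the first word, here is a small sketch comparing first-word extraction with removing the word "row" wherever it appears. The input vector x is made up for illustration, and the word-boundary pattern with perl = TRUE is one possible choice, not the only one.

```r
# Made-up inputs where "row" is not always the second word
x <- c("second row", "row of seats", "very long row")

# Keep only the first word (the approach above)
first_word <- sub("(\\w+).*", "\\1", x)

# Remove the standalone word "row" anywhere, then trim leftover whitespace
row_removed <- trimws(gsub("\\brow\\b", "", x, perl = TRUE))

print(data.frame(x, first_word, row_removed))
```

The two results agree for "second row" but differ for the other inputs, so the choice depends on whether the data always has the form "<word> row".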