Modifying a string using dplyr mutate() is returning the same value for all rows

I am trying to change the text in a column by removing a word in all rows.

For example, if a row contains "second row", I want to replace it with "second". Here is an example dataset:

df <- data.frame(row_num = c(1,2,3,4,5),
                 text_long = c("first row", "second row", "third row", "fourth row", "fifth row"))

> df
#  row_num  text_long
#1       1  first row
#2       2 second row
#3       3  third row
#4       4 fourth row
#5       5  fifth row

How do I create a new column called "text_short" where the word "row" is removed from each of the text_long values?

I tried using the following function with mutate() from the dplyr package:

library(dplyr)

shorten_text <- function(x) {
  return(unlist(strsplit(x, split=' '))[1])
}

df <- mutate(df, text_short = shorten_text(df$text_long))

But every row of text_short contains the same value:

> df
#  row_num  text_long text_short
#1       1  first row      first
#2       2 second row      first
#3       3  third row      first
#4       4 fourth row      first
#5       5  fifth row      first

CodePudding user response：

df %>%
  mutate(text_short = stringr::str_remove(text_long, " row"))

or base R:

df$text_short = gsub(" row", "", df$text_long)
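
As for why the original attempt gives "first" in every row: strsplit() returns a list with one element per input string, unlist() flattens that list into a single vector of all the words, and [1] then picks only the very first word, which mutate() recycles down the column. Below is a minimal sketch of a vectorized version of shorten_text() in base R, assuming text_long is a character column (stringsAsFactors = FALSE is set here only so the sketch also runs on R versions before 4.0):

```r
df <- data.frame(row_num = c(1, 2, 3, 4, 5),
                 text_long = c("first row", "second row", "third row",
                               "fourth row", "fifth row"),
                 stringsAsFactors = FALSE)

# strsplit() returns a list: one character vector of pieces per input string
parts <- strsplit(df$text_long, split = " ", fixed = TRUE)

# unlist() merges all pieces into one vector, so [1] is the first word overall
first_overall <- unlist(parts)[1]

# Vectorized version: take the first piece of each list element instead
shorten_text <- function(x) {
  vapply(strsplit(x, split = " ", fixed = TRUE),
         function(p) p[1],
         character(1))
}

df$text_short <- shorten_text(df$text_long)
```

The same function should also work inside mutate(df, text_short = shorten_text(text_long)), since it now returns one value per row.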

CodePudding user response：

Here's another approach, valid only if the first word is the one you want to extract:

> df %>% 
    mutate(text_short = sub("(\\w+).*", "\\1", text_long))
  row_num  text_long text_short
1       1  first row      first
2       2 second row     second
3       3  third row      third
4       4 fourth row     fourth
5       5  fifth row      fifth

Or directly in base R:

> df$text_short <- sub("(\\w+).*", "\\1", df$text_long)
> df
  row_num  text_long text_short
1       1  first row      first
2       2 second row     second
3       3  third row      third
4       4 fourth row     fourth
5       5  fifth row      fifth
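
To illustrate the caveat that this approach only works when the first word is the one you want, here is a small sketch comparing "keep the first word" with "remove the word row" on a few inputs. The vector x is hypothetical and not part of the question's data, and the trimws() clean-up step is just one possible way to tidy leftover spaces:

```r
# Hypothetical inputs where "row" is not always the second word
x <- c("second row", "row second", "second row again")

# Keep only the first run of word characters in each string
first_word <- sub("(\\w+).*", "\\1", x)

# Remove the standalone word "row" wherever it appears,
# then collapse repeated spaces and trim the ends
no_row <- gsub("\\brow\\b", "", x, perl = TRUE)
no_row <- trimws(gsub("\\s+", " ", no_row))

first_word
no_row
```

For "row second" the first approach keeps "row", whereas the second keeps "second", so which one is appropriate depends on how regular the text is.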