How can I extract all the text after the first space in a column where the data looks something like this:
structure(list(value = c("1.1.a Blue sea", "1.2.a Red ball")), row.names = c(NA, -2L), class =c("tbl_df", "tbl", "data.frame"))
so that I get a new column containing just
Blue sea
Red ball
CodePudding user response:
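You can use the following code to select all text after the first whitespace. The pattern matches the leading run of non-whitespace characters (\\S+) plus the whitespace that follows it (\\s+), and sub replaces that match with an empty string:
sub("^\\S+\\s+", "", df$value)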
Output:
[1] "Blue sea" "Red ball"
To store the result as a new column, use mutate from dplyr:
library(dplyr)
df %>%
  mutate(new_value = sub("^\\S+\\s+", "", value))
Output:
# A tibble: 2 × 2
  value          new_value
  <chr>          <chr>
1 1.1.a Blue sea Blue sea
2 1.2.a Red ball Red ball
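As a rough check of how that pattern behaves on inputs the question does not show, here is a small base R sketch. The vector x and its second and third elements are made up for illustration; they are not part of the original data.

```r
# Hypothetical inputs: the first comes from the question, the others are
# invented edge cases (no whitespace at all, several spaces in a row)
x <- c("1.1.a Blue sea", "NoSpaceHere", "1.2.a   Red ball")

# ^\\S+ : leading run of non-whitespace characters
# \\s+  : the whitespace that follows it (one or more characters)
result <- sub("^\\S+\\s+", "", x)

# A string without whitespace does not match, so sub() leaves it unchanged
print(result)
```

If a string without any space should become NA or empty instead of staying as it is, that case would need to be handled separately.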
CodePudding user response:
You can use str_extract from the stringr package to extract everything that starts with an uppercase letter ([[:upper:]]), followed by one or more characters (.+), up to the end of the string ($). Note that this relies on the text you want beginning with the first uppercase letter in the string.
library(stringr)
str_extract(df$value, "[[:upper:]].+$")
If you would rather not write a regex pattern, you can use str_split to split each string into at most two parts at the first space (n = 2) and keep the second part.
str_split(df$value, " ", n = 2, simplify = TRUE)[, 2]
Both methods produce the same output:
[1] "Blue sea" "Red ball"
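For a version with no regex and no extra packages, one possible base R sketch is to locate the first literal space and take everything after it. The names x, pos and after_space are only illustrative, and this assumes every string contains at least one space, as in the question's data.

```r
# Example values copied from the question
x <- c("1.1.a Blue sea", "1.2.a Red ball")

# fixed = TRUE treats " " as a literal string rather than a regex;
# regexpr() returns the position of the first match in each element
pos <- regexpr(" ", x, fixed = TRUE)

# Keep everything after the first space
after_space <- substring(x, pos + 1)
print(after_space)
```

This could be assigned straight to a new column, for example df$new_value <- substring(df$value, regexpr(" ", df$value, fixed = TRUE) + 1).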