Extract text after first upper case or space

How can I extract all the text after the first space in a column where the data looks like this:

df <- structure(list(value = c("1.1.a Blue sea", "1.2.a Red ball")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))

so I get a new column with just

Blue sea
Red ball

CodePudding user response：

You can use the following code to select all text after the first white space:

sub("^\\S+\\s+", "", df$value)

Output:

[1] "Blue sea" "Red ball"

To store the result as a new column, use mutate from dplyr:

library(dplyr)
df %>%
  mutate(new_value = sub("^\\S+\\s+", "", value))

Output:

# A tibble: 2 × 2
  value          new_value
  <chr>          <chr>    
1 1.1.a Blue sea Blue sea 
2 1.2.a Red ball Red ball
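
As a quick check of how this pattern behaves, here is a minimal base R sketch. The vector x is made-up illustration data, not taken from the question. The pattern anchors at the start of the string, consumes the leading run of non-whitespace and the whitespace that follows, and sub() replaces only that first match. A string with no whitespace does not match, so it comes back unchanged.

```r
# Hypothetical test values (not from the question)
x <- c("1.1.a Blue sea", "1.2.a   Red ball", "NoSpaceHere")

# ^\\S+ : leading run of non-whitespace characters
# \\s+  : the whitespace that follows it (one or more characters)
res <- sub("^\\S+\\s+", "", x)

print(res)
# [1] "Blue sea"    "Red ball"    "NoSpaceHere"
```

If unmatched strings should become NA instead of passing through unchanged, that would need an extra step, for example an ifelse() around grepl() with the same pattern.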

CodePudding user response：

You can use str_extract from the stringr package to extract everything from the first upper case letter ([[:upper:]]), followed by one or more characters (.+), up to the end of the string ($).

library(stringr)

str_extract(df$value, "[[:upper:]].+$")

If you don't want to use a regex, you can use str_split to split each string into two parts at the first space and keep the second part.

str_split(df$value, " ", n = 2, simplify = TRUE)[, 2]

Output

For the sample data, both methods above give the same output:

[1] "Blue sea" "Red ball"
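
The "first space" and "first upper case letter" rules agree on the sample data, but they can diverge on other inputs. The sketch below uses made-up values (the vector x is an assumption, not data from the question) and base R's regmatches() with regexpr() as a stand-in for str_extract, so that it runs without extra packages.

```r
# Hypothetical inputs chosen so the two rules disagree
x <- c("1.1.a Blue sea", "1.3.a green Tree", "1.4.A Big cat")

# Rule 1: drop everything up to and including the first whitespace
after_space <- sub("^\\S+\\s+", "", x)

# Rule 2: keep everything from the first upper case letter onward
# (base R stand-in for str_extract; every element of x matches here)
from_upper <- regmatches(x, regexpr("[[:upper:]].+$", x))

print(after_space)
# [1] "Blue sea"   "green Tree" "Big cat"
print(from_upper)
# [1] "Blue sea"  "Tree"      "A Big cat"
```

Which rule fits depends on the data: if the prefix can contain upper case letters, or the text can start in lower case, splitting at the first space is likely the safer choice.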