Get substring before the second capital letter

Is there an R function to get only the part of a string before the 2nd capital character appears?

For example:

Example <- "MonkeysDogsCats"

Expected output should be:

"Monkeys"

CodePudding user response：

Maybe something like

stringr::str_extract("MonkeysDogsCats", "[A-Z][a-z]*")
#[1] "Monkeys"

CodePudding user response：

Here is an alternative approach: first insert a space between each lowercase letter and the uppercase letter that follows it, then extract the first word:

library(stringr)

word(gsub("([a-z])([A-Z])","\\1 \\2", Example), 1)

# [1] "Monkeys"

CodePudding user response：

A base solution with sub():

x <- "MonkeysDogsCats"

sub("(?<=[a-z])[A-Z].*", "", x, perl = TRUE)
# [1] "Monkeys"

Another way using stringr::word():

stringr::word(x, 1, sep = "(?=[A-Z])\\B")
# [1] "Monkeys"

CodePudding user response：

If the goal is strictly to capture everything before the 2nd capital character, one might want to pick a solution that also works with all types of strings, including those containing numbers and special characters.

strings <- c("MonkeysDogsCats",
             "M4DogsCats",
             "M?DogsCats")

stringr::str_remove(strings, "(?<=.)[A-Z].*")

Output:

[1] "Monkeys"  "M4"  "M?"
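
Note that the patterns above all rely on the string starting with a capital letter. If that is not guaranteed, a possible base R sketch is to match everything up to and including the first capital, plus whatever non-capitals follow it, and drop the rest. The function name before_second_capital is made up for illustration, and the pattern only treats ASCII A-Z as capitals:

```r
# Hypothetical helper (not from any package): keep the part of each string
# that comes before the second ASCII capital letter.
before_second_capital <- function(x) {
  # ^[^A-Z]*  any leading non-capitals
  # [A-Z]     the first capital
  # [^A-Z]*   everything up to (not including) the second capital
  # .*        the remainder, which is discarded
  sub("^([^A-Z]*[A-Z][^A-Z]*).*", "\\1", x, perl = TRUE)
}

strings <- c("MonkeysDogsCats", "M4DogsCats", "M?DogsCats", "monkeysDogsCats")
before_second_capital(strings)
```

For the first three strings this should give the same results as above; for "monkeysDogsCats" it keeps "monkeysDogs", since "C" is the second capital. Strings with fewer than two capitals are returned unchanged.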