How can I split up a string based on upper case and lower case in R?

Time:06-29

I have a column with names where the surnames are all upper case and the first names are all in lower case except the first letter. How can I split this up? Example: BIDEN Joe

names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro")

The result I want is two vectors/columns. One holds the upper-case words (the surnames):

surnames <- c("BIDEN", "DE WEERDT", "SCHEPERS")

And in the other the first names:

first_names <- c("Joe", "Jan", "Caro")

Thanks in advance.

CodePudding user response:

Try this:

names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro")

# Remove everything from the leading capital up to and including the last space
first_names <- gsub("^[A-Z].+ ", "", names)
# "Joe"  "Jan"  "Caro"

# Remove a space followed by a capital and a lower case letter, through the end of the string
last_names <- gsub(" [A-Z][a-z].+$", "", names)
# "BIDEN"     "DE WEERDT" "SCHEPERS"

Also, I wouldn't call the vector names, since that is the name of a base R function.
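A minimal sketch of a stricter variant of the same idea, assuming the surname consists only of capitals and spaces and the first name starts with a capital followed by a lower case letter. The vector name full_names and the fourth test entry are made up for illustration, and perl = TRUE is used so that [A-Z] and the lookahead behave the same regardless of locale:

```r
# Hypothetical input; the last entry has a two-word first name
full_names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro",
                "VAN DAMME Jean Claude")

# Surname: drop everything from the first " Xx" (space, capital, lower case) onwards
last_names <- sub(" [A-Z][a-z].*$", "", full_names, perl = TRUE)

# First name: drop the leading run of capitals and spaces, stopping just
# before a capital that is followed by a lower case letter
first_names <- sub("^[A-Z ]+ (?=[A-Z][a-z])", "", full_names, perl = TRUE)
```

Unlike the greedy patterns above, this keeps a multi-word first name such as "Jean Claude" together, but it would still fail on names with accented capitals or hyphens.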

CodePudding user response:

You can use capture groups to split the string. For example

names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro")
m <- regexec("([A-Z ]+) ([A-Z].*)", names, perl = TRUE)
parts <- regmatches(names, m)
parts
# [[1]]
# [1] "BIDEN Joe" "BIDEN"     "Joe"      
# [[2]]
# [1] "DE WEERDT Jan" "DE WEERDT"     "Jan"          
# [[3]]
# [1] "SCHEPERS Caro" "SCHEPERS"      "Caro"

# Last Names
sapply(parts, `[`, 2)
# [1] "BIDEN"     "DE WEERDT" "SCHEPERS" 
# First Names
sapply(parts, `[`, 3)
# [1] "Joe"  "Jan"  "Caro"
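Since the question asks for vectors or columns, the captured groups can also be collected into a data frame. This is only a sketch: the names full_names and split_names and the column names are assumptions, the pattern is anchored with ^ and $ as an extra precaution, and vapply() is swapped in for sapply() so the result is always a character vector.

```r
full_names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro")

# Group 1: capitals and spaces (surname); group 2: the rest (first name)
m <- regexec("^([A-Z ]+) ([A-Z].*)$", full_names, perl = TRUE)
parts <- regmatches(full_names, m)

# Element 1 of each match is the full string, 2 and 3 are the capture groups
split_names <- data.frame(
  surname    = vapply(parts, `[`, character(1), 2),
  first_name = vapply(parts, `[`, character(1), 3),
  stringsAsFactors = FALSE
)
```

An input that does not match the pattern yields NA in both columns rather than an error, which makes unexpected name formats easy to spot.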