I have a column of names in which the surnames are all upper case and the first names are lower case except for the first letter. How can I split them up? Example: BIDEN Joe
names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro")
The result I want is two vectors/columns, one containing the upper-case words:
surnames <- c("BIDEN", "DE WEERDT", "SCHEPERS")
And the other containing the first names:
first_names <- c("Joe", "Jan", "Caro")
Thanks in advance.
CodePudding user response:
Try this:
names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro")
# Remove everything from the leading capital up to and including the last space
first_names <- gsub("^[A-Z].+ ", "", names)
# "Joe" "Jan" "Caro"
# Remove the final space, the capital after it, and the lower-case letters that follow
last_names <- gsub(" [A-Z][a-z].+$", "", names)
# "BIDEN" "DE WEERDT" "SCHEPERS"
Also, I wouldn't call the vector names, as that is the name of a base function.
CodePudding user response:
You can use capture groups to split the string. For example:
names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro")
# Group 1: upper-case words (spaces allowed); group 2: the capitalised first name
m <- regexec("([A-Z ]+) ([A-Z].*)", names, perl = TRUE)
parts <- regmatches(names, m)
parts
# [[1]]
# [1] "BIDEN Joe" "BIDEN" "Joe"
# [[2]]
# [1] "DE WEERDT Jan" "DE WEERDT" "Jan"
# [[3]]
# [1] "SCHEPERS Caro" "SCHEPERS" "Caro"
# Last Names
sapply(parts, `[`, 2)
# [1] "BIDEN" "DE WEERDT" "SCHEPERS"
# First Names
sapply(parts, `[`, 3)
# [1] "Joe" "Jan" "Caro"
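Since the question mentions columns, here is a minimal sketch of the same capture-group idea that returns both parts as a data frame. It assumes every entry has the form "SURNAME Firstname" with a capitalised first name of at least two letters. The names full_names, pattern and result are only illustrative and not part of the original question.

```r
# Illustrative input; named full_names to avoid masking base::names()
full_names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro")

# Group 1: upper-case words (spaces allowed)
# Group 2: a capital, a lower-case letter, then the rest of the string
pattern <- "^([A-Z ]+) ([A-Z][a-z].*)$"

# sub() with back-references pulls out one capture group per column
result <- data.frame(
  surname    = sub(pattern, "\\1", full_names, perl = TRUE),
  first_name = sub(pattern, "\\2", full_names, perl = TRUE),
  stringsAsFactors = FALSE
)

result$surname
# [1] "BIDEN"     "DE WEERDT" "SCHEPERS"
result$first_name
# [1] "Joe"  "Jan"  "Caro"
```

Note that sub() returns a string unchanged when the pattern does not match, so entries that deviate from the assumed format would show up unsplit in both columns and are worth checking for.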