How to remove middle name from a full name in R

Time:09-30

I have a field in a data frame that is formatted as last name, comma, space, first name, space, middle name; some entries have no middle name. I need to remove the middle name from each full name that has one, and remove all spaces. I couldn't figure out how; my guess is that it involves regular expressions. It would be nice if you could explain your answer. Here is an example:

names <- c("Casillas, Kyron Jamar", "Knoll, Keyana","McDonnell, Messiah Abdul")
names

The expected output is:

names_n <- c("Casillas,Kyron", "Knoll,Keyana","McDonnell,Messiah")
names_n

Thanks!

CodePudding user response:

You can use this:

gsub("([^,]+,).*?(\\w+)$", "\\1\\2", names)
[1] "Casillas,Jamar"  "Knoll,Keyana"    "McDonnell,Abdul"

Here we divide the string into two capturing groups and use backreference to recollect their content:

  • ([^,]+,): the 1st capture group, which captures one or more characters that are not a comma, followed by a comma
  • .*?: this lazily matches what follows until ...
  • (\\w+)$: ... the 2nd capture group, which captures the alphanumeric string at the end of the input

\\1\\2 in the replacement argument recalls the contents of the two capture groups only, thereby removing whatever is not captured. If you want the surname and the name that follows it separated by a comma and a space rather than a comma alone, put a single space between the two backreferences: \\1 \\2
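Note that in the output shown above, the second group is the last word of the string, so names with a middle name keep the middle name rather than the first name. A minimal sketch of a variant that keeps the first word after the comma instead, assuming every entry has the form "Last, First" or "Last, First Middle" (the variable name names_first is only illustrative):

```r
names <- c("Casillas, Kyron Jamar", "Knoll, Keyana", "McDonnell, Messiah Abdul")

# ^([^,]+,)  group 1: the surname and the comma
# \\s*       skips the space after the comma
# (\\w+)     group 2: the first word after the comma (the first name)
# .*$        consumes the rest of the string, i.e. any middle name
names_first <- sub("^([^,]+,)\\s*(\\w+).*$", "\\1\\2", names)
names_first
```

Because the pattern is anchored at both ends and matches each string once, sub() is enough here; gsub() would behave the same.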

CodePudding user response:

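We can capture the second word with (\\w+), match the trailing middle name with \\s+\\w+$, and replace the whole match with the backreference (\\1) to the captured word. The outer sub() then removes the whitespace left after the comma in names that have no middle name.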

sub("\\s+", "", sub("\\s+(\\w+)\\s+\\w+$", "\\1", names))

-output

[1] "Casillas,Kyron"    "Knoll,Keyana"      "McDonnell,Messiah"
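To see what each of the two nested calls contributes, the same expression can be split into steps. This is only a sketch; step1, step2, df and the column name full_name are hypothetical names introduced for illustration, since the question does not show the actual data frame.

```r
names <- c("Casillas, Kyron Jamar", "Knoll, Keyana", "McDonnell, Messiah Abdul")

# Step 1: where two words follow the comma, keep the first and drop the second.
# "Knoll, Keyana" has only one word after the comma, so it is left unchanged.
step1 <- sub("\\s+(\\w+)\\s+\\w+$", "\\1", names)

# Step 2: remove the first run of whitespace that is still present, if any.
step2 <- sub("\\s+", "", step1)

# Applying the same two steps to a data frame column (hypothetical names).
df <- data.frame(full_name = names, stringsAsFactors = FALSE)
df$full_name <- sub("\\s+", "", sub("\\s+(\\w+)\\s+\\w+$", "\\1", df$full_name))
```

Both sub() calls are vectorised over the input, so no loop is needed for the column.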