How to keep the first character but remove everything else before a space in R?

Time:10-11

I know how I can remove everything before a space in a character string, but I'd like to keep the first initial of a name. How can I do this so the result is "F LastName"? Thanks.

full_name <- "FirstName LastName"
sub(".*? ", "", full_name) # Removes everything before space.

CodePudding user response:

Using sub() with two capture groups, we can try:

full_name <- "FirstName LastName"
output <- sub("([A-Z])\\w* (.*)", "\\1 \\2", full_name)
output

[1] "F LastName"
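To spell out the pattern: ([A-Z]) captures one uppercase letter, \\w* consumes the rest of that word, the literal space is matched, and (.*) captures whatever follows. Since sub() is vectorised over its input, the same call should work on a whole vector. A small sketch, where the sample names are made up for illustration and are not from the question:

```r
# Hypothetical sample names, used only to illustrate vectorised behaviour
people <- c("FirstName LastName", "Ada Lovelace", "Grace Brewster Hopper")

# Group 1 = first uppercase letter, group 2 = everything after the first space
initials <- sub("([A-Z])\\w* (.*)", "\\1 \\2", people)
print(initials)
# [1] "F LastName"        "A Lovelace"        "G Brewster Hopper"
```

Note that this pattern relies on the first name starting with an ASCII uppercase letter.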

CodePudding user response:

A way that avoids regex patterns:

spl <- strsplit(full_name, " ")[[1]]
paste(substr(spl[1], 1, 1), spl[2])
#[1] "F LastName"
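One thing to watch with the split approach: spl[2] keeps only the second word, so a name with three parts loses its tail, and a name without a space yields "C NA". If that matters, something like the following sketch might help. It looks up the first space with fixed-string matching and keeps everything from there on. The helper name keep_initial and the sample names are assumptions for illustration only.

```r
# Hypothetical helper (name is illustrative): keep the first character plus
# everything from the first space onward, using fixed-string matching only.
keep_initial <- function(x) {
  # Position of the first space in each element, -1 when there is none
  pos <- as.integer(regexpr(" ", x, fixed = TRUE))
  has_space <- !is.na(pos) & pos > 0
  out <- x
  # Only rewrite elements that contain a space; others are left unchanged
  out[has_space] <- paste0(substr(x[has_space], 1, 1),
                           substring(x[has_space], pos[has_space]))
  out
}

keep_initial(c("FirstName LastName", "Grace Brewster Hopper", "Cher"))
# [1] "F LastName"        "G Brewster Hopper" "Cher"
```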

CodePudding user response:

Another way using sub.

sub("(.)[^ ]*", "\\1", full_name)
#[1] "F LastName"

(.) captures the first character as group \\1, [^ ]* matches the run of non-space characters that follows it, and the replacement \\1 puts the captured character back in place of the whole match.
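To see which part of the string the pattern actually covers, regmatches() can pull out the matched span before sub() replaces it. A minimal sketch using the string from the question:

```r
full_name <- "FirstName LastName"

# Locate the first match of the pattern and extract it
m <- regexpr("(.)[^ ]*", full_name)
matched <- regmatches(full_name, m)
print(matched)
# [1] "FirstName"

# sub() swaps that whole span for the captured first character
result <- sub("(.)[^ ]*", "\\1", full_name)
print(result)
# [1] "F LastName"
```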

Or with a lookbehind.

sub("(?<=.)[^ ]*", "", full_name, perl=TRUE)
#[1] "F LastName"

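(?<=.) is a lookbehind: it requires that some character precedes the match but does not consume it, so the first character is kept and only the rest of the word is removed.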
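A quick way to check what the lookbehind version removes is to extract the match instead of replacing it. The sketch below also tries a single-word input ("Cher" is a made-up example, not from the question), which gets cut down to its first character because the match does not require a space.

```r
full_name <- "FirstName LastName"

# The match starts after the first character, so "F" is never part of it
removed <- regmatches(full_name,
                      regexpr("(?<=.)[^ ]*", full_name, perl = TRUE))
print(removed)
# [1] "irstName"

# Hypothetical single-word input: everything after the first character goes
single <- sub("(?<=.)[^ ]*", "", "Cher", perl = TRUE)
print(single)
# [1] "C"
```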

Benchmark

full_name <- rep("FirstName LastName", 1e5)
bench::mark(GKi1 =  sub("(.)[^ ]*", "\\1", full_name),
            GKi2 =  sub("(?<=.)[^ ]*", "", full_name, perl=TRUE),
            "Tim Biegeleisen" = sub("([A-Z])\\w* (.*)", "\\1 \\2", full_name),
            Mael = {spl <- strsplit(full_name, " ") # Changed so that it works on vectors
              sapply(spl, \(spl) paste(substr(spl[1], 1, 1), spl[2]))}
            )
# expression           min   median itr/s…¹ mem_al…² gc/se…³ n_itr  n_gc total…⁴
#  <bch:expr>      <bch:tm> <bch:tm>   <dbl> <bch:by>   <dbl> <int> <dbl> <bch:t>
#1 GKi1              22.6ms   22.7ms   44.0   781.3KB    0       22     0   500ms
#2 GKi2              14.6ms   14.8ms   67.1   781.3KB    1.97    34     1   507ms
#3 Tim Biegeleisen   47.7ms   47.8ms   20.9   781.3KB    0       11     0   526ms
#4 Mael               430ms  446.8ms    2.24   4.06MB   32.5      2    29   894ms

In this benchmark, sub("(?<=.)[^ ]*", "", full_name, perl=TRUE) is the fastest, with a median of 14.8ms against 22.7ms for sub("(.)[^ ]*", "\\1", full_name).
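The timings are only comparable if all four approaches return the same result, so it may be worth confirming that separately in base R. A small sketch on a shorter vector, with vapply() standing in for the sapply() call in the benchmark (the r1 to r4 names are just for illustration):

```r
full_name <- rep("FirstName LastName", 5)

r1 <- sub("(.)[^ ]*", "\\1", full_name)
r2 <- sub("(?<=.)[^ ]*", "", full_name, perl = TRUE)
r3 <- sub("([A-Z])\\w* (.*)", "\\1 \\2", full_name)
# Split-based version, applied element by element
r4 <- vapply(strsplit(full_name, " "),
             function(s) paste(substr(s[1], 1, 1), s[2]),
             character(1))

# Stops with an error if any approach disagrees with the first one
stopifnot(identical(r1, r2), identical(r1, r3), identical(r1, r4))
print(r1[1])
# [1] "F LastName"
```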
