How do I get the first character of the last word in a string in R?

Time:12-08

So I have a list of names, and I want to extract the first character of the last word in the name. I can get the last word, but not the first character of the last word.

species <- c("ACHILLEA MILLEFOLIUM VAR. BOREALIS", 
             "ACHILLEA MILLEFOLIUM VAR. MILLEFOLIUM", 
             "ALLIUM SCHOENOPRASUM VAR. SIBIRICUM")

library(stringr)

# can get the last word
str_extract(species, "\\w+$")
[1] "BOREALIS"    "MILLEFOLIUM" "SIBIRICUM"

What I want is [1] "B" "M" "S"

CodePudding user response:

We can match everything up to the last run of whitespace (.*\\s+), capture the first non-whitespace character (\\S), match the one or more non-whitespace characters that follow (\\S+) up to the end ($) of the string, and replace the whole match with the backreference (\\1) to the captured group.

sub(".*\\s+(\\S)\\S+$", "\\1", species)
[1] "B" "M" "S"
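
As a cross-check on the pattern above, here is a minimal base R sketch that splits each name on whitespace instead of using a backreference. The helper name first_char_last_word() is made up for illustration and is not part of the answer. It assumes every element contains at least one word.

```r
species <- c("ACHILLEA MILLEFOLIUM VAR. BOREALIS",
             "ACHILLEA MILLEFOLIUM VAR. MILLEFOLIUM",
             "ALLIUM SCHOENOPRASUM VAR. SIBIRICUM")

# Hypothetical helper: split on runs of whitespace, take the last word,
# then keep only its first character.
first_char_last_word <- function(x) {
  words <- strsplit(trimws(x), "\\s+")
  vapply(words, function(w) substr(w[length(w)], 1, 1), character(1))
}

first_char_last_word(species)
# [1] "B" "M" "S"
```

One difference worth knowing: for a single-word name such as "ACHILLEA" the sub() pattern does not match, so sub() returns the input unchanged, while this sketch returns "A".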

CodePudding user response:

This might not be the most elegant solution, but you can always pipe the result into str_extract() a second time to get the first character of the last word.


library(stringr)
species <- c("ACHILLEA MILLEFOLIUM VAR. BOREALIS", 
             "ACHILLEA MILLEFOLIUM VAR. MILLEFOLIUM", 
             "ALLIUM SCHOENOPRASUM VAR. SIBIRICUM")

str_extract(species, "(\\w+$)") |>
  str_extract("^[A-Z]")

[1] "B" "M" "S"

CodePudding user response:

With str_extract you could also assert a whitespace boundary to the left, match the first word character that follows, and assert that only optional word characters remain up to the end of the string.

If you want to match any non-whitespace character, you can use \\S instead of \\w.

library(stringr)

species <- c("ACHILLEA MILLEFOLIUM VAR. BOREALIS", 
             "ACHILLEA MILLEFOLIUM VAR. MILLEFOLIUM", 
             "ALLIUM SCHOENOPRASUM VAR. SIBIRICUM")

str_extract(species, "(?<!\\S)\\w(?=\\w*$)")

Output

[1] "B" "M" "S"
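
To see how the \\w and \\S variants differ, here is a small sketch using base R's regexpr() with perl = TRUE in place of str_extract(), so it runs without stringr. The helper first_of_last() and the name "CAREX (HYBRID)" are invented for illustration. Elements with no match come back as NA.

```r
species <- c("ACHILLEA MILLEFOLIUM VAR. BOREALIS",
             "ACHILLEA MILLEFOLIUM VAR. MILLEFOLIUM",
             "ALLIUM SCHOENOPRASUM VAR. SIBIRICUM")

# Hypothetical helper: first match of a PCRE pattern per element, NA if none.
first_of_last <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[which(m != -1)] <- regmatches(x, m)
  out
}

word_pattern  <- "(?<!\\S)\\w(?=\\w*$)"  # last word must be all word characters
nonws_pattern <- "(?<!\\S)\\S(?=\\S*$)"  # last word may contain any non-whitespace

first_of_last(species, word_pattern)
# [1] "B" "M" "S"

# Made-up name whose last word starts with a non-word character
first_of_last("CAREX (HYBRID)", word_pattern)   # NA
first_of_last("CAREX (HYBRID)", nonws_pattern)  # "("
```

Which variant is right depends on the data: \\w skips last words that contain punctuation, while \\S returns whatever the first visible character is.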

