I have a character vector whose values are made up of numbers and letters. I want to get all the characters before the LAST letter of each value (which is, I guess, always the 2nd letter of the value). Using stringr (preferably)...
Example:
x = c("1H23456789H10", "97845784584H2", "0H987654321H0", "0P45454545A3", "63A00000000000A91")
str_extract_all(string = x, pattern = ????????)
I tried some tricks from this cheat sheet: https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_strings.pdf
The result I want is:
"1H23456789" instead of "1H23456789H10"
"97845784584" instead of "97845784584H2"
"0H987654321" instead of "0H987654321H0"
"0P45454545" instead of "0P45454545A3"
"63A00000000000" instead of "63A00000000000A91"
Answer:
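library(stringr)
str_extract(string = x, pattern = "[^A-Z]*[A-Z][^A-Z]*")
# [1] "1H23456789" "97845784584H2" "0H987654321" "0P45454545" "63A00000000000"
# Note: the 2nd value contains only one letter, so the pattern matches it in full.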
Explanation: we want to extract one pattern match per input, so we use str_extract, not str_extract_all. Our pattern is [^A-Z]* (any number of non-letters), followed by [A-Z] (exactly one letter), followed by [^A-Z]* (any number of non-letters). The match therefore ends just before the second letter, which is the last letter only when a value contains exactly two letters. I only used capital letters based on your input, but you could change A-Z to A-Za-z inside the brackets to include lower-case letters.
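If some values might contain only one letter (as "97845784584H2" in the example does) or more than two, anchoring on the last letter may be more robust than counting letters. Here is a minimal sketch of that idea, not the only way to do it. The variable names (last_letter_tail, before_last, before_last_alt) are made up for illustration, and the A-Za-z class assumes plain ASCII letters.

```r
library(stringr)

x <- c("1H23456789H10", "97845784584H2", "0H987654321H0",
       "0P45454545A3", "63A00000000000A91")

# A letter followed only by non-letters up to the end of the string,
# i.e. the last letter and everything after it.
last_letter_tail <- "[A-Za-z][^A-Za-z]*$"

# Option 1: delete the last letter and whatever follows it.
before_last <- str_remove(x, last_letter_tail)
before_last
# [1] "1H23456789"     "97845784584"    "0H987654321"    "0P45454545"
# [5] "63A00000000000"

# Option 2: greedy match that must be followed by a letter (lookahead).
# Greedy .* backs off only as far as the last letter in the string.
before_last_alt <- str_extract(x, ".*(?=[A-Za-z])")

# The two options differ when a value has no letter at all:
str_remove("12345", last_letter_tail)    # "12345" (left unchanged)
str_extract("12345", ".*(?=[A-Za-z])")   # NA (no match)
```

Which option fits better depends on what you want for values without any letter: str_remove leaves them as they are, while str_extract returns NA.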