Regex for pulling name out of pattern

Time:05-27

I'm hoping to pull the full names out of the following pattern. Some names contain hyphens, periods, or multiple capital letters, as in the examples below.

(The number inside the parentheses is always 1 or 2 digits. The all-caps city abbreviation before the parentheses is always 2 or 3 characters long.)

Davante Adams LV (6)
Christian McCaffrey CAR (10)
J.K. Dobbins BAL (5)
Amon-Ra St. Brown DET (7)
AJ Brown PHI (11)
Michael Pittman Jr. IND (14)
JuJu Smith-Schuster PIT (9)

Results should be...

Davante Adams
Christian McCaffrey
J.K. Dobbins
Amon-Ra St. Brown
AJ Brown
Michael Pittman Jr.
JuJu Smith-Schuster

CodePudding user response:

We may use trimws with a regex passed as its whitespace argument, i.e. one or more spaces (\\s+) followed by one or more uppercase letters ([A-Z]+), then any spaces (\\s*) and one or more digits (\\d+) within the parentheses

trimws(str1, whitespace = "\\s+[A-Z]+\\s*\\(\\d+\\)")

-output

[1] "Davante Adams"
[2] "Christian McCaffrey"
[3] "J.K. Dobbins"
[4] "Amon-Ra St. Brown"
[5] "AJ Brown"
[6] "Michael Pittman Jr."
[7] "JuJu Smith-Schuster"

data

str1 <- c("Davante Adams LV (6)", "Christian McCaffrey CAR (10)", "J.K. Dobbins BAL (5)", 
"Amon-Ra St. Brown DET (7)", "AJ Brown PHI (11)", "Michael Pittman Jr. IND (14)", 
"JuJu Smith-Schuster PIT (9)")
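
As a hedged alternative to the trimws call above (whose whitespace argument, to my knowledge, needs R 3.6.0 or later), the same suffix can be removed with base sub. This is only a sketch: the names team_suffix and player_names are made up for illustration, and the pattern assumes every entry ends with a 2 or 3 letter uppercase abbreviation followed by a 1 or 2 digit number in parentheses, as the question states.

```r
# Sample data, repeated here so the snippet runs on its own
str1 <- c("Davante Adams LV (6)", "Christian McCaffrey CAR (10)", "J.K. Dobbins BAL (5)",
          "Amon-Ra St. Brown DET (7)", "AJ Brown PHI (11)", "Michael Pittman Jr. IND (14)",
          "JuJu Smith-Schuster PIT (9)")

# Hypothetical name for the pattern: spaces, 2-3 uppercase letters,
# optional spaces, then 1-2 digits in parentheses, anchored at the end
team_suffix <- "\\s+[A-Z]{2,3}\\s*\\(\\d{1,2}\\)$"

# sub() removes the single match at the end of each element
player_names <- sub(team_suffix, "", str1)
print(player_names)
```

Anchoring with $ and bounding the repetition counts keeps capitals inside a name, such as the "AJ" in "AJ Brown", from being treated as the abbreviation.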

CodePudding user response:

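Using strsplit (here str is assumed to be a single string holding all the entries separated by spaces; for a character vector such as str1, strsplit returns one list element per entry, so [[1]] alone would give only the first name)

strsplit(str, " [A-Z]+ \\(\\d+\\) *")[[1]]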
#> [1] "Davante Adams"       "Christian McCaffrey" "J.K. Dobbins"       
#> [4] "Amon-Ra St. Brown"   "AJ Brown"            "Michael Pittman Jr."
#> [7] "JuJu Smith-Schuster"
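
The strsplit approach can be sketched for both input shapes. This is an illustration under stated assumptions, not necessarily what the answer's author ran: the variable names split_pattern, single_string, from_string, parts, and from_vector are hypothetical, and single_string is built by pasting the sample entries together with spaces.

```r
# Sample data, repeated here so the snippet runs on its own
str1 <- c("Davante Adams LV (6)", "Christian McCaffrey CAR (10)", "J.K. Dobbins BAL (5)",
          "Amon-Ra St. Brown DET (7)", "AJ Brown PHI (11)", "Michael Pittman Jr. IND (14)",
          "JuJu Smith-Schuster PIT (9)")

# Split on " ABBR (N)" plus any trailing spaces
split_pattern <- " [A-Z]+ \\(\\d+\\) *"

# Case 1: one long string containing every entry, separated by spaces
single_string <- paste(str1, collapse = " ")
from_string <- strsplit(single_string, split_pattern)[[1]]

# Case 2: a character vector; strsplit returns a list with one element
# per entry, so take the first piece of each
parts <- strsplit(str1, split_pattern)
from_vector <- vapply(parts, `[`, character(1), 1)

print(from_string)
print(from_vector)
```

In both cases the match at the end of the input does not produce a trailing empty string, so each result holds just the seven names.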