I'm hoping to pull full names out of the following pattern. Some names have hyphens or multiple capital letters, as in the examples below.
All numbers inside the parentheses are 1 or 2 digits long, and all capitalized city abbreviations before the parentheses are 2 or 3 characters long.
Davante Adams LV (6)
Christian McCaffrey CAR (10)
J.K. Dobbins BAL (5)
Amon-Ra St. Brown DET (7)
AJ Brown PHI (11)
Michael Pittman Jr. IND (14)
JuJu Smith-Schuster PIT (9)
Results should be...
Davante Adams
Christian McCaffrey
J.K. Dobbins
Amon-Ra St. Brown
AJ Brown
Michael Pittman Jr.
JuJu Smith-Schuster
CodePudding user response:
We may use trimws with a regex as the whitespace argument, i.e. one or more spaces (\\s+) followed by one or more uppercase letters ([A-Z]+), then optional spaces (\\s*) and one or more digits (\\d+) within the parentheses.
trimws(str1, whitespace = "\\s+[A-Z]+\\s*\\(\\d+\\)")
-output
[1] "Davante Adams"
[2] "Christian McCaffrey"
[3] "J.K. Dobbins"
[4] "Amon-Ra St. Brown"
[5] "AJ Brown"
[6] "Michael Pittman Jr."
[7] "JuJu Smith-Schuster"
data
str1 <- c("Davante Adams LV (6)", "Christian McCaffrey CAR (10)", "J.K. Dobbins BAL (5)",
"Amon-Ra St. Brown DET (7)", "AJ Brown PHI (11)", "Michael Pittman Jr. IND (14)",
"JuJu Smith-Schuster PIT (9)")
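As a variation on the trimws call, the limits stated in the question (a 2 or 3 letter abbreviation and a 1 or 2 digit number) can be written into the pattern and anchored at the end of the string with sub. This is only a minimal sketch; the names players and names_only are chosen here for illustration and do not come from the answer.

```r
# Sample input taken from the question.
players <- c("Davante Adams LV (6)", "Christian McCaffrey CAR (10)",
             "J.K. Dobbins BAL (5)", "Amon-Ra St. Brown DET (7)",
             "AJ Brown PHI (11)", "Michael Pittman Jr. IND (14)",
             "JuJu Smith-Schuster PIT (9)")

# Remove the trailing " ABC (12)" part of each element:
#   \\s+            one or more spaces before the abbreviation
#   [A-Z]{2,3}      a 2 or 3 letter uppercase abbreviation
#   \\s*            optional spaces
#   \\(\\d{1,2}\\)  a 1 or 2 digit number in parentheses
#   \\s*$           optional trailing spaces, then end of string
names_only <- sub("\\s+[A-Z]{2,3}\\s*\\(\\d{1,2}\\)\\s*$", "", players)
print(names_only)
```

Because the pattern is anchored with $, uppercase runs inside a name such as AJ are left alone, and an element that does not end in the expected suffix is returned unchanged.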
CodePudding user response:
Using strsplit (this assumes str is a single string that holds all the entries)
strsplit(str, " [A-Z]+ \\(\\d+\\) *")[[1]]
#> [1] "Davante Adams" "Christian McCaffrey" "J.K. Dobbins"
#> [4] "Amon-Ra St. Brown" "AJ Brown" "Michael Pittman Jr."
#> [7] "JuJu Smith-Schuster"
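The answer does not show how str was built. Since the call takes [[1]] and returns seven names, it presumably holds every entry in one string. A sketch under that assumption follows; entries and all_in_one are hypothetical names, and joining with a single space is a guess at the original input.

```r
# Sample input taken from the question.
entries <- c("Davante Adams LV (6)", "Christian McCaffrey CAR (10)",
             "J.K. Dobbins BAL (5)", "Amon-Ra St. Brown DET (7)",
             "AJ Brown PHI (11)", "Michael Pittman Jr. IND (14)",
             "JuJu Smith-Schuster PIT (9)")

# Assumption: all entries sit in one string, separated by single spaces.
all_in_one <- paste(entries, collapse = " ")

# Split on " ABBR (N)" plus any spaces that follow it, so each piece
# that remains is one full name.
split_names <- strsplit(all_in_one, " [A-Z]+ \\(\\d+\\) *")[[1]]
print(split_names)
```

If the data is already a character vector with one player per element, as in str1, the trimws or sub approach is the more direct fit.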