I've got the following data.
df <- data.frame(Name = c("TOMTom Catch",
"BIBill Ronald",
"JEFJeffrey Wilson",
"GEOGeorge Sic",
"DADavid Irris"))
How do I clean the data in the Name column?
I've tried nchar and substring, but some names need the first two characters removed, whereas others need the first three.
CodePudding user response:
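We can use a regex lookahead: match the leading run of uppercase letters, but only as far as the point where another uppercase letter follows, so the capital that starts the real name is kept.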
gsub("^[A-Z]+(?=[A-Z])", "", df$Name, perl = TRUE)
#> [1] "Tom Catch" "Bill Ronald" "Jeffrey Wilson" "George Sic"
#> [5] "David Irris"
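For comparison, here is a sketch of the same idea written with a capture group instead of a lookahead. It assumes every real name starts with a single capital followed by a lowercase letter, which holds for the sample data but may not hold for yours. The `cleaned` variable is just an illustrative name.

```r
# Sample data from the question
df <- data.frame(Name = c("TOMTom Catch",
                          "BIBill Ronald",
                          "JEFJeffrey Wilson",
                          "GEOGeorge Sic",
                          "DADavid Irris"))

# ^[A-Z]+        leading run of capitals (the unwanted prefix)
# ([A-Z][a-z])   capital + lowercase that begin the real name, captured
# "\\1"          puts the captured pair back, so only the prefix is dropped
# Assumption: each real name begins with one capital then a lowercase letter.
cleaned <- sub("^[A-Z]+([A-Z][a-z])", "\\1", df$Name, perl = TRUE)

print(cleaned)
```

Because the pattern is anchored with `^`, it can match at most once per string, so `sub` is enough here and `gsub` would give the same result.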