R Is there a way to remove special character from beginning of a string-CodePudding

A field from my dataset has some of it's observations to start with a "." e.g ".TN34AB1336"instead of "TN34AB1336". I've tried

truck_log <- truck_log %>%
  filter(BookingID != "WDSBKTP49392") %>%
  filter(BookingID != "WDSBKTP44502") %>%
  mutate(GpsProvider = str_replace(GpsProvider, "NULL", "UnknownGPS")) %>%
  mutate(vehicle_no = str_replace(vehicle_no, ".TN34AB1336", "TN34AB1336"))

the last mutate command in my code worked but there are more of such issues in another field e.g "###TN34AB1336" instead of "TN34AB1336".

So I need a way of automating the process such that all observations that doesn't start with a character is trimmed from the left by a single command in R.

I've attached a picture of a filtered field from spreadsheet to make the question clearer.

CodePudding user response：

We can use regular expressions to replace anything up to the first alphanumeric character with ""to remove everything that is not a Number/Character from the beginning of a string:

names <- c("###TN34AB1336",
           ".TN34AB1336",
           ",TN654835A34",
           ": ?%TN735345")
stringr::str_replace(names,"^. ?(?=[[:alnum:]])","") # Matches up to, but not including the first alphanumeric character and replaces with ""
[1] "TN34AB1336"  "TN34AB1336"  "TN654835A34" "TN735345"   
``

CodePudding user response：

You can use sub.

s <- c(".TN34AB1336", "###TN34AB1336")
sub("^[^A-Z]*", "", s)
#[1] "TN34AB1336" "TN34AB1336"

Where ^ is the start of the string, [^A-Z] matches everything what is not A, B, C, ... , Z and * matches it 0 to n times.