Home > OS >  R Is there a way to remove special character from beginning of a string
R Is there a way to remove special character from beginning of a string

Time:10-11

A field from my dataset has some of it's observations to start with a "." e.g ".TN34AB1336"instead of "TN34AB1336". I've tried

truck_log <- truck_log %>%
  filter(BookingID != "WDSBKTP49392") %>%
  filter(BookingID != "WDSBKTP44502") %>%
  mutate(GpsProvider = str_replace(GpsProvider, "NULL", "UnknownGPS")) %>%
  mutate(vehicle_no = str_replace(vehicle_no, ".TN34AB1336", "TN34AB1336"))

the last mutate command in my code worked but there are more of such issues in another field e.g "###TN34AB1336" instead of "TN34AB1336".

So I need a way of automating the process such that all observations that doesn't start with a character is trimmed from the left by a single command in R.

I've attached a picture of a filtered field from spreadsheet to make the question clearer.filtered column showing the issue

CodePudding user response:

We can use regular expressions to replace anything up to the first alphanumeric character with ""to remove everything that is not a Number/Character from the beginning of a string:

names <- c("###TN34AB1336",
           ".TN34AB1336",
           ",TN654835A34",
           ": ?%TN735345")
stringr::str_replace(names,"^. ?(?=[[:alnum:]])","") # Matches up to, but not including the first alphanumeric character and replaces with ""
[1] "TN34AB1336"  "TN34AB1336"  "TN654835A34" "TN735345"   
``

CodePudding user response:

You can use sub.

s <- c(".TN34AB1336", "###TN34AB1336")
sub("^[^A-Z]*", "", s)
#[1] "TN34AB1336" "TN34AB1336"

Where ^ is the start of the string, [^A-Z] matches everything what is not A, B, C, ... , Z and * matches it 0 to n times.

  •  Tags:  
  • r
  • Related