R - Remove all numeric characters from the right until a non-numeric character is reached

Time:03-19

I am trying to clean addresses that come in this format: 1804 E Osage Rd DERBY KS 670378863 or 55 Cabela Dr GARNER NC 27529. As shown, the postal codes at the end of the address vary in length, and I would like to remove the numeric portion from the right of each address. In Excel I can use =LEFT(A2, LEN(A2)-x), but that does not work well, because x is fixed rather than varying with the number of numeric characters at the end of the string.

How can I use R, or a regex, to remove all numeric characters from the right until a non-numeric character is reached?

Expected output:

raw_Address                         clean_Address
1804 E Osage Rd DERBY KS 670378863  1804 E Osage Rd DERBY KS
55 Cabela Dr GARNER NC 27529        55 Cabela Dr GARNER NC

CodePudding user response:

We can use trimws from base R. Its whitespace argument accepts a regular expression, so we match one or more whitespace characters followed by one or more digits, which removes the trailing postal code on the right.

df1$clean_Address <- trimws(df1$raw_Address, whitespace = "\\s+\\d+")

-output

> df1
                         raw_Address            clean_Address
1 1804 E Osage Rd DERBY KS 670378863 1804 E Osage Rd DERBY KS
2       55 Cabela Dr GARNER NC 27529   55 Cabela Dr GARNER NC

data

df1 <- structure(list(raw_Address = c("1804 E Osage Rd DERBY KS 670378863", 
"55 Cabela Dr GARNER NC 27529")), row.names = c(NA, -2L), class = "data.frame")
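
As a rough equivalent, the same idea (one or more whitespace characters, then one or more digits, anchored at the end of the string) can be sketched with base R's sub. This is only an illustrative alternative to the trimws call above, using the two example addresses from the question:

```r
# Example addresses copied from the question
raw_Address <- c("1804 E Osage Rd DERBY KS 670378863",
                 "55 Cabela Dr GARNER NC 27529")

# "\\s+" = one or more whitespace characters
# "\\d+" = one or more digits
# "$"    = end of string, so only the trailing run of digits is removed
clean_Address <- sub("\\s+\\d+$", "", raw_Address)

clean_Address
#> [1] "1804 E Osage Rd DERBY KS" "55 Cabela Dr GARNER NC"
```

Because the pattern is anchored with $, the house number at the start of each address is left alone.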

CodePudding user response:

Using {stringr}

raw_Address <- c("1804 E Osage Rd DERBY KS 670378863",
                 "55 Cabela Dr GARNER NC 27529")

library(stringr)

# replace a whitespace character followed by trailing digits with nothing
str_replace(raw_Address, "\\s\\d+$", "")

# or, more simply, remove the match directly
str_remove(raw_Address, "\\s\\d+$")

#> [1] "1804 E Osage Rd DERBY KS" "55 Cabela Dr GARNER NC"

Created on 2022-03-18 by the reprex package (v2.0.1)
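
To see why the $ anchor matters, here is a small sketch in base R (sub is used instead of stringr so it runs without extra packages). The second address is a made-up example with no trailing postal code, included only to show that such rows pass through unchanged:

```r
x <- c("1804 E Osage Rd DERBY KS 670378863",
       "12 Main St SPRINGFIELD IL")  # hypothetical address, no trailing digits

# Anchored: only digits at the very end of the string are removed
anchored <- sub("\\s\\d+$", "", x)

# Unanchored: the first run of digits is removed, here the house number
unanchored <- sub("\\d+", "", x)

anchored
#> [1] "1804 E Osage Rd DERBY KS"  "12 Main St SPRINGFIELD IL"
unanchored
#> [1] " E Osage Rd DERBY KS 670378863" " Main St SPRINGFIELD IL"
```

The anchored pattern is the one that matches the behaviour asked for in the question.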
