Extracting just the numbers at the front of a string

Time:05-18

I need to parse a list of addresses and remove the ones without street numbers or with PO boxes. I want to create a column, street number, that contains just the number at the front of the string, or NA if the string starts with letters. So for:

street<-c("123 fake st", "PO box 12", "fake st unit 2", "123 fake st apt 1")

I would want:

c(123, NA, NA, 123)

I see a lot of Q&As about extracting numbers from a string, but I'm not sure how to do it without also pulling in the numbers at the end.

Answer:

We can use str_extract from stringr to extract one or more digits (\\d+) anchored at the start (^) of the string:

library(stringr)
# "^\\d+" = one or more digits at the very start; NA when the string starts with anything else
as.numeric(str_extract(street, "^\\d+"))
[1] 123  NA  NA 123
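If you would rather avoid the stringr dependency, base R's regexpr() and regmatches() can do the same anchored match. This is a minimal sketch; the helper name leading_number is purely illustrative. Because regmatches() returns only the successful matches, the values are written back into an NA-filled vector at the positions where regexpr() found a match.

```r
street <- c("123 fake st", "PO box 12", "fake st unit 2", "123 fake st apt 1")

# Hypothetical helper: numeric value of the digits at the very start of each string
leading_number <- function(x) {
  m <- regexpr("^[0-9]+", x)               # start of the match, or -1 where there is no leading digit
  out <- rep(NA_real_, length(x))          # default every element to NA
  hit <- m != -1
  out[hit] <- as.numeric(regmatches(x, m)) # regmatches() returns only the matched substrings
  out
}

leading_number(street)
# [1] 123  NA  NA 123
```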

Or, in base R, with strsplit, converting the first space-separated token:

as.numeric(sapply(strsplit(street, " "), `[`, 1))
[1] 123  NA  NA 123
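Note that as.numeric() emits an "NAs introduced by coercion" warning for first tokens such as "PO" or "fake". A sketch of a quieter wrapper (first_token_number is a hypothetical name), using vapply() so the extracted token is always a single string. Because this approach converts the whole first token, an address like "12B main st" gives NA here, whereas the str_extract() and sub() answers return 12.

```r
street <- c("123 fake st", "PO box 12", "fake st unit 2", "123 fake st apt 1")

# Hypothetical helper: convert the first space-separated token to a number
first_token_number <- function(x) {
  tokens <- vapply(strsplit(x, " ", fixed = TRUE), `[`, character(1), 1)
  # Non-numeric tokens ("PO", "fake", "12B") become NA; silence the coercion warning
  suppressWarnings(as.numeric(tokens))
}

first_token_number(street)
# [1] 123  NA  NA 123
first_token_number("12B main st")
# [1] NA
```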

Or with trimws, passing a whitespace pattern that matches from the first space to the end of the string:

as.numeric(trimws(street, whitespace = "\\s+.*"))
[1] 123  NA  NA 123

Answer:

In base R, we can use sub to delete everything from the first non-digit character to the end of the string:

as.numeric(sub("\\D+.*", "", street))

[1] 123  NA  NA 123
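Since the end goal is to drop addresses without a leading street number (which also removes the PO boxes), the extracted column can drive the filter directly. A base R sketch, assuming the addresses sit in a data frame column named street; the names addresses and with_numbers are illustrative only.

```r
addresses <- data.frame(
  street = c("123 fake st", "PO box 12", "fake st unit 2", "123 fake st apt 1")
)

# Keep the leading digits: everything from the first non-digit onward is deleted,
# so streets starting with letters become "" and convert to NA
addresses$street_number <- as.numeric(sub("\\D+.*", "", addresses$street))

# Drop the rows without a leading street number (including the PO box)
with_numbers <- addresses[!is.na(addresses$street_number), ]
with_numbers
```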

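If you are not comfortable with regular expressions, you can combine parse_number() from readr with ifelse(). parse_number() on its own would return 12 for "PO box 12", so it is only applied when the first character is a digit: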

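library(tidyverse)  # attaches readr, which provides parse_number()
# %in% coerces 0:9 to character, so this tests whether the first character is a digit
ifelse(substr(street, 1, 1) %in% 0:9, parse_number(street), NA)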
[1] 123  NA  NA 123