Extract valid numbers from character vector in R

Time:06-10

Suppose I have the following character vector:

c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")

Now I want to extract only the valid numbers from that vector:

c("4", "-21", "6.5", "-2.2")

Note: "7. 5" has a space between the . and the 5, so it is not a valid number.

I tried the regex /^-?(0|[1-9]\\d*)(\\.\\d+)?$/, which is given here, but had no luck.

So what would be the regex to extract valid numbers from a character vector?

CodePudding user response:

as.numeric already does a great job of this: anything that is a valid number is coerced to numeric, and everything else becomes NA (with an "NAs introduced by coercion" warning).

x = c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")
y = as.numeric(x)
y = y[!is.na(y)]
y
# [1]   4.0 -21.0   6.5  -2.2
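
The question asks for the original strings rather than numeric values. A minimal sketch of one way to get them, using the as.numeric result only as a filter (the variable name ok is illustrative, and suppressWarnings is added here just to silence the coercion warning):

```r
x <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")

# TRUE where the string can be coerced to a number; the warning about
# NAs introduced by coercion is expected, so it is suppressed here
ok <- !is.na(suppressWarnings(as.numeric(x)))

# Index the original character vector so the strings are kept as-is
x[ok]
# [1] "4"    "-21"  "6.5"  "-2.2"
```

Keep in mind that as.numeric is more permissive than a plain decimal pattern: strings such as "1e3", "0x1A", "Inf", or " 4 " (with surrounding whitespace) are also coerced successfully, so this filter would keep them too.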

CodePudding user response:

We can use grep with a pattern that allows an optional leading - followed only by digits and . from the start (^) to the end ($) of the string:

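# v1 is assumed to be the vector from the question
v1 <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")
grep("^-?[0-9.]+$", v1, value = TRUE)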
[1] "4"    "-21"  "6.5"  "-2.2"

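Or, for fringe cases such as "4.1.1" (which the pattern above would accept), use a stricter pattern that allows at most one decimal point: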

grep("^[+-]?[0-9]+(\\.\\d+)?$", c(v1, "4.1.1"), value = TRUE)
[1] "4"    "-21"  "6.5"  "-2.2"

grep("^[+-]?[0-9]+(\\.\\d+)?$", c(v1, "4.1.1", "+2.9"), value = TRUE)
[1] "4"    "-21"  "6.5"  "-2.2" "+2.9"
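
As a rough sketch, the stricter pattern can be wrapped in a small helper so it can be reused as a logical filter. The function name is_plain_number and the extra test strings are hypothetical and not part of the answers above; the pattern assumes an optional sign, one or more digits, and an optional fractional part.

```r
# Hypothetical helper: TRUE for plain decimal strings such as "4" or "-2.2"
is_plain_number <- function(s) {
  # optional sign, one or more digits, optional "." followed by digits
  grepl("^[+-]?[0-9]+(\\.[0-9]+)?$", s)
}

v1 <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")
candidates <- c(v1, "4.1.1", "+2.9", ".", "1e3")

# Keep only the strings that pass the check
candidates[is_plain_number(candidates)]
# [1] "4"    "-21"  "6.5"  "-2.2" "+2.9"
```

Unlike the as.numeric approach, this rejects scientific notation such as "1e3". Whether that is desirable depends on what counts as a valid number for the data at hand.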