Suppose I have the character vector below:
c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")
Now I want to extract only the valid numbers from the above vector:
c("4", "-21", "6.5", "-2.2")
Note: there is a space between . and 5 in 7. 5, so it is not a valid number.
I was trying the regex /^-?(0|[1-9]\\d*)(\\.\\d+)?$/, which is given here, but with no luck.
So what would be the regex to extract valid numbers from a character vector?
CodePudding user response:
as.numeric already does a great job of this. Anything that's a valid number can be successfully coerced to numeric; everything else becomes NA.
x = c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")
y = as.numeric(x)
y = y[!is.na(y)]
y
# [1] 4.0 -21.0 6.5 -2.2
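The question asks for the matching strings rather than their numeric values. A minimal sketch of the same idea that keeps the original character elements, assuming the coercion warning for non-numeric entries is safe to silence (the names `num` and `valid` are illustrative, not from the answer):

```r
x <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")

# Coerce once; entries that are not valid numbers become NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
num <- suppressWarnings(as.numeric(x))

# Use the NA mask to subset the original strings instead of the numeric values.
valid <- x[!is.na(num)]
valid
# [1] "4"    "-21"  "6.5"  "-2.2"
```

Note that as.numeric is more permissive than a plain decimal pattern: it also accepts forms such as "1e3" or strings padded with whitespace, so a regex may be preferable if only plain integers and decimals should count.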
CodePudding user response:
We can use grep with a pattern that matches digits and . from the start (^) to the end ($) of the string
grep("^-?[0-9.]+$", v1, value = TRUE)
[1] "4" "-21" "6.5" "-2.2"
Or, for fringe cases such as a leading sign or more than one dot
grep("^[+-]?[0-9]+(\\.\\d+)?$", c(v1, "4.1.1"), value = TRUE)
[1] "4" "-21" "6.5" "-2.2"
grep("^[+-]?[0-9]+(\\.\\d+)?$", c(v1, "4.1.1", "+2.9"), value = TRUE)
[1] "4" "-21" "6.5" "-2.2" "+2.9"
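The stricter pattern reads piece by piece: `^[+-]?` allows one optional sign, `[0-9]+` requires at least one digit, `(\\.[0-9]+)?` allows an optional fractional part made of a dot followed by digits, and `$` anchors the end. A minimal sketch wrapping it in a helper built on grepl, which returns a logical mask rather than the matches; the name `is_valid_number` is hypothetical and not part of the answer above:

```r
# Hypothetical helper: TRUE where the whole string is an optionally signed
# integer or decimal, FALSE otherwise.
is_valid_number <- function(x) {
  grepl("^[+-]?[0-9]+(\\.[0-9]+)?$", x)
}

v1 <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")

# Subset the original vector with the logical mask.
v1[is_valid_number(v1)]
# [1] "4"    "-21"  "6.5"  "-2.2"

# Edge cases: multiple dots, explicit plus sign, missing integer or fraction part.
is_valid_number(c("4.1.1", "+2.9", ".5", "5."))
# [1] FALSE  TRUE FALSE FALSE
```

This pattern deliberately rejects ".5" and "5."; if those should count as valid, the pattern would need to be loosened.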