I would like to find the position of the first decimal digit that is less than 5. If there is no such digit (every decimal digit is 5 or greater), the number of decimal places should be returned instead.
So this data:
library(dplyr)
Data <- tibble(Number = c(0.998971282, 0.97871, 0.98121752874, 0.98921752874, 0.95171358, 0.99999999))
Should produce an output like this:
Data %>% mutate(Position = c(6, 5, 3, 4, 3, 8))
CodePudding user response:
base R
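get_first_digit_below <- function(x) {
  # Drop the leading "0." so that string positions equal decimal places
  str <- substr(x, 3, nchar(x))
  # Position of the first digit below 5, or -1 if there is none
  idx <- regexpr("[0-4]", str)
  # No match: fall back to the number of decimal places
  idx[idx < 0] <- nchar(str)[idx < 0]
  as.vector(idx)
}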
get_first_digit_below(Data$Number)
#[1] 6 5 3 4 3 8
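To see what each step of the base R version produces, here is a minimal sketch on two of the values (the names `x`, `digits` and `idx` are only for illustration). It also relies on an assumption worth stating: the number must convert to text in fixed notation with a leading "0.", so very small values that convert to scientific notation would need formatting first.

```r
x <- c(0.97871, 0.99999999)

# substr() and nchar() coerce the numbers to character implicitly
digits <- substr(x, 3, nchar(x))
digits
# "97871" "99999999"

# regexpr() gives the position of the first match, or -1 when nothing matches
idx <- regexpr("[0-4]", digits)
as.vector(idx)
# 5 -1

# Caveat: small numbers are converted in scientific notation,
# so dropping the first two characters would not leave the decimals
as.character(1e-05)
# "1e-05"
```

The -1 is what the function then replaces with `nchar(str)`, which gives 8 for the second value.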
dplyr & stringr
library(stringr)
library(dplyr)
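get_first_digit_below <- function(x) {
  # Drop the leading "0." so that string positions equal decimal places
  str <- substr(x, 3, nchar(x))
  # str_locate() returns NA in the "start" column when nothing matches
  idx <- str_locate(str, "[0-4]")[, 1]
  # Replace NA with the number of decimal places
  coalesce(idx, str_length(str))
}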
get_first_digit_below(Data$Number)
#[1] 6 5 3 4 3 8
CodePudding user response:
A solution that avoids converting to characters.
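fFirstDigit <- function(v, x) {
  # Roughly the number of decimal digits a double can represent
  n <- -floor(log10(.Machine$double.eps))
  # One row per number, one column per significant digit
  m <- matrix(as.integer((rep(v*10^(n - ceiling(log10(v))), each = n)/10^((n - 1L):0)) %% 10), length(v), n, TRUE)
  # Sentinel: force a match in the last column when no digit is below x
  m[,n] <- 0L
  # Column index of the first digit below x in each row
  max.col(m < x, "f")
}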
Number <- c(0.998971282, 0.97871, 0.98121752874, 0.98921752874, 0.95171358, 0.99999999, 1 - .Machine$double.eps, 987654321)
fFirstDigit(Number, 5L)
#> [1] 6 5 3 4 3 9 16 6
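The last two lines of that function are the part that picks the position, so here is a small sketch of just that step. The digit matrix is typed in by hand for illustration (it is not computed from the numbers), with the final column already set to the sentinel 0.

```r
# Hand-written digit matrix: one row per number, last column is the sentinel 0
digits <- matrix(c(9L, 7L, 8L, 7L, 1L, 0L,
                   9L, 9L, 9L, 9L, 9L, 0L),
                 nrow = 2, byrow = TRUE)

# TRUE where a digit is below 5; max.col() with ties.method = "first"
# returns the column of the first TRUE in each row
pos <- max.col(digits < 5L, ties.method = "first")
pos
# 5 6
```

In the second row no real digit is below 5, so the sentinel column is the one reported. Without the sentinel every entry in that row would be FALSE and `max.col()` would return column 1.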
CodePudding user response:
A base R approach using strsplit.
cbind(
  Data,
  Position = sapply(strsplit(as.character(Data$Number), ""), function(x) {
    is <- as.numeric(x[3:length(x)]) < 5
    ifelse(any(is), which(is)[1], length(x[3:length(x)]))
  })
)
     Number Position
1 0.9989713        6
2 0.9787100        5
3 0.9812175        3
4 0.9892175        4
5 0.9517136        3
6 1.0000000        8
A dplyr version:
library(dplyr)
library(stringr)
library(tidyr)  # replace_na() comes from tidyr
Data %>%
  rowwise() %>%
  mutate(n = str_split(Number, ""),
         n = list(n[3:length(n)]),
         Position = which(sapply(n, "<", 5))[1],
         Position = replace_na(Position, length(n)), n = NULL) %>%
  ungroup()
# A tibble: 6 × 2
  Number Position
   <dbl>    <int>
1  0.999        6
2  0.979        5
3  0.981        3
4  0.989        4
5  0.952        3
6  1.00         8
CodePudding user response:
Another approach using regexec. The final sum adds the length of the last match (zero or one [0-4] digit), so that if such a digit exists its position is returned; otherwise the number of decimal digits is returned.
c("0.998971282", "0.97871", "0.98121752874",
  "0.98921752874", "0.95171358", "0.99999999") |>
  regexec(pattern = "[0-9]+\\.([5-9]+)([0-4])?") |>
  sapply(FUN = attr, which = "match.length") |>
  (\(z) {z[2,] + z[3,]})()
[1] 6 5 3 4 3 8
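To make the sum easier to follow, here is a small sketch that looks at the match lengths for two of the strings before they are added. It assumes the pattern is `[0-9]+\\.([5-9]+)([0-4])?`, that is, a run of digits 5 to 9 followed by an optional digit 0 to 4; the names `m` and `len` are only for illustration.

```r
m <- regexec("[0-9]+\\.([5-9]+)([0-4])?", c("0.97871", "0.99999999"))

# One column per string; rows are: whole match, first group, second group
len <- sapply(m, attr, which = "match.length")

len[2, ]  # length of the run of digits 5-9
# 4 8
len[3, ]  # 1 if a digit 0-4 follows the run, 0 if the optional group is empty
# 1 0

len[2, ] + len[3, ]
# 5 8
```

Note that with `+` on the first group, a number whose first decimal digit is already below 5 would not match at all, so this sketch only covers inputs like the ones in the question.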