Return position of first number below 5 in values with decimal places

I would like to find the position of the first decimal digit that is less than 5. If there is no such digit (all decimal digits are 5 or more), the number of decimal places should be returned instead.

So this data:

library(dplyr)
Data <- tibble(Number = c(0.998971282, 0.97871, 0.98121752874, 0.98921752874, 0.95171358,0.99999999))

Should produce an output like this:

Data %>% mutate(Position = c(6, 5, 3, 4, 3, 8))

Answer:

base R

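get_first_digit_below <- 
  function(x){
    # drop the leading "0." (assumes every value prints as "0.xxx")
    str <- substr(x, 3, nchar(x))
    # position of the first digit 0-4, or -1 if there is none
    idx <- regexpr("[0-4]", str)
    # no such digit: fall back to the number of decimal places
    idx[idx < 0] <- nchar(str)[idx < 0]
    as.vector(idx)
  }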

get_first_digit_below(Data$Number)
#[1] 6 5 3 4 3 8
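The substr(x, 3, ...) step assumes every value prints as "0.xxx". A minimal sketch, assuming you also need values of 1 or more, negative values, or values that as.character() would print in scientific notation (such as 1e-05): format each value on its own with up to 15 significant digits and keep whatever follows the decimal point. The name get_first_digit_below2 is hypothetical, and whole numbers returning 0 is a choice made here, not something the question specifies.

```r
# Hypothetical variant of get_first_digit_below(), not part of the answer.
# Assumption: 15 significant digits is enough to show the value as typed.
get_first_digit_below2 <- function(x) {
  # format each value on its own (a vector would be padded to a common width);
  # scientific = FALSE turns e.g. 1e-05 into "0.00001"
  s <- vapply(x, format, character(1), scientific = FALSE, digits = 15)
  # keep only the digits after the decimal point ("" for whole numbers)
  dec <- ifelse(grepl(".", s, fixed = TRUE), sub("^[^.]*\\.", "", s), "")
  # position of the first digit 0-4, or -1 if there is none
  idx <- regexpr("[0-4]", dec)
  # no such digit: fall back to the number of decimal places (0 for integers)
  idx[idx < 0] <- nchar(dec)[idx < 0]
  as.vector(idx)
}

get_first_digit_below2(c(0.998971282, 0.97871, 0.98121752874, 0.98921752874,
                         0.95171358, 0.99999999, 12, -1.25, 1e-05))
```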

dplyr & stringr

library(stringr)
library(dplyr)
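get_first_digit_below <- 
  function(x){
    str <- substr(x, 3, nchar(x))
    # start column of the first 0-4 digit; NA when there is none
    idx <- str_locate(str, "[0-4]")[, 1]
    # replace NA with the number of decimal places
    coalesce(idx, str_length(str))
  }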

get_first_digit_below(Data$Number)
#[1] 6 5 3 4 3 8

Answer:

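A solution that avoids converting to character. It works on significant digits rather than decimal places (so it also accepts a value such as 987654321), and when no digit is below 5 it reports the first trailing zero, which is why 0.99999999 gives 9 rather than 8 in the output below.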

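fFirstDigit <- function(v, x) {
  # digits a double can hold reliably (16 for IEEE 754 doubles)
  n <- -floor(log10(.Machine$double.eps))
  # scale each value to n integer digits and peel off one digit per column:
  # row i holds the first n significant digits of v[i]
  m <- matrix(as.integer((rep(v*10^(n - ceiling(log10(v))), each = n)/10^((n - 1L):0)) %% 10),
              length(v), n, TRUE)
  # force a match in the last column so every row has one
  m[, n] <- 0L
  # column of the first digit below x in each row
  max.col(m < x, "f")
}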

Number <- c(0.998971282, 0.97871, 0.98121752874, 0.98921752874, 0.95171358, 0.99999999, 1 - .Machine$double.eps, 987654321)
fFirstDigit(Number, 5L)
#> [1]  6  5  3  4  3  9 16  6
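The digit-extraction idea behind fFirstDigit can be shown in a simpler form: the j-th decimal digit of x is floor(x * 10^j) %% 10, and outer() builds all of them at once. This is a sketch, not the answer's code; first_digit_below_num and max_digits are hypothetical names, and the caller has to supply max_digits because a double does not remember how many decimal places it was typed with.

```r
# Hypothetical helper illustrating the arithmetic idea; not the answer's code.
# Assumption: max_digits (the number of decimal places to inspect) is known.
first_digit_below_num <- function(x, max_digits, threshold = 5) {
  k <- seq_len(max_digits)
  # digits[i, j] is the j-th decimal digit of x[i]: floor(x * 10^j) %% 10
  digits <- floor(outer(abs(x), 10^k)) %% 10
  below <- digits < threshold
  # column of the first TRUE in each row (1 if the row has none)
  pos <- max.col(below, ties.method = "first")
  # rows with no digit below threshold fall back to max_digits
  pos[rowSums(below) == 0] <- max_digits
  pos
}

first_digit_below_num(c(0.998971282, 0.97871, 0.98121752874,
                        0.98921752874, 0.95171358, 0.99999999),
                      max_digits = 8L)
```

Digits far to the right can be off by one through floating-point error, which is one reason the answers above work on the printed string instead.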

Answer:

A base R approach using strsplit.

cbind(
  Data, Position = sapply(strsplit(as.character(Data$Number), ""), function(x){
    # decimal digits only: drop the leading "0" and "."
    is <- as.numeric(x[3:length(x)]) < 5
    # first digit below 5, else the number of decimal places
    if (any(is)) which(is)[1] else length(is)
  })
)
     Number Position
1 0.9989713        6
2 0.9787100        5
3 0.9812175        3
4 0.9892175        4
5 0.9517136        3
6 1.0000000        8

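A dplyr version (replace_na() comes from tidyr, so that package is loaded too)

library(dplyr)
library(stringr)
library(tidyr)

Data %>% 
  rowwise() %>% 
  # split each number into characters and keep the decimal digits
  mutate(n = str_split(Number, ""), 
         n = list(n[3:length(n)]), 
         # "<" compares each digit string with "5"; fine for single digits
         Position = which(sapply(n, "<", 5))[1],
         Position = replace_na(Position, length(n)), n = NULL) %>%
  ungroup()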
# A tibble: 6 × 2
  Number Position
   <dbl>    <int>
1  0.999        6
2  0.979        5
3  0.981        3
4  0.989        4
5  0.952        3
6  1.00         8

Answer:

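Another approach uses regexec(). The first capture group matches the run of decimal digits that are 5 or more, and the optional second group matches a single following digit from 0 to 4. The final step adds the lengths of the two groups: when a digit below 5 exists, the sum is its position (run length + 1); otherwise the second group is empty and the sum is the number of decimal places. The pattern needs at least one digit of 5 or more right after the decimal point, so a string such as "0.1" does not match.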

c("0.998971282", "0.97871", "0.98121752874", 
  "0.98921752874", "0.95171358", "0.99999999") |>
    regexec(pattern = "[0-9]+\\.([5-9]+)([0-4])?") |>
    sapply(FUN = attr, which = "match.length") |>
    (\(z) {z[2,] + z[3,]})()


[1] 6 5 3 4 3 8
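A sketch of the same regexec idea that also copes with a first decimal digit below 5 (e.g. "0.1"), assuming the inputs are already strings: [5-9]* lets the run of large digits be empty, and vapply() keeps the result an integer vector even when a string has no decimal point. first_below_regex is a hypothetical name, and returning NA for whole numbers is a choice made here, not part of the answer.

```r
# Hypothetical helper, not part of the answer above.
# Assumption: inputs are strings such as "0.97871"; no decimal point gives NA.
first_below_regex <- function(s) {
  # [5-9]* (not +) allows an empty run, so "0.1" still matches
  m <- regexec("^-?[0-9]+\\.([5-9]*)([0-4])?", s)
  vapply(m, function(mi) {
    len <- attr(mi, "match.length")
    # regexec reports -1 when the whole pattern fails (no decimal point)
    if (len[1] < 0) return(NA_integer_)
    # run length plus the optional 0-4 digit (an unmatched group has length 0)
    as.integer(len[2] + len[3])
  }, integer(1))
}

first_below_regex(c("0.998971282", "0.97871", "0.98121752874",
                    "0.98921752874", "0.95171358", "0.99999999", "0.1", "12"))
```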