Is there an R function for applying a threshold?

Time:07-02

I have a dataset that looks like this:

[Image: dataset]

I created another column showing the number of days in this “approved” date range (rounded, and converted to numeric so I can apply other calculations to it):

dat1$days_approved <- round(as.numeric(difftime(dat1$end_date, dat1$start_date, units = "days")), digits = 0)
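A minimal sketch of why the rounding step helps, using two hypothetical rows whose times are copied from the sample data in the first answer and parsed as UTC (an assumption, chosen to sidestep daylight-saving shifts). The start and end clock times differ by one hour, so the raw day count is fractional until it is rounded.

```r
# Two hypothetical rows; times parsed as UTC to avoid daylight-saving ambiguity
dat1 <- data.frame(
  start_date = as.POSIXct(c("2021-11-28 05:00:00", "2021-02-22 05:00:00"), tz = "UTC"),
  end_date   = as.POSIXct(c("2022-05-29 04:00:00", "2021-03-16 04:00:00"), tz = "UTC")
)

# Raw difference in days: fractional (just under 182 and 22) because the
# end times are one hour earlier in the day than the start times
raw_days <- as.numeric(difftime(dat1$end_date, dat1$start_date, units = "days"))
print(raw_days)

# Rounding gives whole days (182 and 22) for later calculations
dat1$days_approved <- round(raw_days, digits = 0)
print(dat1$days_approved)
```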

Now I want to see where each period stands relative to today’s date. That is, are we halfway through, not yet started, or complete?

So I use lubridate’s now() function (with its tzone argument) to get “today” and apply some basic division:

dat1$time_progress <- round(as.numeric(now(tzone = "") - dat1$start_date, units = "days")) / dat1$days_approved
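To see why a threshold comes up, here is a sketch of the same calculation with a fixed, hypothetical reference time standing in for now(tzone = ""), so the output is reproducible. The three rows are taken from the sample data in the first answer and parsed as UTC (an assumption). A finished period gives a value above 1, and a period that has not started yet gives a negative value.

```r
# Hypothetical rows taken from the sample data (parsed as UTC)
dat1 <- data.frame(
  start_date = as.POSIXct(c("2021-02-22 05:00:00", "2020-01-06 05:00:00",
                            "2021-09-03 04:00:00"), tz = "UTC"),
  end_date   = as.POSIXct(c("2021-03-16 04:00:00", "2020-07-05 04:00:00",
                            "2022-03-04 05:00:00"), tz = "UTC")
)
dat1$days_approved <- round(as.numeric(difftime(dat1$end_date, dat1$start_date,
                                                units = "days")))

# Fixed stand-in for now(tzone = "") so the result does not change over time
today <- as.POSIXct("2021-03-05 05:00:00", tz = "UTC")

# Whole days elapsed since the start, divided by the length of the period
elapsed <- round(as.numeric(difftime(today, dat1$start_date, units = "days")))
dat1$time_progress <- elapsed / dat1$days_approved

# 0.5 (halfway), about 2.34 (finished), -1 (not started)
dat1$time_progress
```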

That leaves me with a dataset looking like this:

[Image: revised dataset]

This makes me think I need a threshold: if the value is greater than 1, I’d like it to return 1; otherwise, I’d like it to return the value itself.

I can make this work with an ifelse() statement:

ifelse(dat1$time_progress > 1, 1, dat1$time_progress)

However, I’m struggling to apply this logic to the column. Is there an existing function for applying a threshold that I haven’t found?

Answer:

We could create our own threshold function and then apply it to the desired column:

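library(dplyr)
library(lubridate)

# Cap values above 1 at 1; leave all other values unchanged
my_treshold_function <- function(x){
  ifelse(x > 1, 1, x)
}

df %>% 
  mutate(across(ends_with("date"), ymd_hms),  # parse the text dates (UTC by default)
         days_approved = round(as.numeric(end_date - start_date, units = "days"), 0),
         progress = round(as.numeric(now(tzone = "") - start_date, units = "days")) / days_approved,
         # apply the threshold, storing the result in a new column named "treshold"
         across(progress, ~my_treshold_function(.), .names = "treshold"))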

   start_date          end_date            days_approved progress treshold
   <dttm>              <dttm>                      <dbl>    <dbl>    <dbl>
 1 2021-11-28 05:00:00 2022-05-29 04:00:00           182     1.18        1
 2 2021-09-03 04:00:00 2022-03-04 05:00:00           182     1.65        1
 3 2021-02-22 05:00:00 2021-03-16 04:00:00            22    22.5         1
 4 2020-09-18 04:00:00 2021-03-19 04:00:00           182     3.58        1
 5 2020-01-06 05:00:00 2020-07-05 04:00:00           181     5.01        1
 6 2021-09-18 04:00:00 2022-03-18 04:00:00           181     1.58        1
 7 2020-07-02 04:00:00 2020-08-30 04:00:00            59    12.4         1
 8 2021-03-30 04:00:00 2021-04-27 04:00:00            28    16.4         1
 9 2021-05-31 04:00:00 2021-11-30 05:00:00           183     2.16        1
10 2021-08-05 04:00:00 2022-02-03 05:00:00           182     1.81        1

data:

df <- structure(list(start_date = c("2021-11-28 5:00:00", "2021-09-03 4:00:00", 
"2021-02-22 5:00:00", "2020-09-18 4:00:00", "2020-01-06 5:00:00", 
"2021-09-18 4:00:00", "2020-07-02 4:00:00", "2021-03-30 4:00:00", 
"2021-05-31 4:00:00", "2021-08-05 4:00:00"), end_date = c("2022-05-29 4:00:00", 
"2022-03-04 5:00:00", "2021-03-16 4:00:00", "2021-03-19 4:00:00", 
"2020-07-05 4:00:00", "2022-03-18 4:00:00", "2020-08-30 4:00:00", 
"2021-04-27 4:00:00", "2021-11-30 5:00:00", "2022-02-03 5:00:00"
)), class = "data.frame", row.names = c(NA, -10L))
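For comparison, a base R sketch of the same pipeline on the first three rows of the data above. It parses the text dates as UTC (also the default of lubridate’s ymd_hms()), uses a fixed, hypothetical reference time in place of now() so the numbers are reproducible, and caps progress with base R’s pmin() (element-wise minimum), which gives the same result as my_treshold_function() on these values.

```r
# First three rows of the sample data (dates stored as text)
df <- data.frame(
  start_date = c("2021-11-28 5:00:00", "2021-09-03 4:00:00", "2021-02-22 5:00:00"),
  end_date   = c("2022-05-29 4:00:00", "2022-03-04 5:00:00", "2021-03-16 4:00:00")
)

# Parse as UTC; leading zeros (e.g. in the hour) are optional on input
fmt <- "%Y-%m-%d %H:%M:%S"
df$start_date <- as.POSIXct(df$start_date, format = fmt, tz = "UTC")
df$end_date   <- as.POSIXct(df$end_date, format = fmt, tz = "UTC")

# Length of each approved period in whole days
df$days_approved <- round(as.numeric(difftime(df$end_date, df$start_date, units = "days")))

# Hypothetical fixed stand-in for now(tzone = "")
today <- as.POSIXct("2021-12-28 05:00:00", tz = "UTC")
df$progress <- round(as.numeric(difftime(today, df$start_date, units = "days"))) /
  df$days_approved

# The answer's function, and the base R equivalent using pmin()
my_treshold_function <- function(x) ifelse(x > 1, 1, x)
df$treshold <- pmin(df$progress, 1)
df
```

The days_approved values match the first three rows of the output table above (182, 182, 22); progress differs only because the reference time is fixed here.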

Answer:

dat2 <- dat1
dat2$time_progress2 <- ifelse(dat1$time_progress > 1, 1, dat1$time_progress)

Have you tried this? It creates a new data frame, dat2, with the same columns as dat1 plus a new column, time_progress2, which holds time_progress capped at the threshold of 1. You can then compare the two columns side by side.

Then, if everything checks out, you can just delete the original time_progress variable from dat2.
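In base R, a column is dropped by assigning NULL to it. A minimal sketch with made-up values (the contents of dat2 here are hypothetical):

```r
# Hypothetical dat2 holding both the original and the capped column
dat2 <- data.frame(time_progress  = c(0.4, 1.18, 22.5),
                   time_progress2 = c(0.4, 1, 1))

# Check the two columns side by side
print(dat2)

# Once the capped values look right, drop the original column
dat2$time_progress <- NULL
names(dat2)
```

Alternatively, assigning the ifelse() result straight back to dat1$time_progress replaces the column in one step, without a second data frame.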

Judging from the data you provided in your question, it seems like nearly every row will return a 1.

You could also use the same ifelse() function to create a new column that returns three values, "completed", "half completed", and "not started", since that is what you are looking for.
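One way to do that is with nested ifelse() calls. In this sketch the cut-offs are assumptions: a value of 1 or more counts as completed, 0 or less as not started, and anything in between gets the label "in progress" (standing in for the "half completed" category above). The progress values are hypothetical.

```r
# Hypothetical progress values, including a not-yet-started period (negative)
time_progress <- c(-1, 0, 0.5, 1, 2.34)

# Check the "completed" case first, then "not started"; the rest are in progress
status <- ifelse(time_progress >= 1, "completed",
                 ifelse(time_progress <= 0, "not started", "in progress"))
status
```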
