I have a dataset that looks like this:
I then add a column with the number of days in this “approved” date range, rounded and converted to numeric so I can use it in other calculations:
dat1$days_approved <- round(as.numeric(difftime(dat1$end_date, dat1$start_date, units=c("days"))), digits = 0)
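A minimal, self-contained sketch of the same calculation on two start/end pairs taken from the data posted further down. Parsing the timestamps as UTC is an assumption: in a local time zone, a daylight-saving shift can move the difference by an hour before rounding.

```r
# Two start/end pairs from the posted data; tz = "UTC" is an assumption
dat1 <- data.frame(
  start_date = as.POSIXct(c("2021-11-28 05:00:00", "2021-02-22 05:00:00"), tz = "UTC"),
  end_date   = as.POSIXct(c("2022-05-29 04:00:00", "2021-03-16 04:00:00"), tz = "UTC")
)

# difftime() in days, converted to a plain number, then rounded to whole days
dat1$days_approved <- round(as.numeric(difftime(dat1$end_date, dat1$start_date,
                                                units = "days")), digits = 0)

print(dat1$days_approved)  # gives 182 and 22
```

Both windows are one hour short of a whole number of days, so rounding brings them back to 182 and 22, matching the days_approved values in the first answer's output.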
Now I want to see where each row stands relative to today’s date: are we halfway through the period, has it not started yet, or is it complete?
So I use lubridate’s now() function (tzone = "" means the system time zone) for “today” and apply some basic division:
library(lubridate)  # provides now()
dat1$time_progress <- round(as.numeric(now(tzone = "") - dat1$start_date, units = "days")) / dat1$days_approved
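A reproducible sketch of the progress ratio. It substitutes a fixed reference time for now() so the result does not change from day to day; the reference date and the UTC time zone are assumptions made only for illustration. A ratio below 0 means the window has not started, a value between 0 and 1 means it is in progress (0.5 is halfway), and a value above 1 means it has ended.

```r
# Fixed reference time standing in for now(); hypothetical, for reproducibility
today_ts <- as.POSIXct("2022-02-26 05:00:00", tz = "UTC")

start_date    <- as.POSIXct("2021-11-28 05:00:00", tz = "UTC")
days_approved <- 182

# Whole days elapsed since the start, divided by the length of the approved window
time_progress <- round(as.numeric(difftime(today_ts, start_date, units = "days"))) /
  days_approved

print(time_progress)  # 90 / 182, just under halfway
```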
That leaves me with a dataset looking like this:
This makes me think I need to set a threshold: if the value is greater than 1, I’d like it to return 1; otherwise, I’d like it to return the value unchanged.
I can make this work with an ifelse() statement:
ifelse(dat1$time_progress > 1, 1, dat1$time_progress)
However, I’m struggling to apply it as logic to the column. Is there an existing function for applying a threshold that I have not found?
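Base R’s pmin() (parallel minimum) may be the existing function being asked about: it compares element by element, so pmin(x, 1) caps every value at 1 and leaves smaller values unchanged, which is the same result as the ifelse() above. The step that applies it “to the column” is assigning the result back. A sketch with illustrative values (1.18 and 22.5 appear in the first answer’s output; the others are made up):

```r
# Illustrative progress values: negative = not started, above 1 = finished
dat1 <- data.frame(time_progress = c(-0.2, 0.5, 1.18, 22.5))

# Cap at 1 and store it in a new column (or assign back to time_progress to overwrite)
dat1$time_progress_capped <- pmin(dat1$time_progress, 1)

# Same result as the ifelse() version
stopifnot(identical(dat1$time_progress_capped,
                    ifelse(dat1$time_progress > 1, 1, dat1$time_progress)))

print(dat1)
```

If negative values (windows that have not started yet) should also be floored at 0, pmax(pmin(x, 1), 0) clamps to the range 0 to 1.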
Answer:
We could create our own threshold function and then apply it to the desired column:
library(dplyr)
library(lubridate)
# Cap values above 1 at 1; leave everything else unchanged
my_threshold_function <- function(x){
  ifelse(x > 1, 1, x)
}
df %>%
  mutate(across(ends_with("date"), ymd_hms),  # parse the text dates
         # units = "days" keeps the unit fixed even for windows shorter than a day
         days_approved = round(as.numeric(end_date - start_date, units = "days"), 0),
         progress = round(as.numeric(now(tzone = "") - start_date, units = "days")) / days_approved,
         across(progress, ~my_threshold_function(.), .names = "threshold"))
start_date end_date days_approved progress threshold
<dttm> <dttm> <dbl> <dbl> <dbl>
1 2021-11-28 05:00:00 2022-05-29 04:00:00 182 1.18 1
2 2021-09-03 04:00:00 2022-03-04 05:00:00 182 1.65 1
3 2021-02-22 05:00:00 2021-03-16 04:00:00 22 22.5 1
4 2020-09-18 04:00:00 2021-03-19 04:00:00 182 3.58 1
5 2020-01-06 05:00:00 2020-07-05 04:00:00 181 5.01 1
6 2021-09-18 04:00:00 2022-03-18 04:00:00 181 1.58 1
7 2020-07-02 04:00:00 2020-08-30 04:00:00 59 12.4 1
8 2021-03-30 04:00:00 2021-04-27 04:00:00 28 16.4 1
9 2021-05-31 04:00:00 2021-11-30 05:00:00 183 2.16 1
10 2021-08-05 04:00:00 2022-02-03 05:00:00 182 1.81 1
Data (assigned to df, the name used in the code above):
df <- structure(list(start_date = c("2021-11-28 5:00:00", "2021-09-03 4:00:00",
"2021-02-22 5:00:00", "2020-09-18 4:00:00", "2020-01-06 5:00:00",
"2021-09-18 4:00:00", "2020-07-02 4:00:00", "2021-03-30 4:00:00",
"2021-05-31 4:00:00", "2021-08-05 4:00:00"), end_date = c("2022-05-29 4:00:00",
"2022-03-04 5:00:00", "2021-03-16 4:00:00", "2021-03-19 4:00:00",
"2020-07-05 4:00:00", "2022-03-18 4:00:00", "2020-08-30 4:00:00",
"2021-04-27 4:00:00", "2021-11-30 5:00:00", "2022-02-03 5:00:00"
)), class = "data.frame", row.names = c(NA, -10L))
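A base-R sketch of the same pipeline on the data above, without dplyr or lubridate. It uses a fixed reference time instead of now() so the numbers are reproducible; the 2022-03-01 reference date and parsing the timestamps as UTC are assumptions. It also checks that the threshold helper produces exactly the same column as base R’s pmin().

```r
# Posted data, parsed as UTC (the time zone is an assumption)
df <- data.frame(
  start_date = c("2021-11-28 5:00:00", "2021-09-03 4:00:00", "2021-02-22 5:00:00",
                 "2020-09-18 4:00:00", "2020-01-06 5:00:00", "2021-09-18 4:00:00",
                 "2020-07-02 4:00:00", "2021-03-30 4:00:00", "2021-05-31 4:00:00",
                 "2021-08-05 4:00:00"),
  end_date   = c("2022-05-29 4:00:00", "2022-03-04 5:00:00", "2021-03-16 4:00:00",
                 "2021-03-19 4:00:00", "2020-07-05 4:00:00", "2022-03-18 4:00:00",
                 "2020-08-30 4:00:00", "2021-04-27 4:00:00", "2021-11-30 5:00:00",
                 "2022-02-03 5:00:00")
)
df$start_date <- as.POSIXct(df$start_date, tz = "UTC")
df$end_date   <- as.POSIXct(df$end_date,   tz = "UTC")

# Hypothetical fixed reference time standing in for now()
ref_time <- as.POSIXct("2022-03-01 00:00:00", tz = "UTC")

df$days_approved <- round(as.numeric(difftime(df$end_date, df$start_date, units = "days")))
df$progress <- round(as.numeric(difftime(ref_time, df$start_date, units = "days"))) /
  df$days_approved

# The answer's helper: cap values above 1 at 1
my_threshold_function <- function(x) ifelse(x > 1, 1, x)

# Base R's pmin() gives the same capped column without a custom helper
df$threshold <- pmin(df$progress, 1)
stopifnot(identical(df$threshold, my_threshold_function(df$progress)))

print(df[, c("days_approved", "progress", "threshold")])
```

With this reference date, rows 1, 2 and 6 are still in progress (values below 1) and every other row is capped at 1; the days_approved values match the first answer’s output.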
Answer:
dat2 <- dat1
dat2$time_progress2 <- ifelse(dat1$time_progress > 1, 1, dat1$time_progress)
Have you tried this? It creates a new data frame, dat2, with all the columns of dat1 plus a new column, time_progress2, which holds time_progress capped at the threshold of 1. You can then compare the two columns side by side.
Then, if everything checks out, you can just delete the original time_progress variable from dat2.
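A short sketch of that compare-then-drop workflow. The dat1 here is an illustrative stand-in: 1.18 and 22.5 come from the first answer’s output, and 0.25 is made up.

```r
# Illustrative stand-in for dat1
dat1 <- data.frame(time_progress = c(0.25, 1.18, 22.5))

# Copy, then add the capped column next to the original
dat2 <- dat1
dat2$time_progress2 <- ifelse(dat1$time_progress > 1, 1, dat1$time_progress)

# Side-by-side check: show only the rows where capping changed the value
print(dat2[dat2$time_progress != dat2$time_progress2, c("time_progress", "time_progress2")])

# Once satisfied, drop the original column from dat2
dat2$time_progress <- NULL
```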
Judging from the data you provided in your question, it seems like nearly every row will return a 1.
You could also create a new column with nested ifelse() calls that returns one of three values, "completed", "half completed", or "not started", since that is what you are ultimately looking for.
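A minimal sketch of that idea with nested ifelse(). The helper name progress_status() is hypothetical, and so are the cut-offs: below 0 counts as not started, 1 or more as completed, and anything in between is labeled "in progress" (the answer’s "half completed" covers this whole middle range, not only the exact midpoint).

```r
# Hypothetical helper: map a progress ratio to a status label.
# The cut-offs (< 0 and >= 1) are assumptions, not taken from the original posts.
progress_status <- function(x) {
  ifelse(x < 0, "not started",
         ifelse(x >= 1, "completed", "in progress"))
}

progress <- c(-0.3, 0, 0.5, 0.99, 1, 22.5)
status <- progress_status(progress)
print(data.frame(progress, status))
```

Applied to the data, this would be dat1$status <- progress_status(dat1$time_progress); because the labels are decided from the raw ratio, it does not matter whether the column has been capped at 1 first.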