How to create groups based on a changing condition using a for loop in R?

Time:10-19

I have a data frame and a vector that I want to compare with a column of the data frame, assigning a group to each row according to which condition its value meets. The problem is that the vector is dynamic, so I need code that works for whatever length it has.

This is a minimal reproducible example of my data frame

value <- c(rnorm(39, 5, 2))
Date <- seq(as.POSIXct('2021-01-18'), as.POSIXct('2021-10-15'), by = "7 days")

df <- data.frame(Date, value)

This is the vector I have to compare with the Date of the data frame

dates_tour <- as.POSIXct(c('2021-01-18', '2021-05-18', '2021-08-18', '2021-10-15'))

This creates the desired output

library(dplyr)

df <- df %>% mutate(tour = case_when(
  Date >= dates_tour[1] & Date <= dates_tour[2] ~ 1,
  Date >  dates_tour[2] & Date <= dates_tour[3] ~ 2,
  Date >  dates_tour[3] & Date <= dates_tour[4] ~ 3
))

However, I don't want to hard-code it like that, since this project needs to be updated frequently and the variable dates_tour changes in length.

So I would like to take that into account to create the tour variable

I tried to do it like this, but it doesn't work:

for (i in 1:length(dates_tour)) {
  df <- df %>% mutate(tour = case_when(Date >= dates_tour[i] & Date <= dates_tour[i + 1] ~ i))
}

CodePudding user response:

You can use cut to bin a vector based on break points:

df %>%
  mutate(
    tour = cut(Date, breaks = dates_tour, labels = seq_along(dates_tour[-1]))
  )
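
One caveat: for date-times, cut() defaults to right = FALSE, so a date exactly equal to a break point falls into the later group, and the result is a factor rather than a number. As a rough sketch that reproduces the boundaries of the case_when() version in the question (first break included, every later break closing its interval), the dates can be converted to numbers so that cut() takes right and include.lowest directly. The tz = "UTC" argument and the placeholder value column are assumptions added here to keep the example deterministic.

```r
# Example data modelled on the question; tz = "UTC" is an assumption so the
# weekly sequence is not shifted by daylight saving time.
Date <- seq(as.POSIXct("2021-01-18", tz = "UTC"),
            as.POSIXct("2021-10-15", tz = "UTC"), by = "7 days")
df <- data.frame(Date = Date, value = seq_along(Date))  # placeholder values
dates_tour <- as.POSIXct(c("2021-01-18", "2021-05-18", "2021-08-18", "2021-10-15"),
                         tz = "UTC")

# labels = FALSE returns integer group codes instead of a factor.
# right = TRUE closes each interval on the right; include.lowest = TRUE
# keeps a date equal to the first break in group 1.
df$tour <- cut(as.numeric(df$Date),
               breaks = as.numeric(dates_tour),
               labels = FALSE, right = TRUE, include.lowest = TRUE)

table(df$tour)
```

Because the breaks come straight from dates_tour, nothing else has to change when the vector grows or shrinks. Dates outside the first and last break get NA.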

CodePudding user response:

We may drop the last element of dates_tour to get the interval starts and the first element to get the interval ends, store both in a tibble, and then loop over its rows:

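library(dplyr)
library(purrr)

# One row per interval: consecutive pairs of break points
keydat <- tibble(start = dates_tour[-length(dates_tour)],
                 end   = dates_tour[-1])

# For each interval, label the rows inside it with the interval index
# (NA elsewhere), then keep the first non-NA label per row
df$tour <- map(seq_len(nrow(keydat)),
    ~ case_when(df$Date >= keydat$start[.x] &
                df$Date <= keydat$end[.x] ~ .x)) %>%
  reduce(coalesce)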

-output

> df
                  Date    value tour
1  2021-01-18 00:00:00 7.874620    1
2  2021-01-25 00:00:00 9.704973    1
3  2021-02-01 00:00:00 5.898070    1
4  2021-02-08 00:00:00 3.287319    1
5  2021-02-15 00:00:00 5.488132    1
6  2021-02-22 00:00:00 4.425636    1
7  2021-03-01 00:00:00 6.244084    1
8  2021-03-08 00:00:00 5.528364    1
9  2021-03-15 01:00:00 7.954929    1
10 2021-03-22 01:00:00 4.691995    1
11 2021-03-29 01:00:00 5.943415    1
12 2021-04-05 01:00:00 5.316373    1
13 2021-04-12 01:00:00 5.182952    1
14 2021-04-19 01:00:00 3.330700    1
15 2021-04-26 01:00:00 7.461089    1
16 2021-05-03 01:00:00 4.338873    1
17 2021-05-10 01:00:00 5.768665    1
18 2021-05-17 01:00:00 3.574488    1
19 2021-05-24 01:00:00 5.106042    2
20 2021-05-31 01:00:00 2.828844    2
21 2021-06-07 01:00:00 4.616084    2
22 2021-06-14 01:00:00 7.234506    2
23 2021-06-21 01:00:00 4.760413    2
24 2021-06-28 01:00:00 7.020543    2
25 2021-07-05 01:00:00 7.403235    2
26 2021-07-12 01:00:00 6.368435    2
27 2021-07-19 01:00:00 3.527764    2
28 2021-07-26 01:00:00 5.254025    2
29 2021-08-02 01:00:00 5.676425    2
30 2021-08-09 01:00:00 3.783304    2
31 2021-08-16 01:00:00 6.310292    2
32 2021-08-23 01:00:00 2.938218    3
33 2021-08-30 01:00:00 5.101852    3
34 2021-09-06 01:00:00 3.765659    3
35 2021-09-13 01:00:00 5.489846    3
36 2021-09-20 01:00:00 4.174276    3
37 2021-09-27 01:00:00 7.348895    3
38 2021-10-04 01:00:00 5.103772    3
39 2021-10-11 01:00:00 4.941248    3
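
For completeness, the for loop from the question can also be made to work. Two things appear to break the original attempt: the index runs up to length(dates_tour), so the last pass reads dates_tour[i + 1], which is NA, and every pass rebuilds tour from scratch, discarding the groups assigned earlier. A minimal base R sketch, again with tz = "UTC" and a placeholder value column as assumptions:

```r
# Example data modelled on the question; tz = "UTC" is an assumption.
Date <- seq(as.POSIXct("2021-01-18", tz = "UTC"),
            as.POSIXct("2021-10-15", tz = "UTC"), by = "7 days")
df <- data.frame(Date = Date, value = seq_along(Date))  # placeholder values
dates_tour <- as.POSIXct(c("2021-01-18", "2021-05-18", "2021-08-18", "2021-10-15"),
                         tz = "UTC")

# Start with no group assigned, then fill one interval per iteration.
df$tour <- NA_integer_
for (i in seq_len(length(dates_tour) - 1)) {   # stop one short of the end
  in_interval <- df$Date >= dates_tour[i] & df$Date <= dates_tour[i + 1]
  # Only fill rows that are still unassigned, so a date equal to a break
  # stays in the earlier group, as in the case_when() version.
  df$tour[in_interval & is.na(df$tour)] <- i
}
```

The loop adapts to any length of dates_tour, although a vectorised approach is usually shorter and faster on large data.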