Home > database >  Write a value from each row to every row x days ahead
Write a value from each row to every row x days ahead

Time:11-09

I have the following following dataset:

data.frame(id=c(1,1,1,1,1,2,2,2,2), 
                   date = as.Date(c("2020-01-01","2020-01-04","2020-01-06","2020-01-07","2020-01-10","2020-01-01","2020-01-02","2020-01-04","2020-01-05")),
                   duration = c(2,3,4,2,4,3,4,2,2),
                   product = c("A","B","C","A","C","B","C","A","A"))

I have an id of people, what product they used on each day and also how long will the product last them (duration) - update: products in this sample do have a set duration, but in reality it does not to need to be the case.

I need to make a list of products that are currently in use by each person for each row, so the resulting dataset should look as such (separator is "|" here, but does not matter):

data.frame(id=c(1,1,1,1,1,2,2,2,2), 
           date = as.Date(c("2020-01-01","2020-01-04","2020-01-06","2020-01-07","2020-01-10","2020-01-01","2020-01-02","2020-01-04","2020-01-05")),
           duration = c(2,3,4,2,4,3,4,2,2),
           product = c("A","B","C","A","C","B","C","A","A"),
           products_in_use = c("A","B","B | C", "A | B | C", "C", "B", "B | C", "A | B | C", "A | C"))

Basically I am thinking I would need to take all rows that are within duration days (as in less or equal days) from the current row and append to their list the current product. Then I would take a unique and ordered version of the list and write is as a string. But I dont know how to do the first step.

It would be preferred if all this would work inside of an dplyr pipe.

CodePudding user response:

I can't see an easy way of doing this entirely within dplyr because it relies on checking the sum of the date and the duration in each row with the date in every row, but if you first define this function:

get_products_in_use <- function(dates, durations, products)
{
  apply(sapply(seq_along(dates), 
         function(i) {
           ifelse(test = dates >= dates[i] & dates <= dates[i]   durations[i], 
                  yes  = products[i], 
                  no   = "")
           }),
      1, function(x) paste(unique(sort(x[nzchar(x)])), collapse = " | "))
}

Then it's very easy to use within a dplyr pipe:

testdata %>% 
  group_by(id) %>% 
  mutate(products_in_use = get_products_in_use(date, duration, product))
#> # A tibble: 9 x 5
#> # Groups:   id [2]
#>      id date       duration product products_in_use
#>   <dbl> <date>        <dbl> <chr>   <chr>          
#> 1     1 2020-01-01        2 A       A              
#> 2     1 2020-01-04        3 B       B              
#> 3     1 2020-01-06        4 C       B | C          
#> 4     1 2020-01-07        2 A       A | B | C      
#> 5     1 2020-01-10        4 C       C              
#> 6     2 2020-01-01        3 B       B              
#> 7     2 2020-01-02        4 C       B | C          
#> 8     2 2020-01-04        2 A       A | B | C      
#> 9     2 2020-01-05        2 A       A | C

Created on 2021-11-09 by the reprex package (v2.0.0)

  •  Tags:  
  • r
  • Related