Home > database >  Create new column based on cummulative/rolling values in grouping column
Create new column based on cummulative/rolling values in grouping column

Time:12-02

Edit: Unfortunately, I simplified my needs and data too much. I will update the question below.

I have a df similar to the code below. I need to create a new column called first_funding_date that is equal to the value of fund.date, where sigma==0, until the next time sigma==0. In the example df below, first_fund_date should be a vector with the first observation equal to "2019/05/22", the following 3 observations equal to "2020/09/05", and the final 4 equal to "2019/11/30".


set.seed(111)
df <- data.frame(id = c(1,1,3,4,5,6,2,7),
                 fund.date = sample(seq(as.Date('2018/01/01'),
                                         as.Date('2021/01/01'), by="day"), 8),
                 sigma = c(0,0,1,2,0,1,2,3))

%>% mutate(first_fund_date = ??? )

I also need to create a column called last_funding_date that is equal to fund.date, for the rolling max of sigma. The first 4 observations should be "2020/03/03" and the last 4 should be "2020/12/04".

CodePudding user response:

library(dplyr)
df %>% 
  mutate(first_fund_date = fund.date[sigma==0],
         last_funding_date = fund.date[sigma==max(sigma)])
  id  fund.date sigma first_fund_date last_funding_date
1  1 2019-05-22     1      2020-09-05        2018-03-10
2  2 2020-09-05     0      2020-09-05        2018-03-10
3  3 2018-06-24     1      2020-09-05        2018-03-10
4  4 2020-03-03     2      2020-09-05        2018-03-10
5  5 2019-11-30     3      2020-09-05        2018-03-10
6  6 2018-03-10     4      2020-09-05        2018-03-10

CodePudding user response:

The key here is to to create index variables to group_by with cumsum(sigma==0) and cumsum(sigma < lag(sigma)).

library(dplyr)

df %>%
    group_by(index = cumsum(sigma==0))%>%
    mutate(first_fund.date = first(fund.date))%>%
    group_by(index_2 = cumsum(sigma < lag(sigma, default = Inf)))%>%
    mutate(last_fund.date = last(fund.date))%>%
    ungroup()%>%
    select(-contains('index'))

# A tibble: 8 × 5
     id fund.date  sigma first_fund.date last_fund.date
  <dbl> <date>     <dbl> <date>          <date>        
1     1 2019-05-22     0 2019-05-22      2020-03-03    
2     1 2020-09-05     0 2020-09-05      2020-03-03    
3     3 2018-06-24     1 2020-09-05      2020-03-03    
4     4 2020-03-03     2 2020-09-05      2020-03-03    
5     5 2019-11-30     0 2019-11-30      2020-12-04    
6     6 2018-03-10     1 2019-11-30      2020-12-04    
7     2 2018-11-01     2 2019-11-30      2020-12-04    
8     7 2020-12-04     3 2019-11-30      2020-12-04 
  • Related