Home > Net >  how to cumulative sum variable by unique values and input back in
how to cumulative sum variable by unique values and input back in

Time:01-20

I'm looking to do the following -- cumulative sum the indicator values and remove the indicators after those days original:

transaction day indicator
1 1 0
1 2 0
1 3 0
1 4 1
1 5 1
1 6 1
2 1 0
2 2 0
2 3 0
2 4 0
2 5 1
2 6 1

and make the new table like this --

transaction day indicator
1 1 0
1 2 0
1 3 0
1 4 3
2 1 0
2 2 0
2 3 0
2 4 0
2 5 2

CodePudding user response:

Change all day with indicator == 1 to the first day with indicator == 1

df%>%
  group_by(transaction)%>%
  mutate(day=case_when(indicator==0~day,
                       T~head(day[indicator==1],1)))%>%
  group_by(transaction,day)%>%
  summarise(indicator=sum(indicator))%>%
  ungroup

  transaction   day indicator
        <int> <int>     <int>
1           1     1         0
2           1     2         0
3           1     3         0
4           1     4         3
5           2     1         0
6           2     2         0
7           2     3         0
8           2     4         0
9           2     5         2

CodePudding user response:

Please try the below code

code

df <- bind_rows(df1, df2) %>% group_by(transaction) %>% 
mutate(cumsum=cumsum(indicator), cumsum2=ifelse(cumsum==1, day, NA)) %>% 
fill(cumsum2) %>% 
mutate(day=ifelse(!is.na(cumsum2), cumsum2, day)) %>% 
group_by(transaction, day) %>% slice_tail(n=1) %>% select(-cumsum2)

Created on 2023-01-19 with reprex v2.0.2

output

# A tibble: 8 × 4
# Groups:   transaction, day [8]
  transaction   day indicator cumsum
        <dbl> <int>     <dbl>  <dbl>
1           1     1         0      0
2           1     2         0      0
3           1     3         0      0
4           1     4         1      3
5           2     1         0      0
6           2     2         0      0
7           2     3         0      0
8           2     4         1      2

  • Related