Home > Blockchain >  Leading and lagging numbers in R
Leading and lagging numbers in R

Time:06-24

In my new dataset, I would like each observation for New to continue until a new number occurs. I would like this to happen by group.

DF1:

Date Group New
2021-04-20 1001 0
2021-04-21 1001 0
2021-04-22 1001 9
2021-04-23 1001 0
2021-04-24 1001 0
2021-04-25 1001 12
2021-04-26 1001 0
2021-04-27 1001 0
2021-04-28 1001 0
2021-04-20 1002 0
2021-04-22 1002 1
2021-04-23 1002 0
2021-04-24 1002 0
2021-04-25 1002 3
2021-04-26 1002 0

Dfdesired:

Date Group New
2021-04-20 1001 0
2021-04-21 1001 0
2021-04-22 1001 9
2021-04-23 1001 9
2021-04-24 1001 9
2021-04-25 1001 12
2021-04-26 1001 12
2021-04-27 1001 12
2021-04-28 1001 12
2021-04-20 1002 0
2021-04-22 1002 1
2021-04-23 1002 1
2021-04-24 1002 1
2021-04-25 1002 3
2021-04-26 1002 3

CodePudding user response:

An option is to replace the 0 with NA, then use fill

library(dplyr)
library(tidyr)
DF1 %>%
   mutate(New = na_if(New, 0)) %>%
   group_by(Group) %>% 
   fill(New) %>%
   ungroup %>%
   mutate(New = replace_na(New, 0))

-output

# A tibble: 15 × 3
   Date       Group   New
   <chr>      <int> <int>
 1 2021-04-20  1001     0
 2 2021-04-21  1001     0
 3 2021-04-22  1001     9
 4 2021-04-23  1001     9
 5 2021-04-24  1001     9
 6 2021-04-25  1001    12
 7 2021-04-26  1001    12
 8 2021-04-27  1001    12
 9 2021-04-28  1001    12
10 2021-04-20  1002     0
11 2021-04-22  1002     1
12 2021-04-23  1002     1
13 2021-04-24  1002     1
14 2021-04-25  1002     3
15 2021-04-26  1002     3

If the values are already ordered, then can use cummax

DF1 %>%
   group_by(Group) %>% 
   mutate(New = cummax(New)) %>%
   ungroup

-output

# A tibble: 15 × 3
   Date       Group   New
   <chr>      <int> <int>
 1 2021-04-20  1001     0
 2 2021-04-21  1001     0
 3 2021-04-22  1001     9
 4 2021-04-23  1001     9
 5 2021-04-24  1001     9
 6 2021-04-25  1001    12
 7 2021-04-26  1001    12
 8 2021-04-27  1001    12
 9 2021-04-28  1001    12
10 2021-04-20  1002     0
11 2021-04-22  1002     1
12 2021-04-23  1002     1
13 2021-04-24  1002     1
14 2021-04-25  1002     3
15 2021-04-26  1002     3

data

DF1 <- structure(list(Date = c("2021-04-20", "2021-04-21", "2021-04-22", 
"2021-04-23", "2021-04-24", "2021-04-25", "2021-04-26", "2021-04-27", 
"2021-04-28", "2021-04-20", "2021-04-22", "2021-04-23", "2021-04-24", 
"2021-04-25", "2021-04-26"), Group = c(1001L, 1001L, 1001L, 1001L, 
1001L, 1001L, 1001L, 1001L, 1001L, 1002L, 1002L, 1002L, 1002L, 
1002L, 1002L), New = c(0L, 0L, 9L, 0L, 0L, 12L, 0L, 0L, 0L, 0L, 
1L, 0L, 0L, 3L, 0L)), class = "data.frame", row.names = c(NA, 
-15L))
  •  Tags:  
  • r
  • Related