I have a set of data organized into groups and sorted by time. I would like to create a new variable that takes the value of the prior row and writes it to the next row if the two rows are part of the same group.
Input data
date | group_id | ticket_num
1/1/22 | A | 12345A
1/1/22 | A | 54321A
1/3/22 | A | 56789A
1/1/22 | B | 98765A
1/2/22 | B | 01234A
Desired outcome
date | group_id | ticket_num | prior_ticket
1/1/22 | A | 12345A | NA
1/1/22 | A | 54321A | 12345A
1/3/22 | A | 56789A | 54321A
1/1/22 | B | 98765A | NA
1/2/22 | B | 01234A | 98765A
What I have tried so far
my_data %>% mutate(prior_ticket = lag(ticket_num)
This ends up putting tickets from the prior group into the next group, so not desirable
my_data %>% mutate(prior_ticket = case_when(group_id == lead(group_id) ~ lead(ticket_num), TRUE ~ NA
This results in an error: "Error in mutate(., prior_ticket=case_when(group_id == lead(group_id) ~ : x ..2
must be a vector, not a formula object.
CodePudding user response:
You can just use group_by
your ID and use lag
like this:
df <- read.table(text = "date|group_id|ticket_num
1/1/22|A|12345A
1/1/22|A|54321A
1/3/22|A|56789A
1/1/22|B|98765A
1/2/22|B|01234A", header = TRUE, sep ="|")
library(dplyr)
df %>%
group_by(group_id) %>%
mutate(prior_ticket = lag(ticket_num))
#> # A tibble: 5 × 4
#> # Groups: group_id [2]
#> date group_id ticket_num prior_ticket
#> <chr> <chr> <chr> <chr>
#> 1 1/1/22 A 12345A <NA>
#> 2 1/1/22 A 54321A 12345A
#> 3 1/3/22 A 56789A 54321A
#> 4 1/1/22 B 98765A <NA>
#> 5 1/2/22 B 01234A 98765A
Created on 2022-07-19 by the reprex package (v2.0.1)