Home > Enterprise >  Case_when within mutate to check if value is the same as lead
Case_when within mutate to check if value is the same as lead

Time:07-20

I have a set of data organized into groups and sorted by time. I would like to create a new variable that takes the value of the prior row and writes it to the next row if the two rows are part of the same group.

Input data

date   | group_id | ticket_num
1/1/22 | A        | 12345A
1/1/22 | A        | 54321A
1/3/22 | A        | 56789A
1/1/22 | B        | 98765A
1/2/22 | B        | 01234A

Desired outcome

date   | group_id | ticket_num | prior_ticket
1/1/22 | A        | 12345A     | NA
1/1/22 | A        | 54321A     | 12345A
1/3/22 | A        | 56789A     | 54321A
1/1/22 | B        | 98765A     | NA
1/2/22 | B        | 01234A     | 98765A

What I have tried so far

my_data %>% mutate(prior_ticket = lag(ticket_num) 

This ends up putting tickets from the prior group into the next group, so not desirable

my_data %>% mutate(prior_ticket = case_when(group_id == lead(group_id) ~ lead(ticket_num), TRUE ~ NA

This results in an error: "Error in mutate(., prior_ticket=case_when(group_id == lead(group_id) ~ : x ..2 must be a vector, not a formula object.

CodePudding user response:

You can just use group_by your ID and use lag like this:

df <- read.table(text = "date|group_id|ticket_num
1/1/22|A|12345A
1/1/22|A|54321A
1/3/22|A|56789A
1/1/22|B|98765A
1/2/22|B|01234A", header = TRUE, sep ="|")

library(dplyr)
df %>%
  group_by(group_id) %>%
  mutate(prior_ticket = lag(ticket_num))
#> # A tibble: 5 × 4
#> # Groups:   group_id [2]
#>   date   group_id ticket_num prior_ticket
#>   <chr>  <chr>    <chr>      <chr>       
#> 1 1/1/22 A        12345A     <NA>        
#> 2 1/1/22 A        54321A     12345A      
#> 3 1/3/22 A        56789A     54321A      
#> 4 1/1/22 B        98765A     <NA>        
#> 5 1/2/22 B        01234A     98765A

Created on 2022-07-19 by the reprex package (v2.0.1)

  • Related