How to keep the changed value?-CodePudding

I met a problem when I am dealing with value change by using lead().

df <- data.frame(id = c(11,11,11,11,12,12,12,12,13,13,13,13),
                 drug_type = c("A","A","B","C","B","B","B","B","A","A","B","B"),
                 drug_brand = c(1,1,2,2,2,3,3,3,1,1,2,3),
                 date = c("2020-01-01","2020-02-01","2020-03-01","2020-03-13",
                          "2019-04-05","2019-05-02","2019-06-03","2019-08-04",
                          "2021-02-02","2021-02-27","2021-03-22","2021-04-11"))

As for the data frame shown above, I aim to identify the change in drug type for each patient, if there is no change in the drug type for a patient, then they will be ignored. To achieve it I used

df %>%
  group_by(id) %>%
  filter(lead(drug_type)=="B")

However, it only returned the corresponding rows whose next value is "B", while I want to also keep the rows containing changed values. For example, in terms of patient no.11, the result only returned me the record on the "2020-02-01", but I hope to find a way to keep the record on the "2020-03-01" as well, that is, the changed value.

I tried some ways, including creating extra columns named "next" showing the next values of each value in the drug type, but they are either wrong or excessively redundant. Therefore, I want to know if there is any better way to achieve my goal.

CodePudding user response：

You can check if drug type differs from lead() or if drug type differs from lag()

df %>%
  group_by(id) %>%
  filter(
    (drug_type!=lag(drug_type)) |
      drug_type!=lead(drug_type)
  )

Output:

     id drug_type drug_brand date      
  <dbl> <chr>          <dbl> <chr>     
1    11 A                  1 2020-02-01
2    11 B                  2 2020-03-01
3    11 C                  2 2020-03-13
4    13 A                  1 2021-02-27
5    13 B                  2 2021-03-22

If you only want rows that change to "B" plus the row that has that new "B" value, you could do something like this:

df %>% group_by(id) %>% 
  filter((drug_type!="B" & lead(drug_type)=="B") | drug_type=="B" & lag(drug_type)!="B")

Output:

     id drug_type drug_brand date      
  <dbl> <chr>          <dbl> <chr>     
1    11 A                  1 2020-02-01
2    11 B                  2 2020-03-01
3    13 A                  1 2021-02-27
4    13 B                  2 2021-03-22