Home > database >  How to keep the changed value?
How to keep the changed value?

Time:04-25

I met a problem when I am dealing with value change by using lead().

df <- data.frame(id = c(11,11,11,11,12,12,12,12,13,13,13,13),
                 drug_type = c("A","A","B","C","B","B","B","B","A","A","B","B"),
                 drug_brand = c(1,1,2,2,2,3,3,3,1,1,2,3),
                 date = c("2020-01-01","2020-02-01","2020-03-01","2020-03-13",
                          "2019-04-05","2019-05-02","2019-06-03","2019-08-04",
                          "2021-02-02","2021-02-27","2021-03-22","2021-04-11"))

As for the data frame shown above, I aim to identify the change in drug type for each patient, if there is no change in the drug type for a patient, then they will be ignored. To achieve it I used

df %>%
  group_by(id) %>%
  filter(lead(drug_type)=="B")

However, it only returned the corresponding rows whose next value is "B", while I want to also keep the rows containing changed values. For example, in terms of patient no.11, the result only returned me the record on the "2020-02-01", but I hope to find a way to keep the record on the "2020-03-01" as well, that is, the changed value.

I tried some ways, including creating extra columns named "next" showing the next values of each value in the drug type, but they are either wrong or excessively redundant. Therefore, I want to know if there is any better way to achieve my goal.

CodePudding user response:

You can check if drug type differs from lead() or if drug type differs from lag()

df %>%
  group_by(id) %>%
  filter(
    (drug_type!=lag(drug_type)) |
      drug_type!=lead(drug_type)
  )

Output:

     id drug_type drug_brand date      
  <dbl> <chr>          <dbl> <chr>     
1    11 A                  1 2020-02-01
2    11 B                  2 2020-03-01
3    11 C                  2 2020-03-13
4    13 A                  1 2021-02-27
5    13 B                  2 2021-03-22

If you only want rows that change to "B" plus the row that has that new "B" value, you could do something like this:

df %>% group_by(id) %>% 
  filter((drug_type!="B" & lead(drug_type)=="B") | drug_type=="B" & lag(drug_type)!="B")

Output:

     id drug_type drug_brand date      
  <dbl> <chr>          <dbl> <chr>     
1    11 A                  1 2020-02-01
2    11 B                  2 2020-03-01
3    13 A                  1 2021-02-27
4    13 B                  2 2021-03-22
  • Related