count the different name considering as same-CodePudding

I want to count the number of fluctuation of responses under the column response per id. However, responses No~ no ~ DK. I need to consider as the same response just only for count to the number of fluctuate in response. I don't change responses permanently.

df <- data.frame(
  id=c(1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4), 
  response=c("Yes","Yes","No","DK","no","No","No","no","No","Yes","Yes","DK","No","Yes","Yes","No","No","No","died","TO","Yes","No","Yes")
)

I am trying it using the following code:

library(tidyverse)
df <- df %>%
  group_by(id) %>% fill(response) %>% 
  mutate(new = rleid(response), rn = row_number()) %>%  
  mutate(flactuation = case_when(rn >2 & duplicated(new) ~ 'No', rn > 2 ~ 'Yes')) %>% 
  mutate(numberofchange =  sum(flactuation=="Yes", na.rm = T)) %>% select(-rn, -flactuation)

Expected

     id response   new numberofchange
   <dbl> <chr>    <int>          <int>
 1     1 Yes          1              1
 2     1 Yes          1              1
 3     1 No           2              1
 4     1 DK           2              1
 5     1 no           2              1
 6     2 No           1              1
 7     2 No           1              1
 8     2 no           1              1
 9     2 No           1              1
10     2 Yes          2              1
11     2 Yes          2              1
12     3 DK           1              2
13     3 No           1              2
14     3 Yes          2              2
15     3 Yes          2              2
16     3 No           3              2
17     3 No           3              2
18     4 No           1              5
19     4 died         2              5
20     4 TO           3              5
21     4 Yes          4              5
22     4 No           5              5
23     4 Yes          6              5

CodePudding user response：

You could use data.table::rleid() to get the run-length indices.

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(new = data.table::rleid(replace(response, response %in% c('no', 'DK'), "No")),
         numberofchange = max(new) - 1) %>%
  ungroup()

# A tibble: 23 × 4
      id response   new numberofchange
   <dbl> <chr>    <int>          <dbl>
 1     1 Yes          1              1
 2     1 Yes          1              1
 3     1 No           2              1
 4     1 DK           2              1
 5     1 no           2              1
 6     2 No           1              1
 7     2 No           1              1
 8     2 no           1              1
 9     2 No           1              1
10     2 Yes          2              1
11     2 Yes          2              1
12     3 DK           1              2
13     3 No           1              2
14     3 Yes          2              2
15     3 Yes          2              2
16     3 No           3              2
17     3 No           3              2
18     4 No           1              5
19     4 died         2              5
20     4 TO           3              5
21     4 Yes          4              5
22     4 No           5              5
23     4 Yes          6              5