I want to count the number of fluctuation of responses under the column response
per id. However, responses No
~ no
~ DK
. I need to consider as the same response just only for count to the number of fluctuate in response. I don't change responses permanently.
df <- data.frame(
id=c(1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4),
response=c("Yes","Yes","No","DK","no","No","No","no","No","Yes","Yes","DK","No","Yes","Yes","No","No","No","died","TO","Yes","No","Yes")
)
I am trying it using the following code:
library(tidyverse)
df <- df %>%
group_by(id) %>% fill(response) %>%
mutate(new = rleid(response), rn = row_number()) %>%
mutate(flactuation = case_when(rn >2 & duplicated(new) ~ 'No', rn > 2 ~ 'Yes')) %>%
mutate(numberofchange = sum(flactuation=="Yes", na.rm = T)) %>% select(-rn, -flactuation)
Expected
id response new numberofchange
<dbl> <chr> <int> <int>
1 1 Yes 1 1
2 1 Yes 1 1
3 1 No 2 1
4 1 DK 2 1
5 1 no 2 1
6 2 No 1 1
7 2 No 1 1
8 2 no 1 1
9 2 No 1 1
10 2 Yes 2 1
11 2 Yes 2 1
12 3 DK 1 2
13 3 No 1 2
14 3 Yes 2 2
15 3 Yes 2 2
16 3 No 3 2
17 3 No 3 2
18 4 No 1 5
19 4 died 2 5
20 4 TO 3 5
21 4 Yes 4 5
22 4 No 5 5
23 4 Yes 6 5
CodePudding user response:
You could use data.table::rleid()
to get the run-length indices.
library(dplyr)
df %>%
group_by(id) %>%
mutate(new = data.table::rleid(replace(response, response %in% c('no', 'DK'), "No")),
numberofchange = max(new) - 1) %>%
ungroup()
# A tibble: 23 × 4
id response new numberofchange
<dbl> <chr> <int> <dbl>
1 1 Yes 1 1
2 1 Yes 1 1
3 1 No 2 1
4 1 DK 2 1
5 1 no 2 1
6 2 No 1 1
7 2 No 1 1
8 2 no 1 1
9 2 No 1 1
10 2 Yes 2 1
11 2 Yes 2 1
12 3 DK 1 2
13 3 No 1 2
14 3 Yes 2 2
15 3 Yes 2 2
16 3 No 3 2
17 3 No 3 2
18 4 No 1 5
19 4 died 2 5
20 4 TO 3 5
21 4 Yes 4 5
22 4 No 5 5
23 4 Yes 6 5