Home > other >  count rows until value changes then continue counting for the next value in dplyr
count rows until value changes then continue counting for the next value in dplyr

Time:03-04

I want to do a survival probability in R using the run times of a cell is in a specific state (1, 2, or 3). So, I am trying to count the number of rows that a cell stays in a specific state in different IDs. I just don't want the total count of states, but I want the number of times it occurs before it changes over the course of time.

This is a representative dataframe:

structure(list(id = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"
), time = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", 
"11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21"
), state = c("1", "3", "3", "3", "2", "2", "2", "3", "3", "3", 
"2", "2", "2", "1", "1", "3", "2", "1", "3", "3", "3")), class = "data.frame", row.names = c(NA, 
-21L))
  id time state
1  A    1     1
2  A    2     3
3  A    3     3
4  A    4     3
5  A    5     2
6  A    6     2
 
df2 <- df %>%
  group_by (id) %>%
  summarise (statechange = count(state))

  id    statechange$x $freq      
1 A     1                 3
2 A     2                 6
3 A     3                 6
4 B     1                 1
5 B     2                 1
6 B     3                 4

Ideally, the result should be:

id  from  statechange   freq
A   NA  1       1
A   1   3       3
A   3   2       3
A   2   3       3
A   3   2       3
A   2   1       2
B   NA  3       1
B   3   2       1
B   2   1           1
B   1   2       3

where from is the original state before it changed and statechange is the ones that are being counted.

I don't know if I should use an ifelse statement for counts, or if there is another way to achieve this. Let me know if anything is not clear and I greatly appreciate help!

edit: changed the question to include the original state in the resulting df.

CodePudding user response:

You can use rle to get your result. However, vased on your example data I get a different result than what you posted as expected output.

library(tidyverse)
df %>%
  group_by(id) %>%
  summarize(statechange = as.numeric(rle(state)[[2]]),
            freq = rle(state)[[1]]) %>%
  ungroup()

# A tibble: 10 x 3
   id    statechange  freq
   <chr>       <dbl> <int>
 1 A               1     1
 2 A               3     3
 3 A               2     3
 4 A               3     3
 5 A               2     3
 6 A               1     2
 7 B               3     1
 8 B               2     1
 9 B               1     1
10 B               3     3
  • Related