I want to do a survival probability in R using the run times of a cell is in a specific state (1, 2, or 3). So, I am trying to count the number of rows that a cell stays in a specific state in different IDs. I just don't want the total count of states, but I want the number of times it occurs before it changes over the course of time.
This is a representative dataframe:
structure(list(id = c("A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"
), time = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21"
), state = c("1", "3", "3", "3", "2", "2", "2", "3", "3", "3",
"2", "2", "2", "1", "1", "3", "2", "1", "3", "3", "3")), class = "data.frame", row.names = c(NA,
-21L))
id time state
1 A 1 1
2 A 2 3
3 A 3 3
4 A 4 3
5 A 5 2
6 A 6 2
df2 <- df %>%
group_by (id) %>%
summarise (statechange = count(state))
id statechange$x $freq
1 A 1 3
2 A 2 6
3 A 3 6
4 B 1 1
5 B 2 1
6 B 3 4
Ideally, the result should be:
id from statechange freq
A NA 1 1
A 1 3 3
A 3 2 3
A 2 3 3
A 3 2 3
A 2 1 2
B NA 3 1
B 3 2 1
B 2 1 1
B 1 2 3
where from is the original state before it changed and statechange is the ones that are being counted.
I don't know if I should use an ifelse statement for counts, or if there is another way to achieve this. Let me know if anything is not clear and I greatly appreciate help!
edit: changed the question to include the original state in the resulting df.
CodePudding user response:
You can use rle to get your result. However, vased on your example data I get a different result than what you posted as expected output.
library(tidyverse)
df %>%
group_by(id) %>%
summarize(statechange = as.numeric(rle(state)[[2]]),
freq = rle(state)[[1]]) %>%
ungroup()
# A tibble: 10 x 3
id statechange freq
<chr> <dbl> <int>
1 A 1 1
2 A 3 3
3 A 2 3
4 A 3 3
5 A 2 3
6 A 1 2
7 B 3 1
8 B 2 1
9 B 1 1
10 B 3 3