I have a dataset I am working on where I have a list of IDs and then what state each ID is in (example below)
d <- data.frame(
ID = c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
State = c(1,2,2,2,1,1,3,1,2,2,3,1,3,3,3,3,3,3,1,2,2,1,3,3,3,1)
)
And what I am looking for is to create a new data frame where I have the state changes listed and counted for each ID. For example, the output for this code is...
ID 1->1 1->2 1->3 2->1 2->2 2->3 3->1 3->2 3->3
1 1 2 1 1 3 1 2 0 0
2 0 1 1 1 1 0 2 0 7
I read a lot of examples of people using code like sum(diff(sign(X)) != 0)
for summing any change in the code but I want it to specifically be for each state change.
Thank you!
CodePudding user response:
library(dplyr)
d %>%
group_by(ID) %>%
mutate(change = paste(lag(State), State, sep = "->")) %>%
slice(-1) %>%
with(table(ID, change))
# change
# ID 1->1 1->2 1->3 2->1 2->2 2->3 3->1 3->3
# 1 1 2 1 1 3 1 2 0
# 2 0 1 1 1 1 0 2 7
If you need to fill in the possible but unobserved changes, you could do it like this:
all_states = unique(d$State)
all_changes = sort(outer(all_states, all_states, paste, sep = "->"))
d %>%
group_by(ID) %>%
mutate(change = factor(paste(lag(State), State, sep = "->"), levels = all_changes)) %>%
slice(-1) %>%
with(table(ID, change))
# change
# ID 1->1 1->2 1->3 2->1 2->2 2->3 3->1 3->2 3->3
# 1 1 2 1 1 3 1 2 0 0
# 2 0 1 1 1 1 0 2 0 7