Home > Net >  Count Change in Column Value in R
Count Change in Column Value in R

Time:03-18

I have a dataset I am working on where I have a list of IDs and then what state each ID is in (example below)

  d <- data.frame(
    ID = c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
 State = c(1,2,2,2,1,1,3,1,2,2,3,1,3,3,3,3,3,3,1,2,2,1,3,3,3,1)
  )

And what I am looking for is to create a new data frame where I have the state changes listed and counted for each ID. For example, the output for this code is...

ID 1->1 1->2 1->3 2->1 2->2 2->3 3->1 3->2 3->3
1   1    2     1   1    3    1    2    0    0
2   0    1     1   1    1    0    2    0    7

I read a lot of examples of people using code like sum(diff(sign(X)) != 0) for summing any change in the code but I want it to specifically be for each state change.

Thank you!

CodePudding user response:

library(dplyr)

 d %>% 
   group_by(ID) %>%
   mutate(change = paste(lag(State), State, sep = "->")) %>%
   slice(-1) %>%
   with(table(ID, change))
#     change
# ID  1->1 1->2 1->3 2->1 2->2 2->3 3->1 3->3
#   1    1    2    1    1    3    1    2    0
#   2    0    1    1    1    1    0    2    7

If you need to fill in the possible but unobserved changes, you could do it like this:

all_states = unique(d$State)
all_changes = sort(outer(all_states, all_states, paste, sep = "->"))

d %>%
  group_by(ID) %>%
  mutate(change = factor(paste(lag(State), State, sep = "->"), levels = all_changes)) %>%
  slice(-1) %>%
  with(table(ID, change))
#    change
# ID  1->1 1->2 1->3 2->1 2->2 2->3 3->1 3->2 3->3
#   1    1    2    1    1    3    1    2    0    0
#   2    0    1    1    1    1    0    2    0    7
  • Related