Home > Enterprise >  Count the number of times a value changes in a variable for specific values
Count the number of times a value changes in a variable for specific values

Time:02-20

I have a dataframe with multiple rows representing repeated measures of the same individuals and then I am having a variable that represents some sequences coded from 1-5.

I want to find out how many times (number of events) across all individuals number 1, 2 and 3 appear after 4 and 5 in the P1 variable.

As a final product I want to have a dataframe where it will have as variables the category 4 and 5 and as rows the categories 1, 2 and 3 and it will show how many events category 4 had until it gets interrupted by either 1, 2 and 3 category and how many events category 5 had until it gets interrupted by either 1, 2 or 3 category.

Here is a small subset of my dataset

df <- data.frame(ID = as.factor(c(1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5,5,5,6,6,6,6,6,6)),
             P1 = c(1,5,2,4,1,5,2,4,2,5,1,4,2,4,1,5,3,5,1,4,2,5,1,4,1))

What I want is either a dataframe or some sort of results like the following meaning that after number 4 there were 3 events of number "1", 3 events of number "2" and 0 events of number "3".

   Category_4    Category_5
1       3              2
2       3              2
3       0              1

Is there a way to do it with dplyr?

CodePudding user response:

If we are looking for the adjacent element that matches, then get the lag and do a count, and reshape to 'wide' with pivot_wider

library(dplyr)
library(tidyr)
library(stringr)
df %>% 
   mutate(P2 = lag(P1)) %>%
  filter(P1 %in% c(4, 5)) %>%
  count(P1 = str_c("Category_", P1), P2) %>% 
  pivot_wider(names_from = P1, values_from = n, values_fill = 0)

-output

# A tibble: 3 × 3
     P2 Category_4 Category_5
  <dbl>      <int>      <int>
1     1          3          3
2     2          3          2
3     3          0          1
  • Related