How to count changes over groups in R?

Time:12-16

I am trying to count how many times a variable changes its value over different iterations.

For example, if I have some data that looks like this:

library(dplyr)
dfTest <- data.frame(
  iteration = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
  number = c(1,1,1,2,2,3,3,1,1,1,2,2,3,3),
  value = c("AAA", "AB", "AAB", "BA", "BBA", "C", 'CA',
           "AAA", "AB", "AAB", "BBA", "BBA", "CAA", "CAB")
)
> dfTest
   iteration number value
1          1      1  AAA
2          1      1   AB
3          1      1  AAB
4          1      2   BA
5          1      2  BBA
6          1      3    C
7          1      3   CA
8          2      1  AAA
9          2      1   AB
10         2      1  AAB
11         2      2  BBA
12         2      2  BBA
13         2      3  CAA
14         2      3  CAB

We can see that the values for number = 1 don't change across iterations. That is, when number = 1, its values are AAA, AB, AAB in both iteration 1 and iteration 2.

However, when number = 2, its values do change across iterations: BA, BBA in iteration 1 and BBA, BBA in iteration 2. A similar pattern exists for number = 3.

What I'm trying to do is count how often the values grouped by number change from one iteration to the next. In my example, number = 1 never changes, but number = 2 and number = 3 change in the 2nd iteration, so 2 of the 3 numbers change, i.e. there is a 66% change in iteration 2.

For clarity, my desired output would look something like this:

  iteration percentChange
1         1          1.00
2         2          0.66

For iteration 1, I am saying that, since everything is new, there is a 100% change, hence the value of 1.

I was attempting something like this:

dfTest %>% 
  group_by(number) %>% 
  dplyr::distinct(number, name, .keep_all = TRUE) %>% 
  group_by(iteration) %>% 
  summarize(change = n_distinct(number)/3)

but this doesn't work... Any suggestions as to how I could do this?

CodePudding user response:

library(tidyverse)

dfTest <- data.frame(
  iteration = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
  number = c(1,1,1,2,2,3,3,1,1,1,2,2,3,3),
  value = c("AAA", "AB", "AAB", "BA", "BBA", "C", 'CA',
            "AAA", "AB", "AAB", "BBA", "BBA", "CAA", "CAB")) 

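dfTest %>%
  # Collapse each (iteration, number) group to one string of its sorted unique values
  group_by(iteration, number) %>%
  summarise(values = paste0(sort(unique(value)), collapse = ",")) %>% 
  # Within each number, flag iterations whose string differs from the previous one
  group_by(number) %>% 
  mutate(changed = values != lag(values)) %>% 
  # The first iteration has no predecessor (lag gives NA), so count it as changed
  replace_na(list(changed = TRUE)) %>% 
  # Share of numbers that changed in each iteration
  group_by(iteration) %>% 
  summarise(percent_change = mean(changed))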
#> `summarise()` has grouped output by 'iteration'. You can override using the `.groups` argument.
#> # A tibble: 2 x 2
#>   iteration percent_change
#>       <dbl>          <dbl>
#> 1         1          1    
#> 2         2          0.667

Created on 2021-12-15 by the reprex package (v2.0.1)
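
For comparison, here is a rough base R sketch of the same idea, in case you want to avoid the tidyverse dependency. It is not part of the answer above, just one possible way to do it. The names sig, prev and result are made up for illustration. It assumes that a "change" means the set of unique values for a number differs from the previous iteration (so order and duplicates are ignored, as in the pipeline above), and that every number appears in every iteration.

```r
# Same example data as in the question.
dfTest <- data.frame(
  iteration = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
  number = c(1,1,1,2,2,3,3,1,1,1,2,2,3,3),
  value = c("AAA", "AB", "AAB", "BA", "BBA", "C", "CA",
            "AAA", "AB", "AAB", "BBA", "BBA", "CAA", "CAB")
)

# One signature string per (iteration, number): its sorted unique values.
sig <- aggregate(
  value ~ iteration + number, data = dfTest,
  FUN = function(v) paste(sort(unique(v)), collapse = ",")
)

# Sort so that, within each number, rows are in iteration order.
sig <- sig[order(sig$number, sig$iteration), ]

# Signature of the previous iteration within each number (NA for the first).
prev <- ave(sig$value, sig$number,
            FUN = function(s) c(NA, s[-length(s)]))

# A missing predecessor counts as a change (the "everything is new" rule).
sig$changed <- is.na(prev) | sig$value != prev

# Share of numbers that changed in each iteration.
result <- aggregate(changed ~ iteration, data = sig, FUN = mean)
names(result)[2] <- "percent_change"
result
```

On the example data this should give 1 for iteration 1 and 2/3 (about 0.667) for iteration 2, matching the tidyverse output. The explicit order() call is the main difference: the comparison with the previous iteration only makes sense if the rows are sorted by iteration within each number.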

Tags: r