I am trying to count how many times a variable changes its value over different iterations.
For example, if I have some data that looks like this:
library(dplyr)
dfTest <- data.frame(
  iteration = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
  number = c(1,1,1,2,2,3,3,1,1,1,2,2,3,3),
  value = c("AAA", "AB", "AAB", "BA", "BBA", "C", "CA",
            "AAA", "AB", "AAB", "BBA", "BBA", "CAA", "CAB")
)
> dfTest
   iteration number value
1          1      1   AAA
2          1      1    AB
3          1      1   AAB
4          1      2    BA
5          1      2   BBA
6          1      3     C
7          1      3    CA
8          2      1   AAA
9          2      1    AB
10         2      1   AAB
11         2      2   BBA
12         2      2   BBA
13         2      3   CAA
14         2      3   CAB
We can see that the values for number = 1 don't change over the iterations. That is, when number = 1, its values are AAA, AB, AAB in both iteration 1 and iteration 2.
However, when number = 2, its values do change over iterations: BA, BBA for iteration 1 and BBA, BBA for iteration 2. A similar pattern exists for number = 3.
What I'm trying to do is count how often the values of each number group change from one iteration to the next. In my example, number = 1 never changes, but number = 2 and number = 3 both change in the 2nd iteration, so 2 of the 3 numbers (about 66%) change in iteration 2.
For clarity, my desired output would look something like this:
  iteration percentChange
1         1          1.00
2         2          0.66
For iteration 1, I am saying that, since everything is new, there is a 100% change, hence the value of 1.
I was attempting something like this:
dfTest %>%
  group_by(number) %>%
  dplyr::distinct(number, name, .keep_all = TRUE) %>%
  group_by(iteration) %>%
  summarize(change = n_distinct(number)/3)
but this doesn't work... Any suggestions as to how I could do this?
CodePudding user response:
library(tidyverse)
dfTest <- data.frame(
  iteration = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
  number = c(1,1,1,2,2,3,3,1,1,1,2,2,3,3),
  value = c("AAA", "AB", "AAB", "BA", "BBA", "C", "CA",
            "AAA", "AB", "AAB", "BBA", "BBA", "CAA", "CAB"))
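dfTest %>%
  # Collapse each (iteration, number) group into one sorted, de-duplicated string
  group_by(iteration, number) %>%
  summarise(values = paste0(sort(unique(value)), collapse = ",")) %>%
  # Within each number, compare that string with the previous iteration's
  group_by(number) %>%
  mutate(changed = values != lag(values)) %>%
  # The first iteration has nothing to compare against (NA), so count it as changed
  replace_na(list(changed = TRUE)) %>%
  # Share of numbers that changed in each iteration
  group_by(iteration) %>%
  summarise(percent_change = mean(changed))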
#> `summarise()` has grouped output by 'iteration'. You can override using the `.groups` argument.
#> # A tibble: 2 x 2
#>   iteration percent_change
#>       <dbl>          <dbl>
#> 1         1          1
#> 2         2          0.667
Created on 2021-12-15 by the reprex package (v2.0.1)
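If you would rather not load the tidyverse, the same idea can be sketched in base R. This is only an illustration, not a drop-in replacement: the helper names sig, previous and changed are made up here, and it assumes (as above) that the order and multiplicity of values within a group don't matter, and that the first iteration always counts as changed.

```r
# Same example data as in the question
dfTest <- data.frame(
  iteration = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
  number    = c(1,1,1,2,2,3,3,1,1,1,2,2,3,3),
  value     = c("AAA", "AB", "AAB", "BA", "BBA", "C", "CA",
                "AAA", "AB", "AAB", "BBA", "BBA", "CAA", "CAB")
)

# One "signature" string per (iteration, number): sorted unique values
sig <- aggregate(value ~ iteration + number, data = dfTest,
                 FUN = function(v) paste(sort(unique(v)), collapse = ","))

# Order rows so that iterations are ascending within each number
sig <- sig[order(sig$number, sig$iteration), ]

# Previous iteration's signature within each number (NA for the first one)
sig$previous <- ave(sig$value, sig$number,
                    FUN = function(s) c(NA_character_, head(s, -1)))

# No predecessor counts as a change (assumed convention from the question)
sig$changed <- is.na(sig$previous) | sig$value != sig$previous

# Share of numbers that changed in each iteration
percent_change <- aggregate(changed ~ iteration, data = sig, FUN = mean)
names(percent_change)[2] <- "percentChange"
percent_change
```

On the example data this gives 1 for iteration 1 and 2/3 for iteration 2, matching the tidyverse result. If duplicates should matter (e.g. BBA vs. BBA, BBA), drop the unique() call in the signature step.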