Count percentage of observations that switch value

I have a dataset with two columns. The first column indicates the group, and each group has exactly two rows. The second column holds the category. I would like to compute the percentage of groups whose two rows do not share the same category. For example, in rows 1 and 2 (group A) the category differs, while in rows 3 and 4 (group B) it is the same. For the data below, I would expect 66.67%, because the category changes in four of the six groups and stays the same in the other two.

This is my data:

Data <- structure(list(Group = c("A", "A", "B", "B", "C", "C", "D", "D", 
"E", "E", "F", "F"), Category = c(1L, 2L, 3L, 3L, 5L, 6L, 7L, 
7L, 7L, 6L, 5L, 4L)), class = "data.frame", row.names = c(NA, 
-12L))

I have tried the following so far:

Data <- Data %>%
  group_by(Group) %>%
  count(n())

But I don't know how to write the last line to get the desired percentage. Could someone help me here?

CodePudding user response：

A base R solution with tapply(), assuming the data is stored in df (the \(x) lambda shorthand requires R 4.1 or later):

mean(with(df, tapply(Category, Group, \(x) length(unique(x)))) > 1)

# [1] 0.6666667
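
The one-liner above packs three steps into a single expression. As a minimal sketch, the same computation can be unpacked as follows. The intermediate names `n_unique`, `switched` and `share_switched` are made up for illustration, and `function(x)` is used in place of `\(x)` so the snippet also runs on R versions before 4.1.

```r
# Example data from the question, stored as `df` (the name the answer assumes).
df <- data.frame(
  Group = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  Category = c(1L, 2L, 3L, 3L, 5L, 6L, 7L, 7L, 7L, 6L, 5L, 4L)
)

# Step 1: number of distinct categories within each group.
n_unique <- tapply(df$Category, df$Group, function(x) length(unique(x)))

# Step 2: a group "switches" when it has more than one distinct category.
switched <- n_unique > 1

# Step 3: the mean of a logical vector is the share of TRUE values.
share_switched <- mean(switched)

print(n_unique)
print(100 * share_switched)  # share expressed as a percentage
```

Multiplying by 100 is only needed if the result should be reported as a percentage rather than a proportion.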

With dplyr, you could use n_distinct() to count the number of unique values.

library(dplyr)

df %>%
  group_by(Group) %>%
  summarise(N = n_distinct(Category)) %>%
  summarise(Percent = mean(N > 1))

# # A tibble: 1 × 1
#   Percent
#     <dbl>
# 1   0.667

CodePudding user response：

To show the share of both outcomes (category changes vs. category stays the same), you can use the following code:

library(dplyr)
Data %>%
  group_by(Group) %>%
  mutate(unique = as.numeric(n_distinct(Category) == 1)) %>%
  ungroup() %>%
  summarise(Percent = prop.table(table(unique)))

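Output (row 1 is unique == 0, i.e. the category changes; row 2 is unique == 1, i.e. the category stays the same):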

# A tibble: 2 × 1
  Percent  
  <table>  
1 0.6666667
2 0.3333333
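
The output above does not say which row belongs to which outcome, and the proportions are computed over rows rather than groups (this only matches here because every group has exactly two rows). A possible variant, sketched under the assumption that dplyr is installed, first collapses each group to a single labelled row and then counts the labels. The column names `Status` and `Groups` and the labels "same" and "changed" are invented for this example. It also avoids a multi-row summarise(), which newer dplyr versions (1.1.0 and later) deprecate in favour of reframe().

```r
library(dplyr)

# Example data from the question.
Data <- data.frame(
  Group = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  Category = c(1L, 2L, 3L, 3L, 5L, 6L, 7L, 7L, 7L, 6L, 5L, 4L)
)

result <- Data %>%
  group_by(Group) %>%
  # One row per group, labelled by whether the category stays the same.
  summarise(
    Status = if (n_distinct(Category) == 1) "same" else "changed",
    .groups = "drop"
  ) %>%
  # Number of groups per label, then each label's share in percent.
  count(Status, name = "Groups") %>%
  mutate(Percent = 100 * Groups / sum(Groups))

print(result)
```

Because the percentage is taken over groups, this version would still be correct if some groups had more than two rows.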

CodePudding user response：

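Using base R (table(df) cross-tabulates Group against Category, so this assumes df contains only these two columns):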

counts <- table(df)
prop.table(table(rowSums(counts != 0)))

Output:

        1         2 
0.3333333 0.6666667
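
To see why this works, it can help to look at the intermediate objects. The sketch below repeats the same steps with the question's data; the names `n_categories` and `shares` are introduced here only for illustration.

```r
# Example data from the question, stored as `df` (the name the answer assumes).
df <- data.frame(
  Group = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  Category = c(1L, 2L, 3L, 3L, 5L, 6L, 7L, 7L, 7L, 6L, 5L, 4L)
)

# Contingency table: one row per Group, one column per Category value.
counts <- table(df)
print(counts)

# A non-zero cell means that category occurs in that group, so the row sum
# of (counts != 0) is the number of distinct categories per group.
n_categories <- rowSums(counts != 0)
print(n_categories)

# Share of groups with 1 distinct category (no change) and with 2 (change).
shares <- prop.table(table(n_categories))
print(shares)
```

The entry named "2" is the share of groups whose category changes, and the entry named "1" is the share of groups whose category stays the same.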