I have a dataset with two columns. The first column indicates the group, and each group has exactly two rows. The second column gives the category. I would like to calculate the percentage of groups whose two rows do not have the same category. For example, in rows 1 and 2 (group A) the Category differs, while in rows 3 and 4 (group B) it is the same. For the data provided, I would expect 66.66%, since the Category changes in four groups and stays the same in two.
This is my data:
structure(list(Group = c("A", "A", "B", "B", "C", "C", "D", "D",
"E", "E", "F", "F"), Category = c(1L, 2L, 3L, 3L, 5L, 6L, 7L,
7L, 7L, 6L, 5L, 4L)), class = "data.frame", row.names = c(NA,
-12L))
I have tried the following so far:
Data <- Data %>%
group_by(Group) %>%
count(n())
But I don't know how to write the last line to get my desired percentage. Could someone help me here?
CodePudding user response:
A base solution with tapply():
mean(with(df, tapply(Category, Group, \(x) length(unique(x)))) > 1)
# [1] 0.6666667
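The one-liner above can be unpacked into separate steps. This is a minimal sketch, assuming the question's data is stored in df; the intermediate names n_unique and differs are only illustrative:

```r
# Data from the question, assumed here to be stored as df
df <- data.frame(
  Group = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  Category = c(1L, 2L, 3L, 3L, 5L, 6L, 7L, 7L, 7L, 6L, 5L, 4L)
)

# Number of distinct categories within each group
n_unique <- tapply(df$Category, df$Group, function(x) length(unique(x)))

# TRUE where the two rows of a group have different categories
differs <- n_unique > 1

# The mean of a logical vector is the share of TRUE values
mean(differs)
# [1] 0.6666667
```

Multiplying the result by 100 gives the percentage asked for in the question.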
With dplyr, you could use n_distinct() to count the number of unique values.
library(dplyr)
df %>%
group_by(Group) %>%
summarise(N = n_distinct(Category)) %>%
summarise(Percent = mean(N > 1))
# # A tibble: 1 × 1
#   Percent
#     <dbl>
# 1   0.667
CodePudding user response:
To show it for both classes, you can use the following code:
library(dplyr)
Data %>%
group_by(Group) %>%
mutate(unique = as.numeric(n_distinct(Category) == 1)) %>%
ungroup() %>%
summarise(Percent = prop.table(table(unique)))
Output:
# A tibble: 2 × 1
  Percent
  <table>
1 0.6666667
2 0.3333333
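Note that the mutate() approach above tallies rows rather than groups, so the proportions match the per-group share only because every group has exactly two rows. A minimal sketch that collapses to one row per group first, assuming the data is stored in Data as in the question (the column name same and the object result are illustrative):

```r
library(dplyr)

# Data from the question, assumed here to be stored as Data
Data <- data.frame(
  Group = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  Category = c(1L, 2L, 3L, 3L, 5L, 6L, 7L, 7L, 7L, 6L, 5L, 4L)
)

result <- Data %>%
  group_by(Group) %>%
  # One row per group: TRUE if both rows share the same category
  summarise(same = n_distinct(Category) == 1) %>%
  # Tally how many groups fall into each class
  count(same) %>%
  # Convert the tallies to proportions of all groups
  mutate(Percent = n / sum(n))

result
```

This also avoids returning several rows from a single summarise() call, which dplyr 1.1.0 and later deprecate in favour of reframe().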
CodePudding user response:
Using base R, cross-tabulate Group against Category and count the non-zero cells in each group's row:
counts <- table(df)
prop.table(table(rowSums(counts != 0)))
Output:
        1         2
0.3333333 0.6666667