I have the following data frame called df (dput below):
group value
1 A 4
2 A 2
3 A 4
4 A 3
5 A 1
6 A 5
7 B 3
8 B 2
9 B 1
10 B 2
11 B 2
12 B 2
I would like to calculate, per group, the percentage of values that are equal to the group's mode. Here is the code to calculate the mode per group:
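# Mode function (note: this masks base::mode, and tabulate() only counts
# positive whole numbers, which is fine for the values in df)
mode <- function(codes){
  which.max(tabulate(codes))
}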
library(dplyr)
# Calculate mode per group
df %>%
group_by(group) %>%
mutate(mode_value = mode(value))
#> # A tibble: 12 × 3
#> # Groups: group [2]
#> group value mode_value
#> <chr> <dbl> <int>
#> 1 A 4 4
#> 2 A 2 4
#> 3 A 4 4
#> 4 A 3 4
#> 5 A 1 4
#> 6 A 5 4
#> 7 B 3 2
#> 8 B 2 2
#> 9 B 1 2
#> 10 B 2 2
#> 11 B 2 2
#> 12 B 2 2
Created on 2022-11-28 with reprex v2.0.2
But I am not sure how to calculate the percentage of values equal to the mode per group. The result should look like this:
group value mode_value perc_on_mode
1 A 4 4 0.33
2 A 2 4 0.33
3 A 4 4 0.33
4 A 3 4 0.33
5 A 1 4 0.33
6 A 5 4 0.33
7 B 3 2 0.67
8 B 2 2 0.67
9 B 1 2 0.67
10 B 2 2 0.67
11 B 2 2 0.67
12 B 2 2 0.67
So I was wondering if anyone knows how to calculate the percentage of values on the mode value per group?
dput of df:
df <- structure(list(group = c("A", "A", "A", "A", "A", "A", "B", "B",
"B", "B", "B", "B"), value = c(4, 2, 4, 3, 1, 5, 3, 2, 1, 2,
2, 2)), class = "data.frame", row.names = c(NA, -12L))
CodePudding user response:
You could try:
df %>%
group_by(group) %>%
mutate(mode_value = mode(value),
perc_on_mode = mean(value == mode_value))
Output:
# A tibble: 12 x 4
# Groups: group [2]
group value mode_value perc_on_mode
<chr> <dbl> <int> <dbl>
1 A 4 4 0.333
2 A 2 4 0.333
3 A 4 4 0.333
4 A 3 4 0.333
5 A 1 4 0.333
6 A 5 4 0.333
7 B 3 2 0.667
8 B 2 2 0.667
9 B 1 2 0.667
10 B 2 2 0.667
11 B 2 2 0.667
12 B 2 2 0.667
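This works because value == mode_value is a logical vector, and mean() of a logical vector is the proportion of TRUE entries. One caveat: the tabulate()-based mode from the question only counts positive whole numbers. A minimal sketch of a more general helper, here given the hypothetical name mode_of (not part of the question or of any package), could match the values against their unique set first:

```r
# Hypothetical helper: most frequent value of an atomic vector.
# Not limited to positive integers; ties resolve to the value seen first.
mode_of <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

a <- c(4, 2, 4, 3, 1, 5)   # values of group A from df
b <- c(3, 2, 1, 2, 2, 2)   # values of group B from df

mode_of(a)                 # mode of group A
mean(a == mode_of(a))      # share of group A values equal to its mode
mean(b == mode_of(b))      # share of group B values equal to its mode
```

Inside the pipeline, mode_of(value) could stand in for mode(value) if the data may contain zeros, negatives, decimals or strings.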
CodePudding user response:
By modifying the mode function:
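mode <- function(codes){
  # counts of each positive integer value
  tab <- tabulate(codes)
  # the most frequent value
  mode_value <- which.max(tab)
  # one row per input value; the mode and its share are recycled
  data.frame(value = codes, mode_value, perc_on_mode = tab[mode_value]/length(codes))
}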
# Calculate mode per group
df %>%
group_by(group) %>%
do(mode(.$value))
#> # A tibble: 12 x 4
#> # Groups: group [2]
#> group value mode_value perc_on_mode
#> <chr> <dbl> <int> <dbl>
#> 1 A 4 4 0.333
#> 2 A 2 4 0.333
#> 3 A 4 4 0.333
#> 4 A 3 4 0.333
#> 5 A 1 4 0.333
#> 6 A 5 4 0.333
#> 7 B 3 2 0.667
#> 8 B 2 2 0.667
#> 9 B 1 2 0.667
#> 10 B 2 2 0.667
#> 11 B 2 2 0.667
#> 12 B 2 2 0.667
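Since do() is marked as superseded in recent dplyr releases, the same per-group call could probably also be written with group_modify(). The following is only a sketch under that assumption; the helper is renamed to the hypothetical mode_frame so that it does not mask base::mode, and df is rebuilt so the block runs on its own:

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  group = rep(c("A", "B"), each = 6),
  value = c(4, 2, 4, 3, 1, 5, 3, 2, 1, 2, 2, 2)
)

# Hypothetical name; same logic as the modified mode function
# (still assumes positive whole numbers because of tabulate())
mode_frame <- function(codes) {
  tab <- tabulate(codes)
  mode_value <- which.max(tab)
  data.frame(value = codes, mode_value,
             perc_on_mode = tab[mode_value] / length(codes))
}

# .x is the current group's rows without the grouping column;
# group_modify() adds the group column back to the result
res <- df %>%
  group_by(group) %>%
  group_modify(~ mode_frame(.x$value))
```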
Or with data.table:
library(data.table)
mode <- function(codes){
tab <- tabulate(codes)
mode_value <- which.max(tab)
list(mode_value, tab[mode_value]/length(codes))
}
setDT(df)[, c("mode_value", "perc_on_mode") := mode(value), group][]
#> group value mode_value perc_on_mode
#> 1: A 4 4 0.3333333
#> 2: A 2 4 0.3333333
#> 3: A 4 4 0.3333333
#> 4: A 3 4 0.3333333
#> 5: A 1 4 0.3333333
#> 6: A 5 4 0.3333333
#> 7: B 3 2 0.6666667
#> 8: B 2 2 0.6666667
#> 9: B 1 2 0.6666667
#> 10: B 2 2 0.6666667
#> 11: B 2 2 0.6666667
#> 12: B 2 2 0.6666667
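Keep in mind that setDT() converts df to a data.table by reference, so df itself is changed. If neither dplyr nor data.table is wanted, the same two columns can be sketched in base R with ave(), which applies a function per group and recycles the result over the group's rows. The helper name mode_int is hypothetical; it is the question's mode function under a name that does not mask base::mode:

```r
# Same data as in the question
df <- data.frame(
  group = rep(c("A", "B"), each = 6),
  value = c(4, 2, 4, 3, 1, 5, 3, 2, 1, 2, 2, 2)
)

# Question's helper under a hypothetical name; assumes positive whole numbers
mode_int <- function(codes) which.max(tabulate(codes))

# Mode of each group, repeated on every row of that group
df$mode_value <- ave(df$value, df$group, FUN = mode_int)

# Share of rows in each group whose value equals the group's mode
df$perc_on_mode <- ave(as.numeric(df$value == df$mode_value), df$group, FUN = mean)
```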