Assume this is my df1, and I want to create df2 from it.
In df2, 0.67 is the share of x among the Sat observations for A (2 out of 3), and so on.
I am stuck on how to group df1 by grp1, then group again by grp2 within grp1, and then find the count (n) and percent of each observation within these subgroups.
Also, note that if an observation does not occur in a subgroup, it is assigned a value of 0.
I know I should provide my attempt before asking for help, but I am really stuck on what approach to take for a case like this to begin with. Any help is appreciated.
df1 <- data.frame(grp1 = c(rep("A", 4), rep("B", 3), rep("C", 4)),
                  obs = c("x", "x", "y", "z", "x", "y", "y", "x", "x", "x", "y"),
                  grp2 = c("Sat", "Sat", "Sat", "Fri", "Sat", "Fri", "Fri", "Sat", "Sat", "Sat", "Fri"))
> df1
grp1 obs grp2
1 A x Sat
2 A x Sat
3 A y Sat
4 A z Fri
5 B x Sat
6 B y Fri
7 B y Fri
8 C x Sat
9 C x Sat
10 C x Sat
11 C y Fri
df2
grp1 obs grp2 n percent
1 A x Sat 2 0.67
2 A y Sat 1 0.33
3 A z Sat 0 0.00
4 A x Fri 0 0.00
5 A y Fri 0 0.00
6 A z Fri 1 1.00
7 B x Sat 1 1.00
8 B y Sat 0 0.00
9 B z Sat 0 0.00
10 B x Fri 0 0.00
11 B y Fri 2 1.00
12 B z Fri 0 0.00
13 C x Sat 3 1.00
14 C y Sat 0 0.00
15 C z Sat 0 0.00
16 C x Fri 0 0.00
17 C y Fri 1 1.00
18 C z Fri 0 0.00
CodePudding user response:
Perhaps this helps: get the frequency count across all the columns, expand the rows to fill in the missing combinations, then calculate 'percent' by taking the proportions of column 'n' after grouping by the 'grp' columns.
library(dplyr)
library(tidyr)
df1 %>%
count(across(everything())) %>%
complete(grp1, obs, grp2, fill = list(n = 0)) %>%
group_by(grp1, grp2) %>%
mutate(percent = proportions(n)) %>%
ungroup()
Output:
# A tibble: 18 × 5
grp1 obs grp2 n percent
<chr> <chr> <chr> <int> <dbl>
1 A x Fri 0 0
2 A x Sat 2 0.667
3 A y Fri 0 0
4 A y Sat 1 0.333
5 A z Fri 1 1
6 A z Sat 0 0
7 B x Fri 0 0
8 B x Sat 1 1
9 B y Fri 2 1
10 B y Sat 0 0
11 B z Fri 0 0
12 B z Sat 0 0
13 C x Fri 0 0
14 C x Sat 3 1
15 C y Fri 1 1
16 C y Sat 0 0
17 C z Fri 0 0
18 C z Sat 0 0
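One caveat worth checking: in df1 every grp1/grp2 pair has at least one row, so each group's n sums to a positive number. If some pair had no rows at all, proportions(n) would compute 0/0 and return NaN rather than 0. Below is a minimal sketch of one way to guard against that. The data frame df_gap is made up for illustration (B has no Sat rows) and is not part of the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical data (not from the question): "B" has no "Sat" rows at all
df_gap <- data.frame(
  grp1 = c("A", "A", "B"),
  obs  = c("x", "y", "x"),
  grp2 = c("Sat", "Fri", "Fri")
)

res <- df_gap %>%
  count(grp1, obs, grp2) %>%
  complete(grp1, obs, grp2, fill = list(n = 0)) %>%
  group_by(grp1, grp2) %>%
  # Guard: when the whole subgroup is empty, use 0 instead of 0/0 = NaN
  mutate(percent = if (sum(n) > 0) n / sum(n) else 0) %>%
  ungroup()

res
```

With this guard, the B/Sat rows should get percent = 0 instead of NaN, while the other subgroups are computed the same way as before.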