count number of multiple observations within subgroups of each group in R

Time:07-19

Assume this is my df1, and I want to create df2 from it.
For example, 0.67 is the proportion of x among the Sat observations for A, and so on.

I am stuck on how to first group df1 by grp1, then group again by grp2 within grp1, and then find the count (n) and proportion of each observation within these subgroups.
Also, note that if an observation does not occur in a final subgroup, it is assigned a value of 0.
I know I should show my own attempt before asking for help, but I am really stuck on what approach to even start with for a case like this. Any help is appreciated.

df1 <- data.frame(grp1 = c(rep("A",4),rep("B",3), rep("C",4)),
                 obs = c("x", "x", "y", "z",     "x","y","y",    "x", "x","x", "y"),
                 grp2 = c("Sat", "Sat", "Sat", "Fri", "Sat", "Fri", "Fri", "Sat", "Sat", "Sat", "Fri"))

> df1
   grp1 obs grp2
1     A   x  Sat
2     A   x  Sat
3     A   y  Sat
4     A   z  Fri
5     B   x  Sat
6     B   y  Fri
7     B   y  Fri
8     C   x  Sat
9     C   x  Sat
10    C   x  Sat
11    C   y  Fri




   df2
   grp1 obs grp2 n percent
1     A   x  Sat 2    0.67
2     A   y  Sat 1    0.33
3     A   z  Sat 0    0.00
4     A   x  Fri 0    0.00
5     A   y  Fri 0    0.00
6     A   z  Fri 1    1.00
7     B   x  Sat 1    1.00
8     B   y  Sat 0    0.00
9     B   z  Sat 0    0.00
10    B   x  Fri 0    0.00
11    B   y  Fri 2    1.00
12    B   z  Fri 0    0.00
13    C   x  Sat 3    1.00
14    C   y  Sat 0    0.00
15    C   z  Sat 0    0.00
16    C   x  Fri 0    0.00
17    C   y  Fri 1    1.00
18    C   z  Fri 0    0.00

CodePudding user response:

Perhaps this helps: get the frequency count across all the columns, use complete() to add the missing combinations with n = 0, then group by the two 'grp' columns and calculate 'percent' as the proportion of 'n' within each group.

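library(dplyr)
library(tidyr)

df1 %>%
  # count each observed grp1/obs/grp2 combination
  count(across(everything())) %>%
  # add the unobserved combinations with n = 0
  complete(grp1, obs, grp2, fill = list(n = 0)) %>%
  # within each grp1/grp2 subgroup, divide n by the subgroup total
  group_by(grp1, grp2) %>%
  mutate(percent = proportions(n)) %>%
  ungroup()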

-output

# A tibble: 18 × 5
   grp1  obs   grp2      n percent
   <chr> <chr> <chr> <int>   <dbl>
 1 A     x     Fri       0   0    
 2 A     x     Sat       2   0.667
 3 A     y     Fri       0   0    
 4 A     y     Sat       1   0.333
 5 A     z     Fri       1   1    
 6 A     z     Sat       0   0    
 7 B     x     Fri       0   0    
 8 B     x     Sat       1   1    
 9 B     y     Fri       2   1    
10 B     y     Sat       0   0    
11 B     z     Fri       0   0    
12 B     z     Sat       0   0    
13 C     x     Fri       0   0    
14 C     x     Sat       3   1    
15 C     y     Fri       1   1    
16 C     y     Sat       0   0    
17 C     z     Fri       0   0    
18 C     z     Sat       0   0    
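
If you would rather not load dplyr/tidyr, the same count, fill and proportion steps can be sketched in base R. This is an alternative to the pipeline above, not the same method: it relies on table() counting every combination (so the zero rows come for free), and the names counts and totals are just ones I picked for illustration. It also guards against a grp1/grp2 subgroup with no rows at all, where dividing by the subgroup total would give NaN instead of 0 (that does not happen in this df1, so the guard is only there as a precaution).

```r
df1 <- data.frame(
  grp1 = c(rep("A", 4), rep("B", 3), rep("C", 4)),
  obs  = c("x", "x", "y", "z", "x", "y", "y", "x", "x", "x", "y"),
  grp2 = c("Sat", "Sat", "Sat", "Fri", "Sat", "Fri", "Fri", "Sat", "Sat", "Sat", "Fri")
)

# table() cross-tabulates all three columns, so combinations that never
# occur are included with a count of 0
counts <- as.data.frame(table(df1), responseName = "n", stringsAsFactors = FALSE)

# total number of rows in each grp1/grp2 subgroup, repeated for every row
totals <- ave(counts$n, counts$grp1, counts$grp2, FUN = sum)

# proportion within the subgroup; an empty subgroup gets 0 rather than NaN
counts$percent <- ifelse(totals > 0, counts$n / totals, 0)

# sort the rows the same way as the tibble output above
counts <- counts[order(counts$grp1, counts$obs, counts$grp2), ]
rownames(counts) <- NULL
counts
```

The result should have the same 18 rows as the tibble above, just as a plain data.frame.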