Home > Mobile >  Counts of factor levels for multiple variables grouped by row
Counts of factor levels for multiple variables grouped by row

Time:10-29

I want to count the number of occurrences that a specific factor level occurs across multiple factor varaibles per row.

Simplified, I want to know how many times each factor level is chosen across specific variables per row (memberID).

Example data:

results=data.frame(MemID=c('A','B','C','D','E','F','G','H'),
             value_a = c(1,2,1,4,5,1,4,0),
             value_b = c(1,5,2,3,4,1,0,3),
             value_c = c(3,5,2,1,1,1,2,1)
             )

In this example, I want to know the frequency of each factor level for value_a and value_b for each MemID. How many times does A respond 1? How many times does A respond 2? Etc...for each level and for each MemID but only for value_a and value_b.

I would like the output to look something like this:

counts_by_level = data.frame(MemID=c('A','B','C','D','E','F','G','H'), 
             count_1 = c(2, 0, 1, 0, 0, 2, 0, 0),
             count_2 = c(0, 1, 1, 0, 0, 0, 0, 0),
             count_3 = c(0, 0, 0, 1, 0, 0, 0, 1),
             count_4 = c(0, 0, 0, 1, 1, 0, 1, 0),
             count_5 = c(0, 1, 0, 0, 1, 0, 0, 0))

I have been trying to use add_count or add_tally as well as table and searching other ways to answer this question. However, I am struggling to identify specific factor levels across multiple variables and then output new columns for the counts of those levels for each row.

CodePudding user response:

You could do something like this. Note that you didn't include a zero count, but there are some zero selections.

library(tidyverse)

results |>
  select(-value_c) |>
  pivot_longer(cols = starts_with("value"), 
               names_pattern =   "(value)") |>
  mutate(count = 1) |>
  select(-name) |>
  pivot_wider(names_from = value, 
              values_from = count,
              names_prefix = "count_",
              values_fill = 0,
              values_fn = sum)
#> # A tibble: 8 x 7
#>   MemID count_1 count_2 count_5 count_4 count_3 count_0
#>   <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 A           2       0       0       0       0       0
#> 2 B           0       1       1       0       0       0
#> 3 C           1       1       0       0       0       0
#> 4 D           0       0       0       1       1       0
#> 5 E           0       0       1       1       0       0
#> 6 F           2       0       0       0       0       0
#> 7 G           0       0       0       1       0       1
#> 8 H           0       0       0       0       1       1

CodePudding user response:

Another solution:

results %>% 
  group_by(MemID, value_a, value_b) %>% 
  summarise(n=n()) %>% 
  pivot_longer(c(value_a,value_b)) %>% 
  group_by(MemID, value) %>% 
  summarise(n=sum(n)) %>% 
  pivot_wider(MemID, 
              names_from = value, names_sort = T, names_prefix = "count_",
              values_from=n, values_fn=sum, values_fill = 0) 
  • Related