Counts of factor levels for multiple variables grouped by row-CodePudding

I want to count the number of occurrences that a specific factor level occurs across multiple factor varaibles per row.

Simplified, I want to know how many times each factor level is chosen across specific variables per row (memberID).

Example data:

results=data.frame(MemID=c('A','B','C','D','E','F','G','H'),
             value_a = c(1,2,1,4,5,1,4,0),
             value_b = c(1,5,2,3,4,1,0,3),
             value_c = c(3,5,2,1,1,1,2,1)
             )

In this example, I want to know the frequency of each factor level for value_a and value_b for each MemID. How many times does A respond 1? How many times does A respond 2? Etc...for each level and for each MemID but only for value_a and value_b.

I would like the output to look something like this:

counts_by_level = data.frame(MemID=c('A','B','C','D','E','F','G','H'), 
             count_1 = c(2, 0, 1, 0, 0, 2, 0, 0),
             count_2 = c(0, 1, 1, 0, 0, 0, 0, 0),
             count_3 = c(0, 0, 0, 1, 0, 0, 0, 1),
             count_4 = c(0, 0, 0, 1, 1, 0, 1, 0),
             count_5 = c(0, 1, 0, 0, 1, 0, 0, 0))

I have been trying to use add_count or add_tally as well as table and searching other ways to answer this question. However, I am struggling to identify specific factor levels across multiple variables and then output new columns for the counts of those levels for each row.

CodePudding user response：

You could do something like this. Note that you didn't include a zero count, but there are some zero selections.

library(tidyverse)

results |>
  select(-value_c) |>
  pivot_longer(cols = starts_with("value"), 
               names_pattern =   "(value)") |>
  mutate(count = 1) |>
  select(-name) |>
  pivot_wider(names_from = value, 
              values_from = count,
              names_prefix = "count_",
              values_fill = 0,
              values_fn = sum)
#> # A tibble: 8 x 7
#>   MemID count_1 count_2 count_5 count_4 count_3 count_0
#>   <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 A           2       0       0       0       0       0
#> 2 B           0       1       1       0       0       0
#> 3 C           1       1       0       0       0       0
#> 4 D           0       0       0       1       1       0
#> 5 E           0       0       1       1       0       0
#> 6 F           2       0       0       0       0       0
#> 7 G           0       0       0       1       0       1
#> 8 H           0       0       0       0       1       1

CodePudding user response：

Another solution:

results %>% 
  group_by(MemID, value_a, value_b) %>% 
  summarise(n=n()) %>% 
  pivot_longer(c(value_a,value_b)) %>% 
  group_by(MemID, value) %>% 
  summarise(n=sum(n)) %>% 
  pivot_wider(MemID, 
              names_from = value, names_sort = T, names_prefix = "count_",
              values_from=n, values_fn=sum, values_fill = 0)