I want to count the number of occurrences that a specific factor level occurs across multiple factor varaibles per row.
Simplified, I want to know how many times each factor level is chosen across specific variables per row (memberID).
Example data:
results=data.frame(MemID=c('A','B','C','D','E','F','G','H'),
value_a = c(1,2,1,4,5,1,4,0),
value_b = c(1,5,2,3,4,1,0,3),
value_c = c(3,5,2,1,1,1,2,1)
)
In this example, I want to know the frequency of each factor level for value_a and value_b for each MemID. How many times does A respond 1? How many times does A respond 2? Etc...for each level and for each MemID but only for value_a and value_b.
I would like the output to look something like this:
counts_by_level = data.frame(MemID=c('A','B','C','D','E','F','G','H'),
count_1 = c(2, 0, 1, 0, 0, 2, 0, 0),
count_2 = c(0, 1, 1, 0, 0, 0, 0, 0),
count_3 = c(0, 0, 0, 1, 0, 0, 0, 1),
count_4 = c(0, 0, 0, 1, 1, 0, 1, 0),
count_5 = c(0, 1, 0, 0, 1, 0, 0, 0))
I have been trying to use add_count or add_tally as well as table and searching other ways to answer this question. However, I am struggling to identify specific factor levels across multiple variables and then output new columns for the counts of those levels for each row.
CodePudding user response:
You could do something like this. Note that you didn't include a zero count, but there are some zero selections.
library(tidyverse)
results |>
select(-value_c) |>
pivot_longer(cols = starts_with("value"),
names_pattern = "(value)") |>
mutate(count = 1) |>
select(-name) |>
pivot_wider(names_from = value,
values_from = count,
names_prefix = "count_",
values_fill = 0,
values_fn = sum)
#> # A tibble: 8 x 7
#> MemID count_1 count_2 count_5 count_4 count_3 count_0
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A 2 0 0 0 0 0
#> 2 B 0 1 1 0 0 0
#> 3 C 1 1 0 0 0 0
#> 4 D 0 0 0 1 1 0
#> 5 E 0 0 1 1 0 0
#> 6 F 2 0 0 0 0 0
#> 7 G 0 0 0 1 0 1
#> 8 H 0 0 0 0 1 1
CodePudding user response:
Another solution:
results %>%
group_by(MemID, value_a, value_b) %>%
summarise(n=n()) %>%
pivot_longer(c(value_a,value_b)) %>%
group_by(MemID, value) %>%
summarise(n=sum(n)) %>%
pivot_wider(MemID,
names_from = value, names_sort = T, names_prefix = "count_",
values_from=n, values_fn=sum, values_fill = 0)