How can I save grouped counts across factor levels into a new variable with dplyr?-CodePudding

i try to save grouped counts of various factor levels into a new variable:

Lets say my data look like this:


a <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
b <- c("acc", "rej", "con", "acc", "rej", "con", "acc", "rej", "con", "acc", "rej", "con", "acc", "rej", "con","acc", "rej", "con", "acc", "rej")

df <- data.frame(a,b)

The resulting data frame should look like this:


a <- c(1,2,3,4)
number_acc <- c(2,2,1,2)
number_rej <- c(2,1,2,2)
number_con <- c(1,2,2,1)

I tried to solve the problem in the following way:


df2 <- df %>%
  group_by(a) %>% 
  mutate(number_acc = count(b == 'acc'), 
         number_rej = count(b == 'rej'),
         number_con = count(b == 'con'))

However, i get an error message that the method "count" cannot be applied to objects of the class "logical".

Thank you for your help!

CodePudding user response：

Use the tabyl function from the janitor package:

Your data:

a <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
b <- c("acc", "rej", "con", "acc", "rej", "con", "acc", "rej", "con", "acc", "rej", "con", "acc", "rej", "con","acc", "rej", "con", "acc", "rej")

df <- data.frame(a,b)

Summarize grouping by count:

library(janitor)
data_summary <- tabyl(df, a, b)
data_summary

# a acc con rej
# 1   2   1   2
# 2   2   2   1
# 3   1   2   2
# 4   2   1   2

CodePudding user response：

To make the existing code to work, we need to summarise instead of mutate, and sum instead of count:

df %>%
  group_by(a) %>% 
  summarise(number_acc = sum(b == 'acc'), 
            number_rej = sum(b == 'rej'),
            number_con = sum(b == 'con'))

# # A tibble: 4 x 4
#       a number_acc number_rej number_con
#   <dbl>      <int>      <int>      <int>
# 1     1          2          2          1
# 2     2          2          1          2
# 3     3          1          2          2
# 4     4          2          2          1

But there are better ways of doing this, for example see answers at:

CodePudding user response：

Here is an alternative way: We could use pivot_wider with names_glue after count:

library(tidyr)
library(dplyr)

df %>% 
  count(a,b) %>% 
  pivot_wider(
    names_from = b,
    values_from = n,
    names_glue = "{b}_{'number'}"
  )

      a acc_number con_number rej_number
  <dbl>      <int>      <int>      <int>
1     1          2          1          2
2     2          2          2          1
3     3          1          2          2
4     4          2          1          2