How to count number of values in columns based on a category in R?-CodePudding

Suppose we have a data frame df like

book_id book_category book_word_hi book_word_bye book_word_yes
1       drama         3            0             4
2       action        1            4             5
3       drama         5            3             2

I would like to count the number of values within the book_word columns and sum them in a table for each book_category.

So the output here should look something like:

drama: 17 
action: 10

Does anyone know how to do this?

CodePudding user response：

This is a short and simple one-liner in base R, without requiring any additional packages.

tapply(rowSums(df[3:5]), df[2], sum)
#> book_category
#> action  drama 
#>     10     17

CodePudding user response：

Use the function summarise_at link to docs:

df %>%
  summarise_at(c("book_word_hi","book_word_bye","book_word_yes"), sum, na.rm = FALSE)

Could also be combined with a group_by

CodePudding user response：

Here, there can be also other columns like book_word_foo as well, which will be counted:

library(tidyverse)

data <- tribble(
  ~book_id, ~book_category, ~book_word_hi, ~book_word_bye, ~book_word_yes,
  1, "drama", 3, 0, 4,
  2, "action", 1, 4, 5,
  2, "drama", 5, 3, 2,
)

data %>%
  pivot_longer(-c(book_id, book_category)) %>%
  group_by(book_category) %>%
  summarise(n = sum(value))
#> # A tibble: 2 × 2
#>   book_category     n
#>   <chr>         <dbl>
#> 1 action           10
#> 2 drama            17

^{Created on 2022-05-05 by the reprex package (v2.0.0)}

CodePudding user response：

First sum up everything columns that starts_with the string "book_word". Then group_by and sum up the values per book_category.

library(dplyr)

df %>% 
  mutate(book_sum = rowSums(across(starts_with("book_word")))) %>% 
  group_by(book_category) %>% 
  summarize(sum = sum(book_sum))

# A tibble: 2 × 2
  book_category   sum
  <chr>         <int>
1 action           10
2 drama            17