Suppose we have a data frame df
like
book_id book_category book_word_hi book_word_bye book_word_yes
1 drama 3 0 4
2 action 1 4 5
3 drama 5 3 2
I would like to count the number of values within the book_word
columns and sum them in a table for each book_category
.
So the output here should look something like:
drama: 17
action: 10
Does anyone know how to do this?
CodePudding user response:
This is a short and simple one-liner in base R, without requiring any additional packages.
tapply(rowSums(df[3:5]), df[2], sum)
#> book_category
#> action drama
#> 10 17
CodePudding user response:
Use the function summarise_at
link to docs:
df %>%
summarise_at(c("book_word_hi","book_word_bye","book_word_yes"), sum, na.rm = FALSE)
Could also be combined with a group_by
CodePudding user response:
Here, there can be also other columns like book_word_foo
as well, which will be counted:
library(tidyverse)
data <- tribble(
~book_id, ~book_category, ~book_word_hi, ~book_word_bye, ~book_word_yes,
1, "drama", 3, 0, 4,
2, "action", 1, 4, 5,
2, "drama", 5, 3, 2,
)
data %>%
pivot_longer(-c(book_id, book_category)) %>%
group_by(book_category) %>%
summarise(n = sum(value))
#> # A tibble: 2 × 2
#> book_category n
#> <chr> <dbl>
#> 1 action 10
#> 2 drama 17
Created on 2022-05-05 by the reprex package (v2.0.0)
CodePudding user response:
First sum up everything columns that starts_with
the string "book_word". Then group_by
and sum
up the values per book_category
.
library(dplyr)
df %>%
mutate(book_sum = rowSums(across(starts_with("book_word")))) %>%
group_by(book_category) %>%
summarize(sum = sum(book_sum))
# A tibble: 2 × 2
book_category sum
<chr> <int>
1 action 10
2 drama 17