What's the easiest way to calculate row-wise sums? For example, how would I calculate the sum of all variables whose names start with "txt_"? (See the example below.)
df <- data.frame(var1 = c(1, 2, 3),
                 txt_1 = c(1, 1, 0),
                 txt_2 = c(1, 0, 0),
                 txt_3 = c(1, 0, 0))
CodePudding user response:
Another dplyr option:
library(dplyr)

# Sum the txt columns one row at a time
df %>%
  rowwise() %>%
  mutate(sum = sum(c_across(starts_with("txt"))))
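One thing worth knowing about the rowwise() approach is that the result stays grouped by row, so later verbs keep operating one row at a time until the grouping is removed. A minimal sketch, assuming the example df from the question; the name `result` is only for illustration:

```r
library(dplyr)

# Example data from the question
df <- data.frame(var1 = c(1, 2, 3),
                 txt_1 = c(1, 1, 0),
                 txt_2 = c(1, 0, 0),
                 txt_3 = c(1, 0, 0))

result <- df %>%
  rowwise() %>%                                      # treat each row as its own group
  mutate(sum = sum(c_across(starts_with("txt")))) %>%
  ungroup()                                          # drop the row-wise grouping again

result$sum
```

Without the ungroup() call, a later summarise() or mutate() would still be evaluated per row, which is usually not what you want once the sums are computed.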
CodePudding user response:
base R
We can first use grepl to flag the columns whose names contain txt_, then use rowSums on that subset.
rowSums(df[, grepl("txt_", names(df))])
[1] 3 1 0
If you want the sums as a new column, we can bind the output to the original dataframe with cbind.
cbind(df, sums = rowSums(df[, grepl("txt_", names(df))]))
  var1 txt_1 txt_2 txt_3 sums
1    1     1     1     1    3
2    2     1     0     0    1
3    3     0     0     0    0
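Two details of the grepl call above may matter on other data: the pattern "txt_" matches anywhere in a column name, not only at the start, and if only one column matches, df[, ...] drops to a plain vector, which rowSums rejects. A slightly more defensive sketch, assuming the same df; `txt_cols` and `sums` are names chosen here for illustration:

```r
# Example data from the question
df <- data.frame(var1 = c(1, 2, 3),
                 txt_1 = c(1, 1, 0),
                 txt_2 = c(1, 0, 0),
                 txt_3 = c(1, 0, 0))

# startsWith() only matches the prefix, so a column such as "my_txt_1" is ignored
txt_cols <- startsWith(names(df), "txt_")

# drop = FALSE keeps a data frame even when a single column matches
sums <- rowSums(df[, txt_cols, drop = FALSE])
sums
```

Using grepl("^txt_", names(df)) would be an equivalent way to anchor the match at the start of the name.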
Tidyverse
library(tidyverse)
df %>%
  mutate(sum = rowSums(across(starts_with("txt_"))))
  var1 txt_1 txt_2 txt_3 sum
1    1     1     1     1   3
2    2     1     0     0   1
3    3     0     0     0   0
Or, if you want just the vector, we can use pull:
df %>%
  mutate(sum = rowSums(across(starts_with("txt_")))) %>%
  pull(sum)
[1] 3 1 0
Data Table
Here is a data.table option as well:
library(data.table)
dt <- as.data.table(df)
dt[, sum := rowSums(.SD), .SDcols = grep("txt_", names(dt))]
dt[["sum"]]
# [1] 3 1 0
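As a variation on the data.table answer, the column selection can be written with patterns() inside .SDcols instead of a separate grep call. This is a sketch that assumes a reasonably recent data.table (patterns() in .SDcols was added in version 1.12.0, as far as I know) and rebuilds the example data directly as a data.table:

```r
library(data.table)

# Example data from the question, built directly as a data.table
dt <- data.table(var1 = c(1, 2, 3),
                 txt_1 = c(1, 1, 0),
                 txt_2 = c(1, 0, 0),
                 txt_3 = c(1, 0, 0))

# patterns() selects columns by regex; "^txt_" anchors the match at the start of the name
dt[, sum := rowSums(.SD), .SDcols = patterns("^txt_")]

dt[["sum"]]
```

As in the original answer, := adds the column by reference, so dt itself is modified and no reassignment is needed.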