Calculate row sums by variable names

Time:02-17

What's the easiest way to calculate row-wise sums over a set of columns selected by name? For example, how would I sum all variables whose names start with "txt_"? (See the example below.)

df <- data.frame(var1 = c(1, 2, 3),
                 txt_1 = c(1, 1, 0),
                 txt_2 = c(1, 0, 0),
                 txt_3 = c(1, 0, 0))

CodePudding user response:

Another dplyr option:

library(dplyr)

df %>% 
  rowwise() %>%
  mutate(sum = sum(c_across(starts_with("txt_")))) %>%
  ungroup()  # drop the row-wise grouping so later verbs act on whole columns

CodePudding user response:

Base R

We can first use grepl to find the column names that start with txt_, then use rowSums on the subset.

rowSums(df[, grepl("^txt_", names(df))])

[1] 3 1 0
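One detail worth illustrating: an unanchored pattern such as "txt_" matches anywhere in a column name, not only at the start. The sketch below assumes a hypothetical extra column, note_txt_4, that is not part of the original question, and compares the unanchored grepl call with startsWith, which matches the prefix only. The drop = FALSE argument is an added safeguard so the subset stays a data frame even if only one column matches.

```r
# Example data from the question plus one hypothetical column (note_txt_4)
# whose name contains "txt_" but does not start with it.
df2 <- data.frame(var1 = c(1, 2, 3),
                  txt_1 = c(1, 1, 0),
                  txt_2 = c(1, 0, 0),
                  txt_3 = c(1, 0, 0),
                  note_txt_4 = c(5, 5, 5))

# Unanchored pattern: note_txt_4 is included in the sum as well
loose <- rowSums(df2[, grepl("txt_", names(df2)), drop = FALSE])

# Prefix match only: note_txt_4 is excluded
strict <- rowSums(df2[, startsWith(names(df2), "txt_"), drop = FALSE])
```

Here loose holds the values 8, 6, 5, while strict holds 3, 1, 0. Anchoring the regular expression as "^txt_" gives the same selection as startsWith.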

If you want the sums as a new column of the original data frame, bind the result to it with cbind.

cbind(df, sums = rowSums(df[, grepl("^txt_", names(df))]))

  var1 txt_1 txt_2 txt_3 sums
1    1     1     1     1    3
2    2     1     0     0    1
3    3     0     0     0    0
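Note that cbind returns a new data frame and leaves df itself unchanged. If the goal is to modify df directly, a minimal sketch (the helper name txt_cols is only illustrative) is to assign the sums to a new column:

```r
df <- data.frame(var1 = c(1, 2, 3),
                 txt_1 = c(1, 1, 0),
                 txt_2 = c(1, 0, 0),
                 txt_3 = c(1, 0, 0))

# Logical vector marking the columns whose names start with "txt_"
txt_cols <- startsWith(names(df), "txt_")

# Single-bracket indexing with one argument always returns a data frame,
# so this also works when only one column matches.
df$sums <- rowSums(df[txt_cols])
```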

Tidyverse

library(tidyverse)

df %>% 
  mutate(sum = rowSums(across(starts_with("txt_"))))

  var1 txt_1 txt_2 txt_3 sum
1    1     1     1     1   3
2    2     1     0     0   1
3    3     0     0     0   0

Or if you want just the vector, then we can use pull:

df %>% 
  mutate(sum = rowSums(across(starts_with("txt_")))) %>% 
  pull(sum)

[1] 3 1 0

Data Table

Here is a data.table option as well:

library(data.table)
dt <- as.data.table(df)

dt[, sum := rowSums(.SD), .SDcols = grep("^txt_", names(dt))]

dt[["sum"]]
# [1] 3 1 0