Home > Enterprise >  R: Sum of all the columns that start with
R: Sum of all the columns that start with

Time:05-23

I want to make a new column that is the sum of all the columns that start with "m_" and a new column that is the sum of all the columns that start with "w_". Unfortunately it is not every nth column, so indexing all the odd and even columns won't work.

columnnames <- c("m_16", "w_16", "w_17", "m_17", "w_18", "m_18")
values1 <- c(3, 4, 8, 1, 12, 4)
values2 <- c(8, 0, 12, 1, 3, 2)
df <- as.data.frame(rbind(values1, values2))
names(df) <- columnnames

What I want to get is:

columnnames <- c("m_16", "w_16", "w_17", "m_17", "w_18", "m_18", "sum_m", "sum_w")
values1 <- c(3, 4, 8, 1, 12, 4, 8, 24)
values2 <- c(8, 0, 12, 1, 3, 2, 11, 15)
df <- as.data.frame(rbind(values1, values2))

names(df) <- columnnames

So far during my search, I only found how to sum specific columns on conditions but I don't want to specify the columns because there's a lot of them.

Thank you for your help in advance!

CodePudding user response:

Another possible solution:

library(dplyr)

df %>% 
  mutate(sum_m = across(starts_with("m")) %>% rowSums) %>% 
  mutate(sum_w = across(starts_with("w")) %>% rowSums)

#>         m_16 w_16 w_17 m_17 w_18 m_18 sum_m sum_w
#> values1    3    4    8    1   12    4     8    24
#> values2    8    0   12    1    3    2    11    15

CodePudding user response:

dplyr has a quick answer:

library(dplyr)
df <- df %>% 
    mutate(
        m_col_sum = select(., starts_with("m")) %>% rowSums(),
        w_col_sum = select(., starts_with("w")) %>% rowSums()
    )

You might need to specify na.rm = TRUE as an additional argument to rowSums().

CodePudding user response:

# Vector containing the letters which target vectors' 
# names start with: names_start_with => character vector
names_start_with <- c("m", "w")

# Compute row-sums, column-bind vectors to data.frame: res => data.frame
res <- cbind(
  df,
  vapply(
    names_start_with, 
    function(x){
      rowSums(df[,startsWith(names(df), x)])
    }, 
    numeric(length(names_start_with))
  ),
  row.names = NULL
)

# Output data.frame to console: data.frame => stdout(console)
res

CodePudding user response:

A base solution using rowSums inside lapply.

cbind(df, lapply(c(sum_m = "m", sum_w = "w"),
                 \(x) rowSums(df[startsWith(names(df), x)])))
#        m_16 w_16 w_17 m_17 w_18 m_18 sum_m sum_w
#values1    3    4    8    1   12    4     8    24
#values2    8    0   12    1    3    2    11    15

Or in case there are not so many groups simply:

df$sum_m <- rowSums(df[startsWith(names(df), "m")])
df$sum_w <- rowSums(df[startsWith(names(df), "w")])
df
#        m_16 w_16 w_17 m_17 w_18 m_18 sum_m sum_w
#values1    3    4    8    1   12    4     8    24
#values2    8    0   12    1    3    2    11    15
  • Related