Home > Mobile >  Summing across in a dataframe with condition coming from another column
Summing across in a dataframe with condition coming from another column

Time:05-14

this is not a very good title for the question. I want to sum across certain columns in a data frame for each group, excluding one column for each of my groups. A simple example would be as follows:

df <- tibble(group_name = c("A", "B","C"), mean_A = c(1,2,3), mean_B = c(2,3,4), mean_C=c(3,4,5))
df %>% group_by(group_name) %>% mutate(m1 = sum(across(contains("mean"))))

This creates column m1, which is the sum across mean_a, mean_b, mean_c for each group. What I want to do is exclude mean_a for group a, mean_b for b and mean_c for c. The following does not work though (not surprisingly).

df %>% group_by(group_name) %>% mutate(m1 = sum(across(c(contains("mean") & !contains(group_name)))))

Do you have an idea how I could do this? My original data contains many more groups, so would be hard to do by hand.

Edit: I have tried the following way which solves it in a rudimentary fashion, but something (?grepl maybe) seems to not work great here and I get the wrong result.

df %>% pivot_longer(!group_name) %>% mutate(value2 = case_when(grepl(group_name, name) ~ 0, TRUE ~ value)) %>% group_by(group_name) %>% summarise(m1 = sum(value2))

Edit2: Found out what's wrong with the above, and below works, but still a lot of warnings so I recommend people to follow TarJae's response below

df %>% pivot_longer(!group_name) %>% group_by(group_name) %>% mutate(value2 = case_when(grepl(group_name, name) ~ 0, TRUE ~ value)) %>% group_by(group_name) %>% summarise(m1 = sum(value2))

CodePudding user response:

Here is one way how we could do it:

  1. We create a helper column to match column names
  2. We set value of mean column to zeor if column names matches helper name.
  3. Then we use transmute with select to calculate rowSums
  4. Finally we cbind column m1 to df:
library(dplyr)

df %>% 
  mutate(helper = paste0("mean_", group_name)) %>% 
  mutate(across(starts_with("mean"), ~ifelse(cur_column()==helper, 0, .))) %>%
  transmute(m1 = select(., contains("mean")) %>% 
           rowSums()) %>% 
  cbind(df)
  m1 group_name mean_a mean_b mean_c
1  5          a      1      2      3
2  6          b      2      3      4
3  7          c      3      4      5
  • Related