In R, what's the easiest way to sum up two groups across all numeric variables and create a new-CodePudding

I have a dataframe of variables, and though I want to keep all the raw data, I also want to make a new row that's the summation of all that start with "Justin -" called "Justin - All". I can think of ways of doing this that take many steps, but I'm wondering if there's some parsimonous way to make this transition.

Here's the raw data:

library(dplyr)
test <- tibble(name = c("Justin - Pre", "Justin - Post", "Sibley", "Corey"),
               sex = c("Male", "Male", "Female", "Male"),
               test = c(1, 2, 5, 10),
               exam = c(5, 5, 100, 101))

outcome <- tibble(name = c("Justin - Pre", "Justin - Post", "Sibley", "Corey", "Justin All"),
               sex = c("Male", "Male", "Female", "Male", "Male"),
               test = c(1, 2, 5, 10, 3),
               exam = c(5, 5, 100, 101, 10))

Is there any easy way to do it using dplyr / tidyr, or should I be prepared to do a multiple step journey?

CodePudding user response：

Here's an approach that sums every name based on its first word, keeps the groups with multiple observations (ie just Justin here), and collates that into the original data.

library(dplyr)
bind_rows(
  test,
  test %>%
    group_by(name = stringr::word(name, 1), sex) %>%
    summarize(across(everything(), sum), n = n(), .groups = "drop") %>%
    filter(n > 1) %>% select(-n) %>%
    mutate(name = paste(name, "- Total"))) %>% 
  arrange(name)


# A tibble: 5 × 4
  name           sex     test  exam
  <chr>          <chr>  <dbl> <dbl>
1 Corey          Male      10   101
2 Justin - Post  Male       2     5
3 Justin - Pre   Male       1     5
4 Justin - Total Male       3    10
5 Sibley         Female     5   100

Or a Justin-specific variation:

bind_rows(
  test,
  test %>%
    filter(name %>% stringr::str_starts("Justin")) %>%
    mutate(name = paste(stringr::word(name, 1), "- Total")) %>%
    group_by(name, sex) %>%
    summarize(across(everything(), sum), .groups = "drop")
) %>% 
  arrange(name)