Home > Mobile >  In R, what's the easiest way to sum up two groups across all numeric variables and create a new
In R, what's the easiest way to sum up two groups across all numeric variables and create a new

Time:10-29

I have a dataframe of variables, and though I want to keep all the raw data, I also want to make a new row that's the summation of all that start with "Justin -" called "Justin - All". I can think of ways of doing this that take many steps, but I'm wondering if there's some parsimonous way to make this transition.

Here's the raw data:

library(dplyr)
test <- tibble(name = c("Justin - Pre", "Justin - Post", "Sibley", "Corey"),
               sex = c("Male", "Male", "Female", "Male"),
               test = c(1, 2, 5, 10),
               exam = c(5, 5, 100, 101))

outcome <- tibble(name = c("Justin - Pre", "Justin - Post", "Sibley", "Corey", "Justin All"),
               sex = c("Male", "Male", "Female", "Male", "Male"),
               test = c(1, 2, 5, 10, 3),
               exam = c(5, 5, 100, 101, 10))

Is there any easy way to do it using dplyr / tidyr, or should I be prepared to do a multiple step journey?

CodePudding user response:

Here's an approach that sums every name based on its first word, keeps the groups with multiple observations (ie just Justin here), and collates that into the original data.

library(dplyr)
bind_rows(
  test,
  test %>%
    group_by(name = stringr::word(name, 1), sex) %>%
    summarize(across(everything(), sum), n = n(), .groups = "drop") %>%
    filter(n > 1) %>% select(-n) %>%
    mutate(name = paste(name, "- Total"))) %>% 
  arrange(name)


# A tibble: 5 × 4
  name           sex     test  exam
  <chr>          <chr>  <dbl> <dbl>
1 Corey          Male      10   101
2 Justin - Post  Male       2     5
3 Justin - Pre   Male       1     5
4 Justin - Total Male       3    10
5 Sibley         Female     5   100

Or a Justin-specific variation:

bind_rows(
  test,
  test %>%
    filter(name %>% stringr::str_starts("Justin")) %>%
    mutate(name = paste(stringr::word(name, 1), "- Total")) %>%
    group_by(name, sex) %>%
    summarize(across(everything(), sum), .groups = "drop")
) %>% 
  arrange(name)
  • Related