I have a dataframe of variables, and though I want to keep all the raw data, I also want to make a new row that's the summation of all that start with "Justin -" called "Justin - All". I can think of ways of doing this that take many steps, but I'm wondering if there's some parsimonous way to make this transition.
Here's the raw data:
library(dplyr)
test <- tibble(name = c("Justin - Pre", "Justin - Post", "Sibley", "Corey"),
sex = c("Male", "Male", "Female", "Male"),
test = c(1, 2, 5, 10),
exam = c(5, 5, 100, 101))
outcome <- tibble(name = c("Justin - Pre", "Justin - Post", "Sibley", "Corey", "Justin All"),
sex = c("Male", "Male", "Female", "Male", "Male"),
test = c(1, 2, 5, 10, 3),
exam = c(5, 5, 100, 101, 10))
Is there any easy way to do it using dplyr / tidyr, or should I be prepared to do a multiple step journey?
CodePudding user response:
Here's an approach that sums every name based on its first word, keeps the groups with multiple observations (ie just Justin here), and collates that into the original data.
library(dplyr)
bind_rows(
test,
test %>%
group_by(name = stringr::word(name, 1), sex) %>%
summarize(across(everything(), sum), n = n(), .groups = "drop") %>%
filter(n > 1) %>% select(-n) %>%
mutate(name = paste(name, "- Total"))) %>%
arrange(name)
# A tibble: 5 × 4
name sex test exam
<chr> <chr> <dbl> <dbl>
1 Corey Male 10 101
2 Justin - Post Male 2 5
3 Justin - Pre Male 1 5
4 Justin - Total Male 3 10
5 Sibley Female 5 100
Or a Justin-specific variation:
bind_rows(
test,
test %>%
filter(name %>% stringr::str_starts("Justin")) %>%
mutate(name = paste(stringr::word(name, 1), "- Total")) %>%
group_by(name, sex) %>%
summarize(across(everything(), sum), .groups = "drop")
) %>%
arrange(name)