More extensive summary with group_by

Time:12-25

I have a dataset containing COVID-19 patients with vaccination status and whether they're dead or alive.

ID <- c(1:20)
Group <- c("1. vacc   unvacc", "2. vacc", "3. vacc", "1. vacc   unvacc", "2. vacc", "3. vacc", "1. vacc   unvacc", "2. vacc", "3. vacc",
           "1. vacc   unvacc", "2. vacc", "3. vacc", "1. vacc   unvacc", "2. vacc", "3. vacc", "1. vacc   unvacc", "2. vacc", "3. vacc",
           "1. vacc   unvacc", "2. vacc")
Status <- c("Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", 
            "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive")

df <- data.frame(ID, Group, Status)

So far, I've written the following code, which gets me part of the way:

library(tidyverse)

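df %>% 
  mutate_at("Group", as.character) %>%
  # summarise twice: once by Group and Status, once over the whole data
  list(group_by(., Group, Status), .) %>%
  map(~summarize(., cnt = n())) %>%
  # stack the results; the ungrouped total has no Group, so label it "Overall"
  bind_rows() %>%
  replace_na(list(Group = "Overall"))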

This gives me the following output:

`summarise()` has grouped output by 'Group'. You can override using the `.groups` argument.
# A tibble: 7 x 3
# Groups:   Group [4]
  Group            Status   cnt
  <chr>            <chr>  <int>
1 1. vacc   unvacc Alive      3
2 1. vacc   unvacc Dead       4
3 2. vacc          Alive      4
4 2. vacc          Dead       3
5 3. vacc          Alive      3
6 3. vacc          Dead       3
7 Overall          NA        20
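The attempt above summarises at only two levels (by Group and Status, and over the whole data), so the per-group "All" rows and the per-status "Overall" rows are never computed. Below is a minimal sketch of the same list-then-bind_rows() idea with the two missing levels added, assuming dplyr and tidyr are installed. It swaps purrr::map() for base lapply(), passes .groups = "drop" to silence the grouping message, and uses a final arrange() only to put the rows in the order of the desired output shown next. The data frame is rebuilt with rep(), which yields the same values as the vectors in the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question (rep() recycles the Group and Status patterns)
df <- data.frame(
  ID = 1:20,
  Group = rep(c("1. vacc   unvacc", "2. vacc", "3. vacc"), length.out = 20),
  Status = rep(c("Dead", "Alive"), length.out = 20)
)

summary_tbl <- list(
  group_by(df, Group, Status),  # counts per group and status
  group_by(df, Group),          # per-group totals (Status will be NA)
  group_by(df, Status),         # per-status totals (Group will be NA)
  df                            # grand total (both will be NA)
) %>%
  lapply(function(d) summarise(d, cnt = n(), .groups = "drop")) %>%
  bind_rows() %>%
  # label the missing levels as in the desired output
  replace_na(list(Group = "Overall", Status = "All")) %>%
  # order by group, then Alive / Dead / All within each group
  arrange(Group, factor(Status, levels = c("Alive", "Dead", "All")))

summary_tbl
```

Each list element contributes one level of the summary, and bind_rows() fills the columns a level does not have with NA. That is why a single replace_na() call can turn those gaps into the "All" and "Overall" labels.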

The output I'm looking for adds an "All" row within each group and an "Overall" row for each status:

    `summarise()` has grouped output by 'Group'. You can override using the `.groups` argument.
    # A tibble: 12 x 3
    # Groups:   Group [4]
       Group            Status   cnt
       <chr>            <chr>  <int>
     1 1. vacc   unvacc Alive      3
     2 1. vacc   unvacc Dead       4
     3 1. vacc   unvacc All        7
     4 2. vacc          Alive      4
     5 2. vacc          Dead       3
     6 2. vacc          All        7
     7 3. vacc          Alive      3
     8 3. vacc          Dead       3
     9 3. vacc          All        6
    10 Overall          Alive     10
    11 Overall          Dead      10
    12 Overall          All       20

Answer:

We could do it this way:

  1. First, count the rows with count() from dplyr. The nice thing about count() is that it does group_by() and summarise(n = n()) in a single call.
  2. Then reshape to wide format with pivot_wider() from the tidyr package, so that Alive and Dead become separate columns.
  3. Next, use adorn_totals() from the handy janitor package to add row sums and column sums. (We could also do this with base R ...)
  4. Finally, go back to long format with pivot_longer(), naming the new columns Status and cnt.

library(dplyr)
library(tidyr)
library(janitor)

df %>% 
  # one row per Group/Status combination, with the count in column n
  count(Group, Status) %>% 
  # wide format: one column per Status (Alive, Dead)
  pivot_wider(
    names_from = Status,
    values_from = n
  ) %>% 
  # add an "All" column (row sums) and an "Overall" row (column sums)
  adorn_totals("col", name = "All") %>% 
  adorn_totals("row", name = "Overall") %>% 
  # back to long format
  pivot_longer(
    cols = -Group,
    names_to = "Status", 
    values_to = "cnt"
  )

   Group            Status   cnt
   <chr>            <chr>  <dbl>
 1 1. vacc   unvacc Alive      3
 2 1. vacc   unvacc Dead       4
 3 1. vacc   unvacc All        7
 4 2. vacc          Alive      4
 5 2. vacc          Dead       3
 6 2. vacc          All        7
 7 3. vacc          Alive      3
 8 3. vacc          Dead       3
 9 3. vacc          All        6
10 Overall          Alive     10
11 Overall          Dead      10
12 Overall          All       20
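Step 3 of the list above notes that the totals could also be computed in base R. Here is a minimal base-R sketch of the whole pipeline that needs no packages. table() does the counting, addmargins() appends the row and column sums (labelled "Sum" by default, renamed here to "Overall" and "All"), and as.data.frame() turns the table back into long format. The block redefines df so it runs on its own. Unlike the dplyr version, the resulting Group and Status columns are factors rather than character.

```r
# Same data as in the question (rep() recycles the Group and Status patterns)
df <- data.frame(
  ID = 1:20,
  Group = rep(c("1. vacc   unvacc", "2. vacc", "3. vacc"), length.out = 20),
  Status = rep(c("Dead", "Alive"), length.out = 20)
)

# Cross-tabulate, then add a "Sum" row and a "Sum" column
tab <- addmargins(table(Group = df$Group, Status = df$Status))

# Rename the margins to match the desired labels
rownames(tab)[rownames(tab) == "Sum"] <- "Overall"
colnames(tab)[colnames(tab) == "Sum"] <- "All"

# Back to long format: one row per Group/Status cell, count in column cnt
long <- as.data.frame(tab, responseName = "cnt")

# as.data.frame() varies Group fastest; sort by Group, then Status
long <- long[order(long$Group, long$Status), ]
rownames(long) <- NULL

long
```

Sorting by the factor columns uses the level order from the table (groups first, then "Overall"; Alive, Dead, then "All"), which matches the layout in the question.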