Home > OS >  Dplyr solution to adding row and column tally sums to a cross table
Dplyr solution to adding row and column tally sums to a cross table

Time:01-29

I would like to prepare a cross table with sums of rows and columns when the first column has non-numeric data (as it is acting as the row name). The example R code below produces what I'm seeking in terms of the sum of the tally of rows.

mtcars %>%
  as.tibble(.) %>%                                      # make as tibble
  mutate(cyl = factor(cyl), gear = factor(gear)) %>%    # make some factor vars
  select(cyl, gear, mpg) %>%                            # select demo data
  group_by(cyl, gear) %>%                               # two groupings
  tally() %>%                                           # compute tallys
  spread(cyl, n) %>%                                    # make cross table 
  mutate(RowCount = rowSums(.[ ,-1], na.rm = TRUE))     # compute row sums

This produces

A tibble: 3 × 5
  gear    `4`   `6`   `8` RowCount
  <fct> <int> <int> <int>    <dbl>
1 3         1     2    12       15
2 4         8     4    NA       12
3 5         2     1     2        5

However, I'm struggling to find the syntax to give me the sum of the four columns 4, 6, 8 and "RowCount", and add it as an extra line to the tibble as a new row 4, presumably with an NA as the "gear" sum. Note the end purpose here is formatting the data for direct input into a nice table.

CodePudding user response:

Here is an approach using bind_rows

library(dplyr)
library(tidyr)

mtcars %>% 
  group_by(gear, cyl) %>% 
  summarize(n = n(), .groups="drop") %>% 
  pivot_wider(names_from=cyl, values_from=n) %>% 
  mutate(RowCount = rowSums(across(`4`:`8`), na.rm=T)) %>% 
  bind_rows(., colSums(.[-1], na.rm = T))
# A tibble: 4 × 5
   gear   `4`   `6`   `8` RowCount
  <dbl> <dbl> <dbl> <dbl>    <dbl>
1     3     1     2    12       15
2     4     8     4    NA       12
3     5     2     1     2        5
4    NA    11     7    14       32

CodePudding user response:

You can try something like this. Add a final rbind that just sum your desired columns:

mtcars %>%
  tibble(.) %>%                                      # make as tibble
  mutate(cyl = factor(cyl), gear = factor(gear)) %>%    # make some factor vars
  select(cyl, gear, mpg) %>%                            # select demo data
  group_by(cyl, gear) %>%                               # two groupings
  tally() %>%                                           # compute tallys
  spread(cyl, n) %>%                                    # make cross table 
  mutate(RowCount = rowSums(.[ ,-1], na.rm = TRUE)) %>%  # compute row sums
  rbind(c(as.factor(NA), colSums(.[ ,-1], na.rm = TRUE)))

Output:

# A tibble: 4 × 5
  gear    `4`   `6`   `8` RowCount
  <fct> <dbl> <dbl> <dbl>    <dbl>
1 3         1     2    12       15
2 4         8     4    NA       12
3 5         2     1     2        5
4 NA       11     7    14       32

CodePudding user response:

Please check below code with rowSums

mtcars2 <- mtcars
mtcars2$gear <- NA 

bind_rows(mtcars,mtcars2) %>% group_by(gear, cyl) %>% count() %>% 
  pivot_wider(gear, names_from = cyl, values_from = n) %>% 
  mutate(rowcount=rowSums(across(!starts_with('gear')), na.rm=T))

Created on 2023-01-28 with reprex v2.0.2

# A tibble: 4 × 5
# Groups:   gear [4]
   gear   `4`   `6`   `8` rowcount
  <dbl> <int> <int> <int>    <dbl>
1     3     1     2    12       15
2     4     8     4    NA       12
3     5     2     1     2        5
4    NA    11     7    14       32

  • Related