Home > Enterprise >  Creating aggregated tables with calculations using loops in R
Creating aggregated tables with calculations using loops in R

Time:05-23

Wondering how I can create a table with the results of different calculations in R. Here's an example using mtcars df


# Load tidyverse
library(tidyverse)

# Select only cyl vs and am from mtcars for simplicity
mt_select <- select(mtcars, c(cyl, vs, am))


# Here is the table I wish to make using some type of looping function
# (I have like 40 variables in actual dataset)
freq_table <- mt1 %>% group_by(cyl) %>% 
  summarise(n = n(),
            vs_sum = sum(vs),
            vs_perc = sum(vs)/n*100,
            am_sum = sum(am),
            am_perc = sum(am)/n*100)
print(freq_table)

Here is my attempt. Main issues I can't figure out:

  1. Includes sums for "0" answers,
  2. I don't know how to add a percent column
  3. Don't know how to merge these all into one table
# make a vector to loop through
mt_vars <- names(mt_select)

# Loop to make tables
for (i in mt_vars){
  mt_select %>% 
    group_by(cyl) %>% 
    count_(i) %>%
    print()
}

Been trying to figure out how to make this for months and always just decide I don't need the table, or something else. Any help is greatly appreciated!!

CodePudding user response:

You haven't provided an expected output, but I think the key to solving your problem might be to pivot your data into long format. This will probably allow you to do what you want without any need for loops. For example, taking mtcars as input:

library(tidyverse)

mtcars %>% 
  pivot_longer(everything()) %>% 
  group_by(name) %>% 
  summarise(
    n=n(),
    valueSum=sum(value),
    valuePct=valueSum/(n*100)
  )
# A tibble: 11 × 4
   name      n valueSum valuePct
   <chr> <int>    <dbl>    <dbl>
 1 am       32      13   0.00406
 2 carb     32      90   0.0281 
 3 cyl      32     198   0.0619 
 4 disp     32    7383.  2.31   
 5 drat     32     115.  0.0360 
 6 gear     32     118   0.0369 
 7 hp       32    4694   1.47   
 8 mpg      32     643.  0.201  
 9 qsec     32     571.  0.178  
10 vs       32      14   0.00438
11 wt       32     103.  0.0322 

Does that come close to what you want? If you want to work on only a subset of columns, either substitute what you need in place of everything(), filter after pivoting or select before pivoting.

Also, I'm not sure what sort of a percentage you're trying to calculate.

value and name are the default values of names_to and values_to in pivot_longer().

  • Related