Calculate mean and sd for given variables in a dataframe

Time:12-22

Given a vector of names of numeric variables in a dataframe, I need to calculate the mean and standard deviation (sd) of each variable. For example, given the mtcars dataset and the following vector of variable names:

vars_to_transform <- c("mpg", "disp")

I'd like to get the following result:

(Image: the expected result, a table with one row per variable and the columns variable, avg and sd.)

The first solution that came to mind is the following:

library(dplyr)
library(purrr)

data("mtcars")

vars_to_transform <- c("mpg", "disp")

vars_to_transform %>% 
  map_dfr(function(x) {
    c(variable = x, avg = mean(mtcars[[x]], na.rm = TRUE), sd = sd(mtcars[[x]], na.rm = TRUE))
  })

The result is the following:

(Image: the resulting tibble, in which variable, avg and sd are all character columns.)

As you can see, all the returned columns are character, but I expected avg and sd to be numeric.

Is there a way to fix this? Or is there any better solution than this?

P.S. I'm using purrr 0.3.4

CodePudding user response:

The following works (instead of using c() in your code, use tibble):

vars_to_transform %>% 
  map_dfr(~ tibble(variable = .x,
                   avg = mean(mtcars[[.x]], na.rm = TRUE),
                   sd = sd(mtcars[[.x]], na.rm = TRUE)))

Explanation: c() builds an atomic vector, whose elements must all have the same type. Because variable is a character, avg and sd are coerced to character as well. A tibble stores each column separately, so every column can keep its own type.
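The coercion can be seen in isolation with a minimal base R sketch. The values and the names row_vec and row_list are made up for illustration only:

```r
# c() builds an atomic vector: mixing a string with numbers
# coerces every element to character.
row_vec <- c(variable = "mpg", avg = 20.1, sd = 6.03)
class(row_vec)

# A list (and a tibble, which is a list of columns) keeps
# each element's own type.
row_list <- list(variable = "mpg", avg = 20.1, sd = 6.03)
sapply(row_list, class)
```

Here class(row_vec) is "character", while the list keeps avg and sd as "numeric".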

As @Gwang-Jin Kim points out in a comment below (thanks!), one could also use list instead of tibble.
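A sketch of that list variant, assuming dplyr and purrr are installed (the name result is only for illustration):

```r
library(dplyr)
library(purrr)

vars_to_transform <- c("mpg", "disp")

# Each call returns a named list, so avg and sd stay numeric;
# map_dfr() then binds the lists row-wise into one tibble.
result <- vars_to_transform %>%
  map_dfr(~ list(variable = .x,
                 avg = mean(mtcars[[.x]], na.rm = TRUE),
                 sd = sd(mtcars[[.x]], na.rm = TRUE)))

result
```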


Or try adding type.convert:

library(dplyr)
library(purrr)

data("mtcars")

vars_to_transform <- c("mpg", "disp")

vars_to_transform %>% 
  map_dfr(function(x) {
    c(variable = x, avg = mean(mtcars[[x]], na.rm = TRUE), sd = sd(mtcars[[x]], na.rm = TRUE))
  }) %>% 
  type.convert(as.is = TRUE)

#> # A tibble: 2 × 3
#>   variable   avg     sd
#>   <chr>    <dbl>  <dbl>
#> 1 mpg       20.1   6.03
#> 2 disp     231.  124.
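To see what type.convert does on its own, here is a small base R sketch on a hand-made data frame. The values and the names df and converted are illustrative, not taken from the output above:

```r
# A data frame whose numeric columns were stored as character.
df <- data.frame(variable = c("mpg", "disp"),
                 avg = c("20.1", "230.7"),
                 sd = c("6.03", "123.9"),
                 stringsAsFactors = FALSE)

# type.convert() re-parses each column; as.is = TRUE keeps
# non-numeric strings as character instead of making factors.
converted <- type.convert(df, as.is = TRUE)
sapply(converted, class)
```

The variable column stays character, while avg and sd become numeric.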

CodePudding user response:

Seems like an overcomplicated way of doing select->pivot->group->summarise.

library(dplyr)
library(tidyr)  # provides pivot_longer()

mtcars %>% 
    select(all_of(vars_to_transform)) %>%
    pivot_longer(everything()) %>% 
    group_by(name) %>% 
    summarise(
        mean = mean(value),
        sd = sd(value)
    )
# A tibble: 2 x 3
  name   mean     sd
  <chr> <dbl>  <dbl>
1 disp  231.  124.  
2 mpg    20.1   6.03

CodePudding user response:

Another option:

library(purrr)
library(dplyr)

vars_to_transform <- c("mpg", "disp")
funs <- lst(mean, sd)

mtcars %>%
  select(all_of(vars_to_transform)) %>%
  map_df(~ funs %>%
           map(exec, .x), .id = "var")

# A tibble: 2 x 3
  var    mean     sd
  <chr> <dbl>  <dbl>
1 mpg    20.1   6.03
2 disp  231.  124.  

CodePudding user response:

m <- mtcars %>% select(all_of(vars_to_transform))
tibble(variable = names(m), avg = apply(m, 2, mean), sd = apply(m, 2, sd))

## A tibble: 2 × 3
#  variable   avg     sd
#  <chr>    <dbl>  <dbl>
#1 mpg       20.1   6.03
#2 disp     231.  124.  