Calculate geometric mean by ID across entire long data frame in R

Time:09-08

In R, I am trying to calculate the geometric mean (exp(mean(log(x, na.rm=T)))) of every column in a data frame, grouped by participant ID. The data frame is in long format. Below is comparable code showing what I have so far, but it isn't working. I have also tried data.table, still without success. Any help is appreciated.

 mtcars_sub <- mtcars[,1:2]
 mtcars_sub_gm <- mtcars_sub %>%
   group_by(cyl) %>%
   summarise_all(function (x) exp(mean(log(x, na.rm=TRUE))))

 gm_vars <- names(mtcars_sub)[1] # this is very simplistic, but in my actual program there are 80 columns
 mtcars_sub_gm <- mtcars_sub[, lapply(.SD, function(x) {exp(mean(log(x, na.rm=T)))}),
                             by = cyl, .SDcols = gm_vars]

CodePudding user response:

I think the issue is the placement of na.rm = TRUE: it should be an argument to mean(), but it was placed inside the log() call.

library(dplyr)
mtcars[,1:5] %>% 
  group_by(cyl) %>% 
  summarize(across(everything(), ~exp(mean(log(.x), na.rm=TRUE))))

# A tibble: 3 × 5
    cyl   mpg  disp    hp  drat
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     4  26.3  102.  80.1  4.06
2     6  19.7  180. 121.   3.56
3     8  14.9  347. 204.   3.21
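
Since the question also mentions a data.table attempt, here is a minimal sketch of the same fix in data.table. It assumes the data.table package is installed; the object names mtcars_dt and mtcars_dt_gm are made up for illustration. Note that the data frame has to be converted first, because .SD and .SDcols only work inside a data.table.

```r
library(data.table)

# Convert to a data.table so that .SD, .SDcols and by are available
mtcars_dt <- as.data.table(mtcars[, 1:5])

# Summarise every column except the grouping column
gm_vars <- setdiff(names(mtcars_dt), "cyl")

# na.rm = TRUE is passed to mean(), not to log()
mtcars_dt_gm <- mtcars_dt[, lapply(.SD, function(x) exp(mean(log(x), na.rm = TRUE))),
                          by = cyl, .SDcols = gm_vars]

print(mtcars_dt_gm)
```

The groups come back in order of first appearance in the data rather than sorted; adding keyby = cyl instead of by = cyl should sort them.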
      

CodePudding user response:

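You could also nest two base R functions: sapply(), which applies a function to each of several columns, and ave(), which applies that function within the groups defined by a reference column. Note that ave() returns one value per input row, so the result has as many rows as the original data rather than one row per group.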

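# Reorder the columns: cyl (the grouping column) first, then disp and mpg
mtcars_sub <- mtcars[, c(2, 3, 1)]

# For each value column, ave() fills every row with that row's group geometric mean
sapply(mtcars_sub[, 2:3],
       FUN = function(x) ave(x,
                             mtcars_sub[, "cyl"],
                             FUN = function(g) exp(mean(log(g), na.rm = TRUE))))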
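
Because ave() returns a value for every row, a one-row-per-group summary in base R needs a different function. The following is a rough sketch using aggregate(); the helper name geo_mean and the result name gm_by_cyl are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper: geometric mean that ignores NA values
geo_mean <- function(x) exp(mean(log(x), na.rm = TRUE))

mtcars_sub <- mtcars[, c("cyl", "disp", "mpg")]

# One row per distinct cyl value, one column per summarised variable
gm_by_cyl <- aggregate(mtcars_sub[, c("disp", "mpg")],
                       by = list(cyl = mtcars_sub$cyl),
                       FUN = geo_mean)

print(gm_by_cyl)
```

The by = list(...) interface is used here instead of the formula interface because the formula interface drops any row containing an NA by default, which is not the same as removing NAs column by column.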