In R, I am trying to calculate the geometric mean, exp(mean(log(x, na.rm=T))), across all columns in a data frame by participant ID. The data frame is in long format. Below is comparable code showing what I have so far, but it isn't working. I have also tried data.table, still without success. Any help is appreciated.
mtcars_sub <- mtcars[, 1:2]
mtcars_sub_gm <- mtcars_sub %>%
  group_by(cyl) %>%
  summarise_all(function(x) exp(mean(log(x, na.rm = TRUE))))
gm_vars <- names(mtcars_sub)[1] # this is very simplistic, but in my actual program there are 80 columns
mtcars_sub_gm <- mtcars_sub[, lapply(.SD, function(x) exp(mean(log(x, na.rm = T)))),
                            by = cyl, .SDcols = gm_vars]
CodePudding user response:
I think the issue is the placement of na.rm = TRUE, which should be an argument to mean() but was placed inside the log() parentheses.
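To see why the placement matters, here is a minimal sketch on a made-up vector (the values in `x` are an assumption for illustration, not taken from the question's data). `log()` accepts only `x` and `base`, so passing `na.rm` to it raises an error, whereas `mean()` does accept `na.rm`.

```r
# Hypothetical toy vector with one missing value
x <- c(2, 8, NA)

# log() has no na.rm argument, so this call fails; capture the failure as a string
res <- tryCatch(log(x, na.rm = TRUE), error = function(e) "error")

# na.rm belongs to mean(): log() passes the NA through, mean() then drops it
gm <- exp(mean(log(x), na.rm = TRUE))

print(res)
print(gm) # geometric mean of 2 and 8, i.e. sqrt(16), up to floating point
```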
library(dplyr)
mtcars[, 1:5] %>%
  group_by(cyl) %>%
  summarize(across(everything(), ~ exp(mean(log(.x), na.rm = TRUE))))
# A tibble: 3 × 5
    cyl   mpg  disp    hp  drat
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     4  26.3  102.  80.1  4.06
2     6  19.7  180. 121.   3.56
3     8  14.9  347. 204.   3.21
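Since the question also mentions a data.table attempt, the same fix can probably be carried over there. The sketch below assumes the data.table package is installed; `dt`, `dt_gm`, and the way `gm_vars` is built are illustrative choices, not the asker's actual setup. Two things differ from the question's version: the data frame is converted with `as.data.table()` first, and `na.rm = TRUE` is moved from `log()` to `mean()`.

```r
library(data.table)

# Convert first: a plain data.frame does not support the .SD / by syntax
dt <- as.data.table(mtcars[, 1:5])

# Every column except the grouping column; with 80 columns the same line applies
gm_vars <- setdiff(names(dt), "cyl")

# Geometric mean of each selected column within each cyl group
dt_gm <- dt[, lapply(.SD, function(x) exp(mean(log(x), na.rm = TRUE))),
            by = cyl, .SDcols = gm_vars]
dt_gm
```

Groups appear in order of first occurrence in the data rather than sorted; `keyby = cyl` in place of `by = cyl` would sort them.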
CodePudding user response:
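You could also use a nested combination of sapply(), to apply a function to multiple columns, and ave(), to apply that function to groups according to a reference column. Note that ave() returns a value for every row, so the result has one row per observation, with the group's geometric mean repeated, rather than one row per group.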
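# Reorder so the grouping column (cyl) comes first, followed by disp and mpg
mtcars_sub <- mtcars[, c(2, 3, 1)]
# For each value column, compute the geometric mean within each cyl group
sapply(mtcars_sub[, 2:3],
       FUN = function(x) ave(x,
                             mtcars_sub[, "cyl"],
                             FUN = function(x) exp(mean(log(x), na.rm = TRUE))
       )
)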
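If one row per group is wanted in base R, `aggregate()` may be a closer fit than `ave()`. This is only a sketch: `geo_mean` and `agg` are hypothetical names introduced here, and the data frame method of `aggregate()` is used (rather than the formula method) so that missing values are left for `mean(..., na.rm = TRUE)` to handle column by column.

```r
# Hypothetical helper: geometric mean that ignores missing values
geo_mean <- function(x) exp(mean(log(x), na.rm = TRUE))

# One row per cyl group, one geometric mean per value column
agg <- aggregate(mtcars[, c("disp", "mpg")],
                 by = list(cyl = mtcars$cyl),
                 FUN = geo_mean)
agg
```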