for loop to function in R

I am trying to convert a for loop into a function. The expected outcome is Summ.Stats. Any help getting the same result from a function (b below) would be appreciated.

CN = colnames(mtcars);CN
var <- c("vs", "am")

Summ.Stats <- NULL
library(psych) # describe()
library(dplyr) # %>% and filter()

for (i in 1:(length(var))) {
  
  temp <- which(CN == var[i])
  
  aux.0 <- mtcars  %>% filter(mtcars[,temp]==0)
  aux.1 <- mtcars %>% filter(mtcars[,temp]==1)
  fname.0 <- paste(var[i], "0", sep = "_")
  fname.1 <- paste(var[i], "1", sep = "_")
  Summ.0 <- describe(aux.0)
  Summ.1 <- describe(aux.1)
  tab <- round(cbind(Summ.0$mean, Summ.1$mean), 4)
  rownames(tab) <- colnames(aux.0)
  colnames(tab) <- c(fname.0, fname.1)
  Summ.Stats [[i]] <- tab
}
Summ.Stats #EXPECTED OUTCOME

What I tried is the following:

Summ.Stats <- NULL
my.function <- function(var, df){
  
  df <- df[, !sapply(df, is.character)]#REMOVE THE CHARACTER COLUMNS
  CN = colnames(df)  
  
  for (i in 1:length(var)) {
    temp <- which(CN == var[i])
    #
    res <- split(df, df[,temp])
    names(res) <- paste(var[i], names(res), sep = ".") 
    return(res) }
    
    for (j in 1:length(res)){
      tab <- describe(res[[j]]) #here the mean of res[[1]] and res[[2]] should be saved
      Summ.Stats [[j]] <- tab
      return(Summ.Stats)
    }
  }

b <- my.function(var, mtcars);b #only shows vs.0 and vs.1

CodePudding user response：

I'm being lazy here, but at first glance the problem looks like where you've placed return. Try:

my.function <- function(var, df){
  
  df <- df[, !sapply(df, is.character)] # remove the character columns
  Summ.Stats <- list()                  # initialise inside the function, not globally
  
  for (i in seq_along(var)) {
    res <- split(df, df[[var[i]]])
    names(res) <- paste(var[i], names(res), sep = ".")
    
    # nested, so res is used before the next variable overwrites it
    for (j in seq_along(res)) {
      Summ.Stats[[names(res)[j]]] <- describe(res[[j]])
    }
  }
  return(Summ.Stats) # return once, after both loops have finished
}
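
To see why the position of return matters, here is a minimal sketch that is separate from the question's data. The function names early and late are made up for illustration.

```r
# return() inside the loop: the function exits during the first iteration
early <- function(x) {
  out <- list()
  for (i in seq_along(x)) {
    out[[i]] <- x[i]^2
    return(out)
  }
}

# return() after the loop: every iteration runs before the result is returned
late <- function(x) {
  out <- list()
  for (i in seq_along(x)) {
    out[[i]] <- x[i]^2
  }
  return(out)
}

length(early(1:3)) # 1
length(late(1:3))  # 3
```

This is the same reason the original attempt only showed vs.0 and vs.1: the function returned before the loop reached am.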

CodePudding user response：

Some advice:

Be careful to return in the right place. Calling return() inside a for loop exits the function on the first iteration, which is almost never what you want.
Understand which objects you are growing: res is overwritten by the new value on each iteration of your loop. Don't write the next step before testing the previous one.
Don't convert between numeric, character, and logical indices when you don't need to.
Don't loop over a numeric index when you can loop over names, and don't loop over names when you can loop directly over the items.
Learn to use lapply() rather than for loops where possible.
Use browser() inside your function to inspect what each step is doing.
Put the data argument first if possible (I don't do this below, in order to reproduce your requested output).
Have fun :)
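
As a rough illustration of the points about indices and lapply, the sketch below computes the mean of mpg per group in two ways, using base R only. The object names by_index and by_name are made up for this example.

```r
vars <- c("vs", "am")

# Index-based loop: name -> position -> column, growing a list by hand
by_index <- list()
for (i in seq_along(vars)) {
  pos <- which(colnames(mtcars) == vars[i])
  by_index[[vars[i]]] <- sapply(split(mtcars$mpg, mtcars[, pos]), mean)
}

# Name-based lapply: no index bookkeeping, the list is built for us
by_name <- lapply(setNames(vars, vars), function(nm) {
  sapply(split(mtcars$mpg, mtcars[[nm]]), mean)
})

# Both approaches should agree
all.equal(by_index, by_name)
```

The second version has no intermediate position lookup and no object that can be accidentally overwritten between iterations.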

I think you want the following

my.function <- function(var, df){
  df <- Filter(is.numeric, df) 
  lapply(var, function(nm) {
    # browser() # uncomment, run and print objects to understand what these steps do
    split_data <- split(df, df[[nm]])
    cols <- lapply(split_data, function(x) psych::describe(x)["mean"])
    df <- do.call(cbind, cols)
    names(df) <- paste(nm, names(split_data), sep = ".")
    df
  })
}
my.function(c("vs", "am"), mtcars)
#> [[1]]
#>             vs.0       vs.1
#> mpg   16.6166667  24.557143
#> cyl    7.4444444   4.571429
#> disp 307.1500000 132.457143
#> hp   189.7222222  91.357143
#> drat   3.3922222   3.859286
#> wt     3.6885556   2.611286
#> qsec  16.6938889  19.333571
#> vs     0.0000000   1.000000
#> am     0.3333333   0.500000
#> gear   3.5555556   3.857143
#> carb   3.6111111   1.785714
#> 
#> [[2]]
#>             am.0        am.1
#> mpg   17.1473684  24.3923077
#> cyl    6.9473684   5.0769231
#> disp 290.3789474 143.5307692
#> hp   160.2631579 126.8461538
#> drat   3.2863158   4.0500000
#> wt     3.7688947   2.4110000
#> qsec  18.1831579  17.3600000
#> vs     0.3684211   0.5384615
#> am     0.0000000   1.0000000
#> gear   3.2105263   4.3846154
#> carb   2.7368421   2.9230769

Created on 2021-11-22 by the reprex package (v2.0.1)

CodePudding user response：

Yet another approach - using dots (...) and avoiding the psych package altogether. Note that the native pipe |> and the \(x) lambda shorthand require R 4.1 or later.

data <- mtcars

b <- function(df, ...){
  m <- function(y, z) df[df[[y]] == z,] |> colMeans()
  args <- as.list(match.call()[-c(1L, 2L)])
  lapply(args, \(.){
    cbind(m(., 0), m(., 1)) |> 
      (\(x) {colnames(x) <- paste0(c(., .), c("_1", "_2")) ; x})()
  }) |>
    lapply(round, 4)
}
b(data, vs, am)

#> [[1]]
#>          vs_1     vs_2
#> mpg   16.6167  24.5571
#> cyl    7.4444   4.5714
#> disp 307.1500 132.4571
#> hp   189.7222  91.3571
#> drat   3.3922   3.8593
#> wt     3.6886   2.6113
#> qsec  16.6939  19.3336
#> vs     0.0000   1.0000
#> am     0.3333   0.5000
#> gear   3.5556   3.8571
#> carb   3.6111   1.7857
#> 
#> [[2]]
#>          am_1     am_2
#> mpg   17.1474  24.3923
#> cyl    6.9474   5.0769
#> disp 290.3789 143.5308
#> hp   160.2632 126.8462
#> drat   3.2863   4.0500
#> wt     3.7689   2.4110
#> qsec  18.1832  17.3600
#> vs     0.3684   0.5385
#> am     0.0000   1.0000
#> gear   3.2105   4.3846
#> carb   2.7368   2.9231

Explanation - On inspection, psych::describe() computes a number of statistics that we don't need in the output. Instead, we can write an auxiliary function m that subsets the data and computes the column means directly, preserving names. To work with an arbitrary number of variables, it's usually a good idea to work with lists, i.e. using lapply or purrr::map style approaches, which makes for concise syntax.
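
For comparison, here is a sketch of the same colMeans idea that takes the variable names as strings and labels the columns by the actual group values (_0 and _1, as in the question's loop). The function name group_means and its digits argument are hypothetical and do not come from any of the answers above. It assumes the grouping columns are numeric, since Filter(is.numeric, df) would otherwise drop them.

```r
# Hypothetical helper: per-group column means for each grouping variable
group_means <- function(df, vars, digits = 4) {
  df <- Filter(is.numeric, df)            # keep numeric columns only
  out <- lapply(vars, function(nm) {
    groups <- split(df, df[[nm]])         # one data frame per group value
    tab <- sapply(groups, colMeans)       # matrix: variables x groups
    colnames(tab) <- paste(nm, names(groups), sep = "_")
    round(tab, digits)
  })
  names(out) <- vars
  out
}

res <- group_means(mtcars, c("vs", "am"))
res$vs
```

Because split() is used instead of hard-coded 0 and 1, this version would also handle a grouping column with more than two values, at the cost of one extra column per value.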