How to apply a function to every element in a dataframe in a list and return a dataframe in R?

Time:09-08

I have a dataframe which looks like this example (just much larger):

Name <- c('Peter','Peter','Peter', 'Ben','Ben','Ben','Mary', 'Mary', 'Mary')
var1 <- c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.7)
var2 <- c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.2)
var3 <- c(0.2, 0.6, 0.9, 0.5, 0.5, 0.2, 0.5, 0.5, 0.2)
df <- data.frame(Name, var1, var2, var3)
df

I split my dataframe in order to apply a function to every group.

list_split <- split(df[, 2:4], df$Name)

my_list <- vector("list", 3)
for (i in seq_along(list_split)) {
  my_list[[i]] <- list(
    lapply(list_split[[i]], function(x) summary(x)))
}

After that I wrote a function so that, if the mean of the values in 'my_list' is larger than 0.9, the difference of the values in 'list_split' is taken, and otherwise just the value itself. (Please ignore that the operation does not make any sense; my original function is very different.)

l <- list()
fun <- function(x, y) {ifelse(mean(x) > 0.9, diff(y), y)}
for (j in seq_along(list_split)) {
  for (i in seq_along(my_list)) {
    u <- mapply(fun, my_list[[i]][[1]], list_split[[j]], SIMPLIFY = FALSE)
    l[[j]] <- u
  }
}

I want the function to be applied to all values of the 'var' columns in the dataframes in 'list_split'. For example, for list_split[["Ben"]] the values are:

  var1 var2 var3
4  0.3  0.5  0.5
5  0.9  0.4  0.5
6  0.2  0.2  0.2

But it is just applied to the first value of every 'var', so that the resulting list for the first element looks like this:

l[[1]]
$var1
[1] 0.3

$var2
[1] 0.5

$var3
[1] 0.5

So how can I apply the function to all values in every 'list_split' element and end up with a list that exactly preserves the structure of 'list_split', that is a list of dataframes?

Thank you!

CodePudding user response:

We could try the following (the `\(x)` lambda shorthand requires R 4.1 or later):

Map(\(x, y) {
  x[] <- Map(\(u, v) if (mean(v) > 0.9) c(NA, diff(u)) else u, x, y)
  x
}, list_split, lapply(my_list, \(x) do.call("c", x)))

Output:

$Ben
  var1 var2 var3
4  0.3  0.5  0.5
5  0.9  0.4  0.5
6  0.2  0.2  0.2

$Mary
  var1 var2 var3
7  0.4  0.1  0.5
8  0.6  0.4  0.5
9  0.7  0.2  0.2

$Peter
  var1 var2 var3
1  0.4  0.5  0.2
2  0.6  0.4  0.6
3  0.7  0.2  0.9
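
For context on why the `fun` in the question returned only one value per column: `ifelse()` returns a result of the same length as its test, and `mean(x) > 0.9` is a single logical value. A plain `if`/`else`, as used above, evaluates that scalar condition once and returns the whole vector. A minimal sketch, where `y` and `m` are made-up illustration values and not taken from the question's data:

```r
# Hypothetical example values (assumptions for illustration only)
y <- c(0.3, 0.9, 0.2)
m <- 0.47  # stand-in for a single group-level mean

# ifelse() is vectorised over its test: a length-1 test gives a length-1 result
res_ifelse <- ifelse(m > 0.9, diff(y), y)
res_ifelse
# [1] 0.3

# if / else checks the scalar condition once and keeps the full vector
res_if <- if (m > 0.9) c(NA, diff(y)) else y
res_if
# [1] 0.3 0.9 0.2
```

Padding `diff()` with a leading `NA` keeps the result the same length as the column, which is what allows it to be assigned back into the dataframe.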

CodePudding user response:

Here's one approach:

l <- as.list(names(list_split))

fun <- function(x,y) {ifelse(x > 0.9, y-x, y)}
for (j in seq_along(list_split)){
  # Start each group with a fresh, empty 3 x 3 data frame
  df2 <- data.frame(matrix(ncol = 3, nrow = 3))
  names(df2) <- c("var1", "var2", "var3")
  for (i in seq_along(list_split[[j]])){
    for (h in seq_along(list_split[[j]][[i]])){
      u <- mapply(fun,my_list[[j]][[1]][[i]][[4]],list_split[[j]][[i]][[h]], SIMPLIFY = FALSE)
      df2[[i]][[h]] <- u
    }
  }
  l[[j]] <- df2
}

names(l) <- names(list_split)
l

This gives:

$Ben
  var1 var2 var3
4  0.3  0.5  0.5
5  0.9  0.4  0.5
6  0.2  0.2  0.2

$Mary
  var1 var2 var3
7  0.4  0.1  0.5
8  0.6  0.4  0.5
9  0.7  0.2  0.2

$Peter
  var1 var2 var3
1  0.4  0.5  0.2
2  0.6  0.4  0.6
3  0.7  0.2  0.9
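
If the goal is mainly to keep the list-of-dataframes structure, a shorter variant might look like the sketch below. It is only an illustration under assumptions: `transform_col` and its `threshold` argument are hypothetical names, and the condition uses the column mean directly instead of the mean of the `summary()` values stored in `my_list`, so it would need adapting to the real function.

```r
# Data from the question
Name <- c('Peter', 'Peter', 'Peter', 'Ben', 'Ben', 'Ben', 'Mary', 'Mary', 'Mary')
var1 <- c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.7)
var2 <- c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.2)
var3 <- c(0.2, 0.6, 0.9, 0.5, 0.5, 0.2, 0.5, 0.5, 0.2)
df <- data.frame(Name, var1, var2, var3)
list_split <- split(df[, 2:4], df$Name)

# Hypothetical helper: returns a vector of the same length as its input
transform_col <- function(col, threshold = 0.9) {
  if (mean(col) > threshold) c(NA, diff(col)) else col
}

# Assigning into d[] replaces the columns but keeps the data frame
# class, column names, and row names
result <- lapply(list_split, function(d) {
  d[] <- lapply(d, transform_col)
  d
})
```

With the example data no column mean exceeds 0.9, so `result` has the same values as `list_split`; calling `transform_col` with a lower `threshold` would exercise the `diff()` branch.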