Home > Mobile >  R function applied to a list in a dataframe
R function applied to a list in a dataframe

Time:03-16

Here's my example dataframe:

data <- data.frame(matrix(NA, nrow = 5, ncol = 2))
colnames(data) <- c("Variables", "Value")

Xs <- combn(c("X1", "X2", "X3", "X4", "X5", "X6"), 4, simplify = FALSE)

for(j in 1:5)
{
  data$Variables[j] <- Xs[j]
  data$Value[j] <- sin(j)
}

That looks like this:

        Variables      Value
1  X1, X2, X3, X4  0.8414710
2  X1, X2, X3, X5  0.9092974
3  X1, X2, X3, X6  0.1411200
4  X1, X2, X4, X5 -0.7568025
5  X1, X2, X4, X6 -0.9589243

Where elements of column Variables are lists:

typeof(data$Variables[1])
[1] "list"

Now, I have a second dataframe:

vals <- data.frame(matrix(NA, nrow = 6, ncol = 2))
colnames(vals) <- c("Variable", "Val")

for(j in 1:6)
{
  vals$Variable[j] <- paste0("X", j, sep = "")
  vals$Val[j] <- cos(j)
}

Here's the output:

  Variable        Val
1       X1  0.5403023
2       X2 -0.4161468
3       X3 -0.9899925
4       X4 -0.6536436
5       X5  0.2836622
6       X6  0.9601703

My question is: I want to create another column of my data dataframe in such a way that, i.e. for the first row of data:

        Variables      Value
1  X1, X2, X3, X4  0.8414710

I apply a sum (or mean, min, max, weighted mean) function based on values of vals:

X1, X2, X3, X4 -> 0.5403023 - 0.4161468 - 0.9899925 - 0.6536436 = -1.5194806

And it results in a new column of data:

        Variables      Value      Sum
1  X1, X2, X3, X4  0.8414710  -1.5194806

Is this even possible?

CodePudding user response:

Yes, you can do:

data$Sum <- sapply(data$Variables, function(x) sum(vals$Val[match(x, vals$Variable)]))

data
#>        Variables      Value         Sum
#> 1 X1, X2, X3, X4  0.8414710 -1.51948065
#> 2 X1, X2, X3, X5  0.9092974 -0.58217484
#> 3 X1, X2, X3, X6  0.1411200  0.09433326
#> 4 X1, X2, X4, X5 -0.7568025 -0.24582597
#> 5 X1, X2, X4, X6 -0.9589243  0.43068214

CodePudding user response:

Or using tidyverse by creating a named vector (deframe) from 'vals' to match with the elements of 'Variable' and get the sum

library(dplyr)
library(tibble)
library(purrr)
 data %>% 
  mutate(Sum = map_dbl(Variables, ~ sum(deframe(vals)[.x])))
       Variables      Value         Sum
1 X1, X2, X3, X4  0.8414710 -1.51948065
2 X1, X2, X3, X5  0.9092974 -0.58217484
3 X1, X2, X3, X6  0.1411200  0.09433326
4 X1, X2, X4, X5 -0.7568025 -0.24582597
5 X1, X2, X4, X6 -0.9589243  0.43068214
  • Related