Summarize two dataframes in r-CodePudding

I have two dataframes

df1
#    var1 var2
# 1 X01    Red
# 2 X02    Green
# 3 X03    Red
# 4 X04    Yellow
# 5 X05    Red
# 6 X06    Green

df2
#   X01    X02    X03   ...
# 1 1      0.1    2.1
# 2 2      0.2    2.2
# 3 3      0.3    2.3
# 4 4      0.4    2.4
# 5 5      0.5    2.5
# 6 6      0.6    2.6

I am trying to get something like this

            
mean green  val1
mean red    val2
mean yellow val3

when mean is calculated with the corresponding variables.

CodePudding user response：

We could change match the column names from 'df2' with the keyval pair from 'df1' to replace with the color value, replicate by the colum index, unlist the 'df2' and use a group by approach to get the mean

tapply(unlist(df2), setNames(df1$var2, df1$var1)[names(df2)][col(df2)], 
    FUN = mean, na.rm = TRUE)

Or using tidyverse, reshape to 'long' and do a join before doing the group_by mean

library(dplyr)
library(tidyr)
df2 %>% 
 pivot_longer(cols = everything(), names_to = 'var1') %>% 
 left_join(df1) %>% 
 group_by(var2) %>% 
 summarise(value = mean(value, na.rm = TRUE), .groups = 'drop')

data

df1 <- structure(list(var1 = c("X01", "X02", "X03", "X04", "X05", "X06"
), var2 = c("Red", "Green", "Red", "Yellow", "Red", "Green")),
   class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

df2 <- structure(list(X01 = 1:6, X02 = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6
), X03 = c(2.1, 2.2, 2.3, 2.4, 2.5, 2.6)), 
  class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))