I have two dataframes
df1
# var1 var2
# 1 X01 Red
# 2 X02 Green
# 3 X03 Red
# 4 X04 Yellow
# 5 X05 Red
# 6 X06 Green
df2
# X01 X02 X03 ...
# 1 1 0.1 2.1
# 2 2 0.2 2.2
# 3 3 0.3 2.3
# 4 4 0.4 2.4
# 5 5 0.5 2.5
# 6 6 0.6 2.6
I am trying to get something like this
mean green val1
mean red val2
mean yellow val3
when mean is calculated with the corresponding variables.
CodePudding user response:
We could change match the column names from 'df2' with the keyval pair from 'df1' to replace with the color value, rep
licate by the col
um index, unlist
the 'df2' and use a group by approach to get the mean
tapply(unlist(df2), setNames(df1$var2, df1$var1)[names(df2)][col(df2)],
FUN = mean, na.rm = TRUE)
Or using tidyverse
, reshape to 'long' and do a join before doing the group_by
mean
library(dplyr)
library(tidyr)
df2 %>%
pivot_longer(cols = everything(), names_to = 'var1') %>%
left_join(df1) %>%
group_by(var2) %>%
summarise(value = mean(value, na.rm = TRUE), .groups = 'drop')
data
df1 <- structure(list(var1 = c("X01", "X02", "X03", "X04", "X05", "X06"
), var2 = c("Red", "Green", "Red", "Yellow", "Red", "Green")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
df2 <- structure(list(X01 = 1:6, X02 = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6
), X03 = c(2.1, 2.2, 2.3, 2.4, 2.5, 2.6)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))