I am attempting to reference a column in another dataframe (df2) based on a column of column names in my primary dataframe (df1$reference) to return the extact value where the ID and column name match in a new column (df1$new_col). Here is a simplified example of my data and the desired outcome:
I have tried rbind but ran into errors due to differences in the number of rows/columns. I also tried bind_rows/bind_cols, but am having a hard time joining only the referenced data (I would like to avoid a large join because my real data has many more columns). I feel like indexing would be able to accomplish this but I am not as familiar with indexing outside of simple tasks. I'm open to any and all suggestions / approaches!
CodePudding user response:
We may use row/column
indexing
DF1$new_col <- DF2[-1][cbind(seq_len(nrow(DF1)),
match(DF1$reference, names(DF2)[-1]))]
-output
> DF1
ID value reference new_col
1 1 4 colD no
2 2 5 colD no
3 3 6 colE no
data
DF1 <- structure(list(ID = 1:3, value = 4:6, reference = c("colD", "colD",
"colE")), class = "data.frame", row.names = c(NA, -3L))
DF2 <- structure(list(ID = 1:3, colD = c("no", "no", "yes"), colE = c("yes",
"no", "no"), colF = c("no", "yes", "no")),
class = "data.frame", row.names = c(NA,
-3L))
CodePudding user response:
Maybe something like this?
library(dplyr)
left_join(DF1, DF2, by="ID") %>%
mutate(New_col = case_when(reference=="colD" ~ colD,
reference=="colE" ~ colE,
reference=="colF" ~ colF)) %>%
select(ID, value, reference, New_col)
ID value reference New_col
1 1 4 colD no
2 2 5 colD no
3 3 6 colE no