I have a dataframe in R where one column (say column A) names other columns. For example,
colA colB colC colD
'colB' 1 2 3
'colD' 4 5 6
I want to create a new column where I extract the value in the column named by column A. For example, in the above table, I would like to create a new column whose value is 1 in the first row and 6 in the second row. What is the best way to automate the construction of such a column? I would especially appreciate if there is some way to do this within the tidyverse, but would be open to other solutions if those are considered better.
CodePudding user response:
We can use match
to get the column index from 'colA' with the column names of the dataset, create a matrix with cbind
ing the row index and extract based on row/column index
df1$new <- df1[-1][cbind(seq_len(nrow(df1)), match(df1$colA, names(df1)[-1]))]
-output
> df1
colA colB colC colD new
1 colB 1 2 3 1
2 colD 4 5 6 6
This should be more efficient as we are using indexing in a vectorized way
Using dplyr
, we can rowwise
and extract the column value with [[
library(dplyr)
df1 %>%
rowwise %>%
mutate(new = cur_data()[[colA]]) %>%
ungroup
# A tibble: 2 × 5
colA colB colC colD new
<chr> <int> <int> <int> <int>
1 colB 1 2 3 1
2 colD 4 5 6 6
Or create the row/column indexing as in the base R
solution
df1 %>%
mutate(new = .[-1][cbind(row_number(), match(colA, names(df1)[-1]))])
colA colB colC colD new
1 colB 1 2 3 1
2 colD 4 5 6 6
data
df1 <- structure(list(colA = c("colB", "colD"), colB = c(1L, 4L), colC = c(2L,
5L), colD = c(3L, 6L)), class = "data.frame", row.names = c(NA,
-2L))
CodePudding user response:
You could also do:
rownames(df1) <- seq(nrow(df1))
transform(df1, new = df1[-1][cbind(rownames(df1), colA)])
colA colB colC colD new
1 colB 1 2 3 1
2 colD 4 5 6 6
Also:
df1 %>%left_join(pivot_longer(., -colA) %>%
filter(name == colA) %>%
select(-name))
colA colB colC colD value
1 colB 1 2 3 1
2 colD 4 5 6 6
CodePudding user response:
A simple one-line solution from Base R:
df1$value <- df1[outer(df1$colA, colnames(df1), FUN = "==")]
df1
#> colA colB colC colD value
#> 1 colB 1 2 3 1
#> 2 colD 4 5 6 6
The idea is to use outer
to create an indexing matrix that filters the values you need.