I have a data frame (A) with a column containing some info. I have a larger data frame (B) that contains a column with similar information, and I need to detect which column contains the same data as the column in dataframeA. Because dataframeB is large, it would be time-consuming to look through it manually to identify the column. Is there a way to use the information from column 'some_info' in dataframeA to find the corresponding column in dataframeB?
dataframeA <- data.frame(some_info = c("a", "b", "c", "d", "e"))
dataframeB <- data.frame(
  id = 1:8,
  column_to_be_identified = c("a", "f", "b", "c", "g", "d", "h", "e"),
  column_almost_similar_but_not_quite = c("a", "f", "b", "c", "g", "3", "h", "e")
)
Basically: Is it possible to create a function or something similar that looks through dataframeB and detects the column(s) that contains exactly the information from the column in dataframeA?
Thanks a lot in advance!
CodePudding user response:
If I understand correctly and you just want to get the column name:
dataframeA <- data.frame(some_info = c("a", "b", "c", "d", "e"))
dataframeB <- data.frame(
  id = 1:8,
  column_to_be_identified = c("a", "f", "b", "c", "g", "d", "h", "e"),
  column_almost_similar_but_not_quite = c("a", "f", "b", "c", "g", "3", "h", "e")
)
relevant_column_name <- names(
  which(
    # iterate over all columns
    sapply(dataframeB, function(x) {
      # unique is more efficient for large vectors
      x <- unique(x)
      # are all values of the target vector in the column
      all(dataframeA$some_info %in% x)
    })
  )
)
relevant_column_name
#> [1] "column_to_be_identified"
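The same check can be wrapped in a reusable function, which is what the question asks for. This is only a sketch: the name find_columns_containing and its arguments are made up for illustration, and it assumes "contains the information" means every value of the target vector appears somewhere in the column. It uses vapply() rather than sapply() so the result is always a logical vector, and as.character() so factor columns are compared by their labels.

```r
# Hypothetical helper (name and arguments are illustrative, not from any package):
# returns the names of the columns in `df` that contain every value of `target`.
find_columns_containing <- function(df, target) {
  # compare as character so factor and character columns behave the same
  target <- unique(as.character(target))
  hits <- vapply(
    df,
    function(col) all(target %in% as.character(col)),
    logical(1)
  )
  names(df)[hits]
}

dataframeA <- data.frame(some_info = c("a", "b", "c", "d", "e"))
dataframeB <- data.frame(
  id = 1:8,
  column_to_be_identified = c("a", "f", "b", "c", "g", "d", "h", "e"),
  column_almost_similar_but_not_quite = c("a", "f", "b", "c", "g", "3", "h", "e")
)

matching_columns <- find_columns_containing(dataframeB, dataframeA$some_info)
matching_columns
```

If several columns qualify, all of their names are returned; if none do, the result is an empty character vector.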
CodePudding user response:
With select() from dplyr, we can do this:
library(dplyr)
dataframeB %>%
  select(where(~ is.character(.) &&
                 all(dataframeA$some_info %in% .))) %>%
  names()
#> [1] "column_to_be_identified"
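One caveat: the is.character() guard skips factor columns. That matters on R versions before 4.0.0, where data.frame() converts strings to factors by default. A possible variation that accepts both types is sketched below; stringsAsFactors = TRUE is set here only to reproduce that situation, and factor_safe_names is just an illustrative variable name.

```r
library(dplyr)

dataframeA <- data.frame(some_info = c("a", "b", "c", "d", "e"))
# stringsAsFactors = TRUE mimics the pre-4.0.0 default, for illustration only
dataframeB <- data.frame(
  id = 1:8,
  column_to_be_identified = c("a", "f", "b", "c", "g", "d", "h", "e"),
  column_almost_similar_but_not_quite = c("a", "f", "b", "c", "g", "3", "h", "e"),
  stringsAsFactors = TRUE
)

factor_safe_names <- dataframeB %>%
  # keep character or factor columns that contain every target value
  select(where(~ (is.character(.x) || is.factor(.x)) &&
                 all(dataframeA$some_info %in% as.character(.x)))) %>%
  names()
factor_safe_names
```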