I am working with multiple datasets of survey responses from different years. If a column appears in more than one dataset, it has the same name in each. Here is an example of what I'm looking for. Say these are the column names for my datasets (using 3 here for brevity):
d1 <- colnames(f2018) <- c("Institution", "Department", "Complete",
"effective_goals", "recs_open", "mostvaluable_open", "learningdifferences_l",
"studentmotivation_l")
d2 <- colnames(sum2015) <- c("Institution", "Department", "Complete",
"effective_goals", "recs_open", "effective_tools", "learningdifferences_l")
d3 <- c("Institution", "Department", "Complete",
"effective_goals", "effective_tools", "effective_assessment", "learningscience_freq")
My goal is a resulting dataframe with 3 columns: (1) every column name from all 3 datasets, (2) a count of how often the column name occurs (should range from 1 to 3 in this example), and (3) the dataframe(s) in which that column name can be found (e.g., d1, d2, d3).
So something like the following (x does not list every column name; this is just a partial, reproducible illustration):
x <- c("Institution", "Department", "Complete",
"effective_goals", "recs_open", "mostvaluable_open", "learningdifferences_l",
"studentmotivation_l", "effective_tools")
y <- c("3", "3", "3", "3", "2", "1", "2", "1", "2")
z <- c("d1, d2, d3", "d1, d2, d3", "d1, d2, d3", "d1, d2, d3", "d1, d2", "d1",
"d1, d2", "d1", "d2, d3")
CodePudding user response:
A possible solution:
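library(tidyverse)

data.frame(x = unique(c(d1, d2, d3))) %>%
  mutate(
    # One logical column per dataset: TRUE where the name occurs in it
    apply(., 1, \(x) c(d1 = x %in% d1, d2 = x %in% d2, d3 = x %in% d3)) %>%
      t %>% as.data.frame,
    # z: number of datasets that contain the name
    z = rowSums(across(-x)),
    # Replace TRUE with the dataset label and FALSE with NA
    across(c(-x, -z), ~ ifelse(.x, cur_column(), NA))) %>%
  rowwise() %>%
  # y: comma-separated labels of the datasets that contain the name
  mutate(y = c_across(d1:d3) %>% na.omit %>% str_c(collapse = ", ")) %>%
  select(x, y, z) %>%
  ungroup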
#> # A tibble: 11 × 3
#>    x                     y              z
#>    <chr>                 <chr>      <dbl>
#>  1 Institution           d1, d2, d3     3
#>  2 Department            d1, d2, d3     3
#>  3 Complete              d1, d2, d3     3
#>  4 effective_goals       d1, d2, d3     3
#>  5 recs_open             d1, d2         2
#>  6 mostvaluable_open     d1             1
#>  7 learningdifferences_l d1, d2         2
#>  8 studentmotivation_l   d1             1
#>  9 effective_tools       d2, d3         2
#> 10 effective_assessment  d3             1
#> 11 learningscience_freq  d3             1
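
For comparison, here is a rough base R sketch of the same idea that needs no packages. It is only one possible way to do it. It redefines d1, d2 and d3 as plain character vectors (the names from the question) so that it runs on its own; cols, found_in and res are names made up for this illustration. Here y is the count and z the dataset labels, which is the column order asked for in the question.

```r
# Column names of each dataset, copied from the question
d1 <- c("Institution", "Department", "Complete",
        "effective_goals", "recs_open", "mostvaluable_open",
        "learningdifferences_l", "studentmotivation_l")
d2 <- c("Institution", "Department", "Complete",
        "effective_goals", "recs_open", "effective_tools",
        "learningdifferences_l")
d3 <- c("Institution", "Department", "Complete",
        "effective_goals", "effective_tools", "effective_assessment",
        "learningscience_freq")

# Named list (hypothetical helper): the list names serve as dataset labels
cols <- list(d1 = d1, d2 = d2, d3 = d3)

# Every distinct column name, in order of first appearance
x <- unique(unlist(cols, use.names = FALSE))

# For each column name, the labels of the datasets that contain it
found_in <- lapply(x, function(nm) {
  names(cols)[vapply(cols, function(v) nm %in% v, logical(1))]
})

res <- data.frame(
  x = x,                                                     # column name
  y = lengths(found_in),                                     # number of datasets
  z = vapply(found_in, paste, character(1), collapse = ", ") # dataset labels
)
res
```

To cover more years, it should be enough to add further named entries to cols; with real data frames, something like cols <- list(d1 = colnames(f2018), d2 = colnames(sum2015)) would take the place of the hard-coded vectors.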