Way to get all column names, count of how often each column name appears, and in which dataframe(s)-CodePudding

I am working with multiple datasets of survey responses from different years. If a certain column appears in more than 1 dataset, it will have the same name. Here is an example of what I'm looking for. Say these are the column names for my datasets (using 3 here for brevity)

d1 <- colnames(f2018) <- c("Institution", "Department", "Complete",
"effective_goals", "recs_open", "mostvaluable_open", "learningdifferences_l",
"studentmotivation_l")

d2 <- colnames(sum2015) <- c("Institution", "Department", "Complete", 
"effective_goals", "recs_open", "effective_tools", "learningdifferences_l")    

d3 <- <- c("Institution","Department", "Complete", 
"effective_goals", "effective_tools", "effective_assessment", "learningscience_freq")

My goal is to have a resulting dataframe with 3 columns -- 1) every column name from all 3 datasets, a count of how often the column name occurs (should range from 1-3 in this example), (3) the dataframe(s) in which the particular column name can be found (e.g., d1, d2, d3).

So something like (not listing out all column names for x but to give somewhat of a replicable example here's an illustration)

x <- (c("Institution", "Department", "Complete",
"effective_goals", "recs_open", "mostvaluable_open", "learningdifferences_l",
"studentmotivation_l", "effective_tools")

y<- c("3", "3", "3", "3", "2", "1", "2", "1", "2")

z <- c("d1, d2, d3", "d1, d2, d3", "d1, d2, d3", "d1, d2, d3", "d1, d2", "d1",
"d1, d2", "d1", "d2, d3")

CodePudding user response：

A possible solution:

library(tidyverse)

data.frame(x = unique(c(d1, d2, d3))) %>% 
  mutate(
    apply(., 1, \(x) c(d1 = x %in% d1, d2 = x %in% d2, d3 = x %in% d3)) %>%
       t %>% as.data.frame,
    z = rowSums(across(-x)),
    across(c(-x,-z), ~ ifelse(.x, cur_column(), NA))) %>% 
  rowwise() %>% 
  mutate(y = c_across(d1:d3) %>%  na.omit %>%  str_c(collapse = ", ")) %>% 
  select(x, y, z) %>% 
  ungroup

#> # A tibble: 11 × 3
#>    x                     y              z
#>    <chr>                 <chr>      <dbl>
#>  1 Institution           d1, d2, d3     3
#>  2 Department            d1, d2, d3     3
#>  3 Complete              d1, d2, d3     3
#>  4 effective_goals       d1, d2, d3     3
#>  5 recs_open             d1, d2         2
#>  6 mostvaluable_open     d1             1
#>  7 learningdifferences_l d1, d2         2
#>  8 studentmotivation_l   d1             1
#>  9 effective_tools       d2, d3         2
#> 10 effective_assessment  d3             1
#> 11 learningscience_freq  d3             1