Home > Software design >  Way to get all column names, count of how often each column name appears, and in which dataframe(s)
Way to get all column names, count of how often each column name appears, and in which dataframe(s)

Time:06-22

I am working with multiple datasets of survey responses from different years. If a certain column appears in more than 1 dataset, it will have the same name. Here is an example of what I'm looking for. Say these are the column names for my datasets (using 3 here for brevity)

d1 <- colnames(f2018) <- c("Institution", "Department", "Complete",
"effective_goals", "recs_open", "mostvaluable_open", "learningdifferences_l",
"studentmotivation_l")

d2 <- colnames(sum2015) <- c("Institution", "Department", "Complete", 
"effective_goals", "recs_open", "effective_tools", "learningdifferences_l")    

d3 <- <- c("Institution","Department", "Complete", 
"effective_goals", "effective_tools", "effective_assessment", "learningscience_freq")  

My goal is to have a resulting dataframe with 3 columns -- 1) every column name from all 3 datasets, a count of how often the column name occurs (should range from 1-3 in this example), (3) the dataframe(s) in which the particular column name can be found (e.g., d1, d2, d3).

So something like (not listing out all column names for x but to give somewhat of a replicable example here's an illustration)

x <- (c("Institution", "Department", "Complete",
"effective_goals", "recs_open", "mostvaluable_open", "learningdifferences_l",
"studentmotivation_l", "effective_tools")

y<- c("3", "3", "3", "3", "2", "1", "2", "1", "2")

z <- c("d1, d2, d3", "d1, d2, d3", "d1, d2, d3", "d1, d2, d3", "d1, d2", "d1",
"d1, d2", "d1", "d2, d3")

CodePudding user response:

A possible solution:

library(tidyverse)

data.frame(x = unique(c(d1, d2, d3))) %>% 
  mutate(
    apply(., 1, \(x) c(d1 = x %in% d1, d2 = x %in% d2, d3 = x %in% d3)) %>%
       t %>% as.data.frame,
    z = rowSums(across(-x)),
    across(c(-x,-z), ~ ifelse(.x, cur_column(), NA))) %>% 
  rowwise() %>% 
  mutate(y = c_across(d1:d3) %>%  na.omit %>%  str_c(collapse = ", ")) %>% 
  select(x, y, z) %>% 
  ungroup

#> # A tibble: 11 × 3
#>    x                     y              z
#>    <chr>                 <chr>      <dbl>
#>  1 Institution           d1, d2, d3     3
#>  2 Department            d1, d2, d3     3
#>  3 Complete              d1, d2, d3     3
#>  4 effective_goals       d1, d2, d3     3
#>  5 recs_open             d1, d2         2
#>  6 mostvaluable_open     d1             1
#>  7 learningdifferences_l d1, d2         2
#>  8 studentmotivation_l   d1             1
#>  9 effective_tools       d2, d3         2
#> 10 effective_assessment  d3             1
#> 11 learningscience_freq  d3             1
  •  Tags:  
  • r
  • Related