Select columns based on column name matching another variable in a list of dataframes

Time:10-29

I have a list of 6 dataframes, which all contain the same column names. I would like to subset each of the 6 dataframes so that it keeps only the columns whose names end with the value stored in another column (let's call that column 'Index'), but I am getting stuck.

Example: 2 of the dataframes in the list (NB: within each dataframe, Index has the same value in every row, so only one set of columns needs to be selected per dataframe):

[[1]]
    a_1 b_1   c_1  a_2 b_2      c_2    Index
    3   red   no   2   yellow   yes    1

[[2]]
    a_1 b_1   c_1  a_2 b_2      c_2    Index
    3   red   no   2   yellow   yes    2

My desired output:

[[1]]
    a_1 b_1   c_1     Index
    3   red   no       1

[[2]]
    a_2 b_2      c_2   Index
    2   yellow   yes   2

I tried the following code:

newlist<-lapply(samplelist,function(x) dplyr::select(ends_with(Index))) 

This generates the error "Error in is_character(match) : object 'Index' not found". I'm not sure how best to make this code work, or whether I should try a different approach altogether.
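A likely cause of this error, offered as a sketch rather than a definitive diagnosis: in the code above, `dplyr::select()` is never passed the data frame `x`, and the argument of a selection helper such as `ends_with()` is evaluated as an ordinary R expression rather than looked up among the data frame's columns. A bare `Index` is therefore searched for as a standalone variable, which does not exist. The base-R snippet below reproduces that lookup failure on a hypothetical one-row data frame `df` and shows that the value has to be read from the data frame itself.

```r
# Hypothetical one-row data frame laid out like the question's example.
df <- data.frame(a_1 = 3, b_1 = "red", a_2 = 2, b_2 = "yellow", Index = "1")

# Evaluating the bare name Index fails: it exists only as a column of df,
# not as a variable in the calling environment.
msg <- tryCatch(Index, error = function(e) conditionMessage(e))
print(msg)  # [1] "object 'Index' not found"

# Reading the value out of the data frame itself works.
idx <- df[["Index"]]
print(idx)  # [1] "1"
```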

Update:

dput(samplelist)
    list(`1` = structure(list(ID = 12345, Com = structure(8296, class = "Date"), 
        NCom = structure(8533, class = "Date"), 
        a_1 = "Yes", b_1 = 160, c_1 = 160, d_1 = "No", 
        e_1 = 0, f_1 = "No", g_1 = 0, h_1 = "Yes", 
        a_2 = "Yes", b_2 = 155, c_2 = 155, d_2 = "No", 
        e_2 = 0, d_2 = "No", e_2 = 0, f_2 = "Yes", 
        Index = "1", Index_date = structure(9265, class = "Date")), row.names = 1L, class = "data.frame"),
    `2` = structure(list(Patient_ID = 22222, Com = structure(8296, class = "Date"), 
        NCom = structure(8533, class = "Date"), 
        a_1 = "Yes", b_1 = 160, c_1 = 160, d_1 = "No", 
        e_1 = 0, f_1 = "No", g_1 = 0, h_1 = "Yes", 
        a_2 = "Yes", b_2 = 155, c_2 = 155, d_2 = "No", 
        e_2 = 0, d_2 = "No", e_2 = 0, f_2 = "Yes", 
        Index = "2", Index_date = structure(8835, class = "Date")), row.names = 2L, class = "data.frame"))

Answer:

I think you are close; try:

library(dplyr)

lapply(samplelist, function(x) select(x, ends_with(x[["Index"]])))

$`1`
  a_1 b_1 c_1 d_1 e_1 f_1 g_1 h_1
1 Yes 160 160  No   0  No   0 Yes

$`2`
  a_2 b_2 c_2 d_2 e_2
2 Yes 155 155  No   0

Data (a cleaned-up version of the question's dput() output):

samplelist <- list(
`1` = structure(list(ID = 12345, Com = structure(8296, class = "Date"), 
                     NCom = structure(8533, class = "Date"), 
                     a_1 = "Yes", b_1 = 160, c_1 = 160, d_1 = "No", 
                     e_1 = 0, f_1 = "No", g_1 = 0, h_1 = "Yes", 
                     a_2 = "Yes", b_2 = 155, c_2 = 155, d_2 = "No", 
                     e_2 = 0,
                     Index = "1", Index_date = structure(9265, class = "Date")), row.names = 1L, class = "data.frame"),
`2` = structure(list(Patient_ID = 22222, Com = structure(8296, class = "Date"), 
                     NCom = structure(8533, class = "Date"), 
                     a_1 = "Yes", b_1 = 160, c_1 = 160, d_1 = "No", 
                     e_1 = 0, f_1 = "No", g_1 = 0, h_1 = "Yes", 
                     a_2 = "Yes", b_2 = 155, c_2 = 155, d_2 = "No", 
                     e_2 = 0,
                     Index = "2", Index_date = structure(8835, class = "Date")), row.names = 2L, class = "data.frame")) 
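For readers without dplyr, here is a minimal base-R sketch of the same idea that also keeps the Index column, as in the desired output in the question. It assumes each data frame holds a single Index value; the toy list `samplelist_toy` and the helper `select_by_index()` are hypothetical names, and the column types are guessed from the printed example.

```r
# Toy data rebuilt from the question's example (types assumed from the printout).
samplelist_toy <- list(
  data.frame(a_1 = 3, b_1 = "red", c_1 = "no",
             a_2 = 2, b_2 = "yellow", c_2 = "yes", Index = 1),
  data.frame(a_1 = 3, b_1 = "red", c_1 = "no",
             a_2 = 2, b_2 = "yellow", c_2 = "yes", Index = 2)
)

# Keep the columns whose names end in "_<Index>", plus Index itself.
# Only the first Index value is used, assuming it is constant per data frame.
select_by_index <- function(x) {
  suffix <- paste0("_", x[["Index"]][1])
  keep <- endsWith(names(x), suffix) | names(x) == "Index"
  x[, keep, drop = FALSE]
}

result <- lapply(samplelist_toy, select_by_index)
result
```

Matching on `"_1"` rather than just `"1"` means a column such as `a_11` would not be picked up when Index is 1, whereas `ends_with(x[["Index"]])` would match it. The same helper can be applied to `samplelist` above with `lapply(samplelist, select_by_index)`.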

Answer:

Here is an option. Your data has duplicate column names (d_2 and e_2 each appear twice), so I first make the names unique and only select the first copy of each duplicate.

library(tidyverse)

samplelist |>
  map(\(x){
    x <- tibble(x, .name_repair = make.unique)
    var <- pull(x, Index)
    select(x, ends_with(glue::glue("_{var}")), Index)
  })
#> $`1`
#> # A tibble: 1 x 9
#>   a_1     b_1   c_1 d_1     e_1 f_1     g_1 h_1   Index
#>   <chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr> <chr>
#> 1 Yes     160   160 No        0 No        0 Yes   1    
#> 
#> $`2`
#> # A tibble: 1 x 7
#>   a_2     b_2   c_2 d_2     e_2 f_2   Index
#>   <chr> <dbl> <dbl> <chr> <dbl> <chr> <chr>
#> 1 Yes     155   155 No        0 Yes   2
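To see why the duplicates drop out, here is a small base-R sketch of what the name repair does. `tibble()`'s `.name_repair` argument accepts a function that is applied to the column names, here base R's `make.unique()`, which leaves the first occurrence of a name alone and appends `.1`, `.2`, ... to later repeats. The renamed copies no longer end in `_2`, so `ends_with()` skips them. The name vector below is copied from the `_2` columns (plus Index) of the question's `dput()` output.

```r
# Column names from the question's dput(), including the repeated d_2 and e_2.
nms <- c("a_2", "b_2", "c_2", "d_2", "e_2", "d_2", "e_2", "f_2", "Index")

# make.unique() keeps the first occurrence and renames later repeats.
fixed <- make.unique(nms)
print(fixed)
# [1] "a_2"   "b_2"   "c_2"   "d_2"   "e_2"   "d_2.1" "e_2.1" "f_2"   "Index"

# Only the original spellings still end in "_2", so a suffix-based
# selection keeps exactly one copy of each duplicated column.
kept <- fixed[endsWith(fixed, "_2")]
print(kept)
# [1] "a_2" "b_2" "c_2" "d_2" "e_2" "f_2"
```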