I have a list of 6 data frames that all contain the same column names. I would like to subset each data frame so that it keeps only the columns whose names end with the value stored in another column (let's call that 'Index'), but I am getting stuck.
Example: 2 of the data frames in the list (NB: the Index value is the same throughout a given data frame, so the same set of columns needs to be selected for the whole data frame):
[[0]]
a_1 b_1 c_1 a_2 b_2 c_2 Index
3 red no 2 yellow yes 1
[[1]]
a_1 b_1 c_1 a_2 b_2 c_2 Index
3 red no 2 yellow yes 2
My desired output:
[[0]]
a_1 b_1 c_1 Index
3 red no 1
[[1]]
a_2 b_2 c_2 Index
2 yellow yes 2
I tried the following code:
newlist<-lapply(samplelist,function(x) dplyr::select(ends_with(Index)))
This generates the error "Error in is_character(match) : object 'Index' not found". I'm not sure how best to make this code work. Should I try a different approach altogether?
Update:
dput(samplelist)
`1` = structure(list(ID = 12345, Com = structure(8296, class = "Date"),
NCom = structure(8533, class = "Date"),
a_1 = "Yes", b_1 = 160, c_1 = 160, d_1 = "No",
e_1 = 0, f_1 = "No", g_1 = 0, h_1 = "Yes",
a_2 = "Yes", b_2 = 155, c_2 = 155, d_2 = "No",
e_2 = 0, d_2 = "No", e_2 = 0, f_2 = "Yes",
Index = "1", Index_date = structure(9265, class = "Date")), row.names = 1L, class = "data.frame"))
`2` = structure(list(Patient_ID = 22222, Com = structure(8296, class = "Date"),
NCom = structure(8533, class = "Date"),
a_1 = "Yes", b_1 = 160, c_1 = 160, d_1 = "No",
e_1 = 0, f_1 = "No", g_1 = 0, h_1 = "Yes",
a_2 = "Yes", b_2 = 155, c_2 = 155, d_2 = "No",
e_2 = 0, d_2 = "No", e_2 = 0, f_2 = "Yes",
Index = "2", Index_date = structure(8835, class = "Date")), row.names = 2L, class = "data.frame"))
CodePudding user response:
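I think you are close. Two things are missing: select() needs the data frame x as its first argument, and ends_with() evaluates its argument as an ordinary R object rather than as a column of x, so the Index value has to be extracted from x explicitly. Try: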
lapply(samplelist, function(x) select(x, ends_with(x[["Index"]])))
$`1`
a_1 b_1 c_1 d_1 e_1 f_1 g_1 h_1
1 Yes 160 160 No 0 No 0 Yes
$`2`
a_2 b_2 c_2 d_2 e_2
2 Yes 155 155 No 0
data
samplelist <- list(
`1` = structure(list(ID = 12345, Com = structure(8296, class = "Date"),
NCom = structure(8533, class = "Date"),
a_1 = "Yes", b_1 = 160, c_1 = 160, d_1 = "No",
e_1 = 0, f_1 = "No", g_1 = 0, h_1 = "Yes",
a_2 = "Yes", b_2 = 155, c_2 = 155, d_2 = "No",
e_2 = 0,
Index = "1", Index_date = structure(9265, class = "Date")), row.names = 1L, class = "data.frame"),
`2` = structure(list(Patient_ID = 22222, Com = structure(8296, class = "Date"),
NCom = structure(8533, class = "Date"),
a_1 = "Yes", b_1 = 160, c_1 = 160, d_1 = "No",
e_1 = 0, f_1 = "No", g_1 = 0, h_1 = "Yes",
a_2 = "Yes", b_2 = 155, c_2 = 155, d_2 = "No",
e_2 = 0,
Index = "2", Index_date = structure(8835, class = "Date")), row.names = 2L, class = "data.frame"))
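Note that the output above drops Index, whereas the desired output in the question keeps it. A minimal sketch of one way to keep it, assuming Index is constant within each data frame and every suffix is written as an underscore followed by the Index value. The name toy_list and its values are illustrative only, modelled on the question's small example:

```r
library(dplyr)

# Toy data modelled on the question's example (values are illustrative)
toy_list <- list(
  data.frame(a_1 = 3, b_1 = "red", c_1 = "no",
             a_2 = 2, b_2 = "yellow", c_2 = "yes", Index = "1"),
  data.frame(a_1 = 3, b_1 = "red", c_1 = "no",
             a_2 = 2, b_2 = "yellow", c_2 = "yes", Index = "2")
)

newlist <- lapply(toy_list, function(x) {
  # Index is assumed constant within a data frame, so the first value is enough
  suffix <- paste0("_", x[["Index"]][1])
  # Keep the columns ending in e.g. "_1", plus Index itself
  select(x, ends_with(suffix), all_of("Index"))
})

names(newlist[[1]])
names(newlist[[2]])
```

Matching on "_1" rather than "1" avoids picking up a column such as a_11 by accident, should one exist.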
CodePudding user response:
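Here is an option. Your data has duplicate column names, so I make the names unique first; only one column of each duplicated pair then matches the suffix and is selected.
library(tidyverse)
samplelist |>
  map(\(x) {
    # Make duplicated column names unique (e.g. d_2, d_2.1)
    x <- tibble(x, .name_repair = make.unique)
    # Value of Index for this data frame, e.g. "1"
    var <- pull(x, Index)
    # Keep the columns ending in "_<var>", plus Index itself
    select(x, ends_with(glue::glue("_{var}")), Index)
  })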
#> $`1`
#> # A tibble: 1 x 9
#> a_1 b_1 c_1 d_1 e_1 f_1 g_1 h_1 Index
#> <chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr> <chr>
#> 1 Yes 160 160 No 0 No 0 Yes 1
#>
#> $`2`
#> # A tibble: 1 x 7
#> a_2 b_2 c_2 d_2 e_2 f_2 Index
#> <chr> <dbl> <dbl> <chr> <dbl> <chr> <chr>
#> 1 Yes 155 155 No 0 Yes 2
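If you would rather avoid extra packages, the same idea can be sketched in base R. This is only an illustration under the same assumptions (Index is constant within a data frame, suffixes look like "_1", "_2"); subset_by_index and toy_list are hypothetical names, not part of the question's data:

```r
# Hypothetical helper: keep columns whose names end in "_<Index>", plus Index
subset_by_index <- function(x) {
  # Index is assumed constant within a data frame, so take the first value
  suffix <- paste0("_", x[["Index"]][1])
  keep <- endsWith(names(x), suffix) | names(x) == "Index"
  # drop = FALSE keeps the result a data frame even if one column remains
  x[, keep, drop = FALSE]
}

# Toy data modelled on the question's example (values are illustrative)
toy_list <- list(
  data.frame(a_1 = 3, b_1 = "red", c_1 = "no",
             a_2 = 2, b_2 = "yellow", c_2 = "yes", Index = "1"),
  data.frame(a_1 = 3, b_1 = "red", c_1 = "no",
             a_2 = 2, b_2 = "yellow", c_2 = "yes", Index = "2")
)

result <- lapply(toy_list, subset_by_index)
names(result[[1]])
names(result[[2]])
```

endsWith() does a literal suffix comparison, so no regular expression escaping is needed for the Index value.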