Subsetting tibbles in a list based on values

I have a list of tibbles that looks like this:

$WT_top_markers
# A tibble: 128 × 2
# Groups:   cluster [26]
   cluster gene   
   <fct>   <chr>  
 1 0       Abi3bp 
 2 0       Apoe   
 3 0       Apoc1  
 4 0       Tgm2   
 5 0       Bcam   
 6 1       Aqp3   
 7 1       Sult1d1
 8 1       Dapl1  
 9 1       Fxyd3  
10 1       Pir    
# … with 118 more rows

$F7KO_top_markers
# A tibble: 125 × 2
# Groups:   cluster [25]
   cluster gene   
   <fct>   <chr>  
 1 0       Abi3bp 
 2 0       Apoe   
 3 0       Apoc1  
 4 0       Dapl1  
 5 0       Tgm2   
 6 1       Scgb3a1
 7 1       Sftpa1 
 8 1       Reg3g  
 9 1       Bpifb1 
10 1       Itln1  
# … with 115 more rows

$F8HET_top_markers
# A tibble: 147 × 2
# Groups:   cluster [30]
   cluster gene         
   <fct>   <chr>        
 1 0       Abi3bp       
 2 0       Apoe         
 3 0       Apoc1        
 4 0       1600014C10Rik
 5 0       Bcam         
 6 1       Krt14        
 7 1       Krt17        
 8 1       Krt5         
 9 1       Bcam         
10 1       Cav1         
# … with 137 more rows

I want to pull out the genes from the first tibble where cluster = 20. I have tried:

features_to_plot <- unlist(top_markers[[1]][[which(top_markers[[1]]$cluster == 20)]])

but am getting an error:

! Must extract column with a single valid subscript.
✖ Subscript which(top_markers[[1]]$cluster == 20) has size 5 but must be size 1.

Can anyone tell me how to do this properly?

Thanks, Stacy

User response:

We can use lapply to loop over the list and keep the rows of each tibble where the 'cluster' value is 20:

lapply(top_markers, \(x) subset(x, cluster == 20))
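
The call above returns the matching rows of every tibble. If only the gene names are needed, as the question asks, the same loop can return character vectors instead. This is a minimal sketch on a made-up stand-in for top_markers; the gene names and cluster labels below are invented for illustration and are not the OP's data.

```r
library(tibble)

# Toy stand-in for the real list; gene names and cluster labels are made up.
top_markers <- list(
  WT_top_markers = tibble(
    cluster = factor(c(0, 0, 20, 20)),
    gene    = c("GeneA", "GeneB", "GeneC", "GeneD")
  ),
  F7KO_top_markers = tibble(
    cluster = factor(c(0, 20, 20)),
    gene    = c("GeneA", "GeneE", "GeneF")
  )
)

# Rows where cluster is 20, one filtered tibble per list element.
rows_by_sample <- lapply(top_markers, \(x) subset(x, cluster == 20))

# Only the gene names, one character vector per list element.
genes_by_sample <- lapply(top_markers, \(x) x$gene[x$cluster == 20])

# Genes for the first tibble only.
features_to_plot <- genes_by_sample[[1]]
```

The result keeps the names of the original list, so genes_by_sample$F7KO_top_markers picks out a single sample by name.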

The error in the OP's code comes from using [[ to extract more than one element: [[ accepts only a single subscript, and here which() returns five row positions. Use [ with a trailing comma instead. top_markers[[1]] is the first list element, which is a tibble, and which(top_markers[[1]]$cluster == 20) gives the row index. To subset rows, the index takes the form [rowindex, columnindex], so here we need [rowindex, ]. Without the comma, a single index on a data.frame or tibble is treated as a column index. For example, tibble(col1 = 1:5)[1:2, ] returns the first two rows, whereas tibble(col1 = 1:5)[1:2] returns an error because it requests two columns and there is only one.

top_markers[[1]][which(top_markers[[1]]$cluster == 20),]
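
The difference between the two kinds of brackets can be checked on a small example. The sketch below uses a hypothetical tibble named markers with invented values; it is an illustration of the indexing rules, not the OP's data.

```r
library(tibble)

# Hypothetical single tibble; the values are invented.
markers <- tibble(
  cluster = factor(c(0, 0, 20, 20)),
  gene    = c("GeneA", "GeneB", "GeneC", "GeneD")
)

# Row positions where cluster is 20 (here rows 3 and 4).
rows <- which(markers$cluster == 20)

# `[[` accepts only one subscript, so a length-2 index raises an error.
double_bracket_fails <- inherits(try(markers[[rows]], silent = TRUE), "try-error")

# `[` with a trailing comma treats the index as row positions.
hits <- markers[rows, ]

# Adding `$gene` gives the character vector of gene names.
features_to_plot <- hits$gene
```

Applied to the real list, the same pattern would read top_markers[[1]][which(top_markers[[1]]$cluster == 20), ]$gene.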