Subsetting tibbles in a list based on values

Time:07-27

I have a list of tibbles that looks like this:

$WT_top_markers
# A tibble: 128 × 2
# Groups:   cluster [26]
   cluster gene   
   <fct>   <chr>  
 1 0       Abi3bp 
 2 0       Apoe   
 3 0       Apoc1  
 4 0       Tgm2   
 5 0       Bcam   
 6 1       Aqp3   
 7 1       Sult1d1
 8 1       Dapl1  
 9 1       Fxyd3  
10 1       Pir    
# … with 118 more rows

$F7KO_top_markers
# A tibble: 125 × 2
# Groups:   cluster [25]
   cluster gene   
   <fct>   <chr>  
 1 0       Abi3bp 
 2 0       Apoe   
 3 0       Apoc1  
 4 0       Dapl1  
 5 0       Tgm2   
 6 1       Scgb3a1
 7 1       Sftpa1 
 8 1       Reg3g  
 9 1       Bpifb1 
10 1       Itln1  
# … with 115 more rows

$F8HET_top_markers
# A tibble: 147 × 2
# Groups:   cluster [30]
   cluster gene         
   <fct>   <chr>        
 1 0       Abi3bp       
 2 0       Apoe         
 3 0       Apoc1        
 4 0       1600014C10Rik
 5 0       Bcam         
 6 1       Krt14        
 7 1       Krt17        
 8 1       Krt5         
 9 1       Bcam         
10 1       Cav1         
# … with 137 more rows

I want to pull out the genes from the first tibble where cluster = 20. I have tried:

features_to_plot <- unlist(top_markers[[1]][[which(top_markers[[1]]$cluster == 20)]])

but am getting an error: ! Must extract column with a single valid subscript. ✖ Subscript which(top_markers[[1]]$cluster == 20) has size 5 but must be size 1.

Can anyone tell me how to do this properly?

Thanks, Stacy

Answer:

We can use lapply to loop over the list and keep the rows of each tibble where the 'cluster' value is 20:

lapply(top_markers, \(x) subset(x, cluster == 20))

The error in the OP's code comes from using [[ with more than one index. [[ extracts a single element, but which(top_markers[[1]]$cluster == 20) returns five row positions, hence "has size 5 but must be size 1". Use [ with a comma instead. top_markers[[1]] is the first list element, which is a tibble, and which(top_markers[[1]]$cluster == 20) gives the matching row indices. Indexing a data.frame or tibble takes the form [rowindex, columnindex], so to subset rows we need [rowindex, ]. Without the comma, a single index is treated as a column index: tibble(col1 = 1:5)[1:2, ] returns the first two rows, whereas tibble(col1 = 1:5)[1:2] returns an error because it asks for two columns and there is only one.

top_markers[[1]][which(top_markers[[1]]$cluster == 20),]
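The line above returns a tibble with both columns, while the question asks for the genes themselves. A minimal sketch of how to get a plain character vector, assuming the tibble package is installed; the toy top_markers list and the gene names GeneA, GeneB and GeneC are made up for illustration and only mimic the shape of the OP's data:

```r
library(tibble)

# Toy stand-in for the OP's list (made-up rows; only the structure matches)
top_markers <- list(
  WT_top_markers = tibble(
    cluster = factor(c(0, 0, 20, 20)),
    gene    = c("Abi3bp", "Apoe", "GeneA", "GeneB")
  ),
  F7KO_top_markers = tibble(
    cluster = factor(c(0, 20)),
    gene    = c("Abi3bp", "GeneC")
  )
)

wt <- top_markers[[1]]

# Rows before the comma, column after it; [[ then pulls the single column out as a vector
features_to_plot <- wt[wt$cluster == 20, "gene"][["gene"]]

# Equivalent: subset the gene column directly with the logical condition
features_to_plot2 <- wt$gene[wt$cluster == 20]

# Same idea applied to every tibble in the list
genes_by_sample <- lapply(top_markers, \(x) x$gene[x$cluster == 20])
```

The second form, wt$gene[wt$cluster == 20], only touches a single column, so it may be the simpler choice when just the gene names are needed.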