Filter a matrix by rownames (conditionally) in R

Time:03-04

Suppose you have this matrix:

> dput(b)
structure(c(8.428852043462e-16, 0.98006786315672, 0.0636247563553075, 
-0.246858810409958, -1.37811970502942, -0.281625554642936, -8.91350446654785e-16, 
-0.305283565399869, -1.00802628192793, 0.14027577547337, -1.66288850621351, 
0.16259170026583, -1.3280185195633e-15, 0.278629912397198, -0.188868484543887, 
1.0533053295465, 1.16670767240438, -0.48819960367166), .Dim = c(6L, 
3L), .Dimnames = list(c("(Intercept)", "F_slowPC1", "F_slowPC2", 
"F_slowPC3", "data_yFYFF", "data_yPUNEW"), c("PC1", "PC2", "PC3"
)))

I want to get only the rows whose names start with the string "data_y".

I tried to filter it with a logical condition:

stringr::str_detect(rownames(b), "data_y")
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE

So I tried

b[rownames(b) %in% str_detect(rownames(b),"data_y")==T]

but it just returns numeric(0).

How can I get all the rows that contain "data_y"?

I would prefer not to convert this matrix to a data frame.

CodePudding user response:

Use b[i, ] to subset rows. b[i] subsets the elements of as.vector(b) indexed by i, which is not what you want.
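
To see why the attempt in the question returns an empty result, and how the two indexing forms differ, here is a minimal sketch. It uses a small stand-in matrix m with the same row and column names as b; the integer values 1 to 18 are placeholders chosen for readability, not the original data.

```r
# Stand-in for b: same dimnames, placeholder values 1..18 (assumption for illustration)
m <- matrix(1:18, nrow = 6,
            dimnames = list(c("(Intercept)", "F_slowPC1", "F_slowPC2",
                              "F_slowPC3", "data_yFYFF", "data_yPUNEW"),
                            c("PC1", "PC2", "PC3")))

i <- startsWith(rownames(m), "data_y")   # logical vector, one entry per row

# The attempt in the question compares row names against TRUE/FALSE values.
# %in% coerces the logicals to "TRUE"/"FALSE"; no row name equals either,
# so the index is all FALSE and nothing is selected.
bad <- rownames(m) %in% i
empty <- m[bad]      # integer(0) here; numeric(0) for the double matrix b

# Without a comma, the index is applied to the matrix as a plain vector
# (recycled over all 18 elements), so the result loses its row structure.
flat <- m[i]

# With a comma, the index selects whole rows and the result is a matrix.
rows <- m[i, ]
```

Here flat contains the right values but as an unnamed vector, while rows keeps the two matching rows with their row and column names.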

You don't need stringr to construct i, because base R has startsWith and grep. Either of these statements would work:

b[startsWith(rownames(b), "data_y"), , drop = FALSE]
b[grep("^data_y", rownames(b)), , drop = FALSE]

drop = FALSE guarantees a matrix result. By default, the result is a dimensionless vector if only one row is indexed. You can compare b[1, ] and b[1, , drop = FALSE] to see what I mean.
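
The comparison suggested above can be sketched as follows, again with a stand-in matrix m that only borrows the dimnames of b (the values are placeholders). The single-row selector one is a name introduced here for illustration.

```r
# Stand-in for b: same dimnames, placeholder values (assumption for illustration)
m <- matrix(1:18, nrow = 6,
            dimnames = list(c("(Intercept)", "F_slowPC1", "F_slowPC2",
                              "F_slowPC3", "data_yFYFF", "data_yPUNEW"),
                            c("PC1", "PC2", "PC3")))

one <- rownames(m) == "data_yFYFF"   # selects exactly one row

v <- m[one, ]                 # dimensions dropped: named vector of length 3
k <- m[one, , drop = FALSE]   # still a 1 x 3 matrix that keeps its row name

dim(v)   # NULL
dim(k)   # 1 3
```

This matters when the filter can match a single row: code that later calls nrow() or rownames() on the result only works reliably with drop = FALSE.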

CodePudding user response:

You just need

b[stringr::str_detect(rownames(b), "data_y"), ]

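Add a comma after the logical index so that it selects rows. Note that the pattern "data_y" matches anywhere in the row name; use "^data_y" to match only names that start with it.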
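
The difference between an unanchored and an anchored pattern can be illustrated with a short sketch. It uses base R's grepl rather than str_detect to avoid a package dependency; for these simple patterns the two should agree. The row name "F_data_yX" is invented for illustration and does not appear in b.

```r
# Hypothetical row names; "F_data_yX" is made up to show the difference
nm <- c("(Intercept)", "F_data_yX", "data_yFYFF", "data_yPUNEW")

anywhere <- grepl("data_y", nm)       # TRUE wherever "data_y" occurs in the name
at_start <- grepl("^data_y", nm)      # TRUE only when the name starts with "data_y"
prefix   <- startsWith(nm, "data_y")  # same as at_start, without a regular expression
```

With the row names of b both patterns select the same two rows, so the anchor only matters if other names can contain "data_y" in the middle.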

CodePudding user response:

A possible solution based on the tidyverse, which first converts the matrix to a data frame:

library(tidyverse)

b %>% as.data.frame() %>%
  rownames_to_column("coefficients") %>% 
  filter(str_detect(coefficients, "^data_y"))

#>   coefficients        PC1        PC2        PC3
#> 1   data_yFYFF -1.3781197 -1.6628885  1.1667077
#> 2  data_yPUNEW -0.2816256  0.1625917 -0.4881996

CodePudding user response:

Another option with data.table:

library(data.table)
bt <- as.data.table(b, keep.rownames = TRUE)

bt[like(rn,"data_y")]

#            rn        PC1        PC2        PC3
#1:  data_yFYFF -1.3781197 -1.6628885  1.1667077
#2: data_yPUNEW -0.2816256  0.1625917 -0.4881996

*Note: This could be done in one line, but I opted to create a separate data.table with as.data.table, which leaves the original matrix b unchanged. (setDT converts by reference and expects a list or data frame, so it is not an option for a matrix.)
