Extracting rows and columns of a matrix if row names and column names have a partial match

Time:05-26

I will give an example of my problem using a smaller matrix. Say I have a matrix with row names and column names such as this:

set.seed(10)

a <- matrix(rexp(27), ncol = 9, nrow = 3)
colnames(a) <- paste(rep(c("aaa" , "bbb" , "ccc") , each = 3) , rep(c(1:3) , times = 3) , sep = "")
rownames(a) <- c("aaa" , "bbb" , "ccc")

giving matrix a:

          aaa1      aaa2      aaa3      bbb1      bbb2       bbb3      ccc1      ccc2      ccc3
aaa 0.01495641 1.5750419 2.3276229 0.6722683 1.3165471 1.63298388 1.7447187 0.3469224 1.3981074
bbb 0.92022120 0.2316586 0.7291238 0.4265298 0.4132938 0.07119408 0.2929501 0.7950826 1.1104594
ccc 0.75215894 1.0866730 1.2883101 1.1154219 0.6765753 2.56885161 0.6453052 1.3962992 0.1704216

I would like efficient code that, for each column, picks the value from the row whose name matches the column name with its trailing digit removed, returning a vector. In this case:

      aaa1       aaa2       aaa3       bbb1       bbb2       bbb3       ccc1       ccc2       ccc3 
0.01495641 1.57504185 2.32762287 0.42652979 0.41329383 0.07119408 0.64530516 1.39629918 0.17042160 

I obtained the previous vector using this code:

b <- c(a[grepl("aaa" , rownames(a)) , grepl("aaa" , colnames(a))] ,
       a[grepl("bbb" , rownames(a)) , grepl("bbb" , colnames(a))] ,
       a[grepl("ccc" , rownames(a)) , grepl("ccc" , colnames(a))] )

Is there a way to do this efficiently, even if the matrix is much larger and possibly has a different name structure than this?

Answer:

An easier option is to reshape the matrix to 'long' format with as.data.frame.table, and then keep only the rows where the row name ('Var1') equals the column name ('Var2') with its trailing digits removed:

out <- subset(as.data.frame.table(a), Var1 == sub("\\d+$", "", Var2),
     select = c(Var2, Freq))
with(out, setNames(Freq, Var2))
      aaa1       aaa2       aaa3       bbb1       bbb2       bbb3       ccc1       ccc2       ccc3 
0.01495641 1.57504185 2.32762287 0.42652979 0.41329383 0.07119408 0.64530516 1.39629918 0.17042160 
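
To see why the subset picks the right cells, it may help to look at the long format on a tiny matrix. The sketch below uses a hypothetical 2 x 2 matrix `m` (not part of the original question) and assumes the column suffix is a run of trailing digits:

```r
# Hypothetical 2 x 2 matrix, for illustration only
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("aaa", "bbb"), c("aaa1", "bbb1")))

# One row per cell: Var1 = row name, Var2 = column name, Freq = value
long <- as.data.frame.table(m)
print(long)

# Keep the cells whose row name equals the column name minus trailing digits
keep <- as.character(long$Var1) == sub("\\d+$", "", as.character(long$Var2))
res <- setNames(long$Freq[keep], as.character(long$Var2)[keep])
print(res)
```

Each cell becomes one row of `long`, so the comparison between 'Var1' and the stripped 'Var2' is an ordinary vectorised test over all cells.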

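Or with row/column indexing: match each column name (digits removed) to a row name, then index the matrix with a two-column matrix of (row, column) pairs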

i1 <- match(sub("\\d+$", "", colnames(a)), rownames(a))
a[cbind(i1, seq_along(i1))]
[1] 0.01495641 1.57504185 2.32762287 0.42652979 0.41329383 0.07119408 0.64530516 1.39629918 0.17042160
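
For a larger matrix or a different naming scheme, the indexing approach could be wrapped in a small helper. This is only a sketch: the function name `extract_matching` and its `pattern` argument are hypothetical, and it assumes the row name can be recovered from each column name by removing a regular-expression match. Columns with no matching row are dropped here, since an NA in the index matrix would otherwise yield an NA in the result.

```r
# Hypothetical helper; the name and arguments are not from the original answer
extract_matching <- function(m, pattern = "\\d+$") {
  key <- sub(pattern, "", colnames(m))  # column names with the suffix removed
  i <- match(key, rownames(m))          # row index per column, NA if no match
  ok <- !is.na(i)                       # columns that have a matching row
  # Index with (row, column) pairs and keep the column names on the result
  setNames(m[cbind(i[ok], which(ok))], colnames(m)[ok])
}

# Illustrative data: "zzz1" has no matching row, "bbb12" has a two-digit suffix
m <- matrix(1:8, nrow = 2,
            dimnames = list(c("aaa", "bbb"),
                            c("aaa1", "bbb1", "zzz1", "bbb12")))
res <- extract_matching(m)
print(res)
```

With a different name structure, only `pattern` should need to change, for example `"_.*$"` for hypothetical column names of the form `aaa_x`.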