Populate Binary Matrix with Double For Loop in R

Time:09-02

I'm working on populating a binary matrix based on values from a different table. I can create the matrix but am struggling with the looping needed to populate it. I think this is a pretty simple issue so I hope I can get some easy help.

Here's an example of my data:

start <- c(291, 291, 291, 702, 630, 768)
sequence <- c("chr9:103869456:103870456", "chr5:30823103:30824103", "chr11:49801703:49802703", "chr4:133865601:133866601", "chr12:55738034:55739034", "chr8:96569493:96570493")
motif <- c("ARI5B", "ARI5B", "ARI5B", "ATOH1", "EGR1", "EGR1")

df <- data.frame(start, sequence, motif)

I have created a character vector of the unique motif/start combinations like so:

x <- sprintf("%s_%d", df$motif, df$start)
x <- unique(x)

Next I create a binary matrix with the sequences as rows and the values from x as columns:

binmat <- matrix(0, nrow = length(df$sequence), ncol = length(x))
rownames(binmat) <- df$sequence
colnames(binmat) <- x

And now I'm stuck. I want to iterate through columns and rows and put a 1 in each position that has a match. For example, the first sequence is "chr9:103869456:103870456" and it has motif "ARI5B" at starting position 291, so it should get a 1 while the rest of the values in that row remain at 0. The output of this example should look like this:


                         ARI5B_291 ATOH1_702 EGR1_630 EGR1_768
chr9:103869456:103870456         1         0        0        0
chr5:30823103:30824103           1         0        0        0
chr11:49801703:49802703          1         0        0        0
chr4:133865601:133866601         0         1        0        0
chr12:55738034:55739034          0         0        1        0
chr8:96569493:96570493           0         0        0        1

But so far I am unsuccessful. I think I need a double for loop somewhere along these lines:

for (row in binmat){
  for (col in binmat){
     if (row && col %in% x){
         1
     } else { 0
     }
   }
}

But all I get are 0s.

Thanks in advance!

User response:

Aren't you just looking for table() here? It cross-tabulates the sequences against the motif/start labels, so you get the result as a vectorized one-liner, without loops:

table(factor(df$sequence, df$sequence), sprintf("%s_%d", df$motif, df$start))
                          
                           ARI5B_291 ATOH1_702 EGR1_630 EGR1_768
  chr9:103869456:103870456         1         0        0        0
  chr5:30823103:30824103           1         0        0        0
  chr11:49801703:49802703          1         0        0        0
  chr4:133865601:133866601         0         1        0        0
  chr12:55738034:55739034          0         0        1        0
  chr8:96569493:96570493           0         0        0        1
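
If you do want the double for loop, here is a minimal sketch of why the attempt in the question leaves everything at 0, and one way to fix it. `for (row in binmat)` iterates over the cell values (all 0), not over row indices, and the bare `1` inside the `if` is never assigned back into the matrix. The names `key`, `cols`, `rows` and `binmat2` are mine, not from the question, and the loop assumes row i of `binmat` corresponds to row i of `df`, as in the example data.

```r
# Data from the question (stringsAsFactors = FALSE keeps the strings as character)
start <- c(291, 291, 291, 702, 630, 768)
sequence <- c("chr9:103869456:103870456", "chr5:30823103:30824103",
              "chr11:49801703:49802703", "chr4:133865601:133866601",
              "chr12:55738034:55739034", "chr8:96569493:96570493")
motif <- c("ARI5B", "ARI5B", "ARI5B", "ATOH1", "EGR1", "EGR1")
df <- data.frame(start, sequence, motif, stringsAsFactors = FALSE)

# One label per row of df, e.g. "ARI5B_291"; the unique labels become the columns
key  <- sprintf("%s_%d", df$motif, df$start)
cols <- unique(key)

binmat <- matrix(0L, nrow = nrow(df), ncol = length(cols),
                 dimnames = list(df$sequence, cols))

# Loop over row and column *indices*, and assign into the matrix
for (i in seq_len(nrow(binmat))) {
  for (j in seq_len(ncol(binmat))) {
    # Row i of binmat is row i of df, so compare its label with column j's name
    if (key[i] == colnames(binmat)[j]) {
      binmat[i, j] <- 1L
    }
  }
}

# Loop-free alternative: index with a two-column (row, column) matrix
rows <- unique(df$sequence)
binmat2 <- matrix(0L, nrow = length(rows), ncol = length(cols),
                  dimnames = list(rows, cols))
binmat2[cbind(match(df$sequence, rows), match(key, cols))] <- 1L

print(binmat)
```

Both `binmat` and `binmat2` should match the table above. Keep in mind that table() tallies occurrences, so its cells are counts; they are all 0 or 1 here because each sequence appears only once.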