Splitting a matrix by both rows and columns using asplit() in R

Time:02-01

Consider the following toy example, where I extract all comparisons involving the name A from a matrix called matr.

### Set up example matrix ###
matr <- matrix(c(2,0,3,0,5,0.7,1,0,0.9,6,11,9,0,1,0.5,2,0,1,0.3,3,6,1,0.31,0,0),     nrow = 5, ncol = 5)
dimnames(matr) = list(c("A", "B", "A", "C", "A"),  c("A", "B", "A", "C", "A"))
matr

# Pretend the matrix is symmetric - for my real matrix, it is
matr[upper.tri(matr, diag = TRUE)] <- NA # keep only the lower triangle
matr

for (rowLoopCounter in seq_len(nrow(matr))) {
  for (colLoopCounter in seq_len(ncol(matr))) {
    # Drop comparisons between a row and a column that share the same name
    if (row.names(matr)[rowLoopCounter] == colnames(matr)[colLoopCounter]) {
      matr[rowLoopCounter, colLoopCounter] <- NA
    }
  }
}

A_row <- c(matr[grepl("A", row.names(matr)), ]) # get comparisons in rows
A_col <- c(matr[, grepl("A", colnames(matr))]) # get comparisons in columns
total <- as.numeric(na.omit(unlist(c(A_row, A_col)))) # combine results

total
#[1] 0 6 3 0 0 1

The above implementation is quite verbose, and it only gets the job done for A. I also need to do this for B and C.

This can be done using a for loop (or apply()).
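As a rough sketch of the lapply() route, the per-name extraction could be wrapped in a helper and applied to every unique name. The helper name get_comparisons is made up for illustration, and the outer() call is just a vectorised stand-in for the double loop above; this is one possible arrangement, not the only one.

```r
# Rebuild the masked matrix from the question (same values and names).
matr <- matrix(c(2,0,3,0,5,0.7,1,0,0.9,6,11,9,0,1,0.5,2,0,1,0.3,3,6,1,0.31,0,0),
               nrow = 5, ncol = 5,
               dimnames = list(c("A", "B", "A", "C", "A"),
                               c("A", "B", "A", "C", "A")))
matr[upper.tri(matr, diag = TRUE)] <- NA                  # keep the lower triangle
matr[outer(rownames(matr), colnames(matr), "==")] <- NA   # drop same-name pairs

# Hypothetical helper: all non-NA values in rows or columns named `nm`.
get_comparisons <- function(m, nm) {
  in_rows <- m[rownames(m) == nm, , drop = FALSE]
  in_cols <- m[, colnames(m) == nm, drop = FALSE]
  as.numeric(na.omit(c(in_rows, in_cols)))
}

# One vector per unique name, returned as a named list.
nms <- unique(rownames(matr))
totals <- lapply(setNames(nms, nms), function(nm) get_comparisons(matr, nm))
totals
```

Here totals$A should match the total vector computed by hand above, and B and C come along for free.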

I naively tried using split(), which only works on vectors and gives strange results (it leaves the value 1 out of A and puts it in C for some reason):

splt <- split(matr, colnames(matr)) # using rownames(matr) is equivalent

#$A
#[1] NA NA NA NA  0  6 NA NA NA NA NA  3 NA NA NA

#$B
#[1]  0 NA NA NA NA

#$C
#[1] 0.0 0.9 1.0  NA  NA

$A should contain the same elements as total.
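A likely explanation for the odd grouping: split() treats the matrix as a plain vector in column-major order and recycles the five-element grouping vector over all 25 cells. Because the matrix has five rows, the recycled labels line up with the row label of each cell, so the column names never enter into it. The sketch below checks that reading; the names by_colnames, row_label and by_rowlabel are introduced here for illustration only.

```r
# Rebuild the masked matrix from the question.
matr <- matrix(c(2,0,3,0,5,0.7,1,0,0.9,6,11,9,0,1,0.5,2,0,1,0.3,3,6,1,0.31,0,0),
               nrow = 5, ncol = 5,
               dimnames = list(c("A", "B", "A", "C", "A"),
                               c("A", "B", "A", "C", "A")))
matr[upper.tri(matr, diag = TRUE)] <- NA
matr[outer(rownames(matr), colnames(matr), "==")] <- NA

# What the question tried: a length-5 grouping vector, recycled over 25 cells.
by_colnames <- split(matr, colnames(matr))

# Row label of every cell, in the same column-major order split() walks through.
row_label <- rownames(matr)[as.vector(row(matr))]
by_rowlabel <- split(matr, row_label)

# If the reading above is right, both groupings are the same.
identical(by_colnames, by_rowlabel)
```

On this reading, the value 1 sits in a row named C and a column named A, so it ends up under $C only.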

I recently discovered the new asplit() function, but I get an error:

asplit(matr, c(1, 2))
#Error in array(newx[, i], d.call, dn.call) : 'dims' cannot be of length 0

What I would like from asplit() is output similar to that returned by split(), where values are stored in a named list. However, judging from the examples in the documentation for asplit(), there seems to be no way to do this.
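For context, a small sketch of what asplit() does with a single margin: it cuts the matrix into one piece per row (MARGIN = 1) or per column (MARGIN = 2) and does not merge pieces that share a name. Grouping by name therefore still needs an extra step; the subsetting shown for A is just one assumed way to do it, and the names rows, cols and a_vals are illustrative.

```r
# Rebuild the masked matrix from the question.
matr <- matrix(c(2,0,3,0,5,0.7,1,0,0.9,6,11,9,0,1,0.5,2,0,1,0.3,3,6,1,0.31,0,0),
               nrow = 5, ncol = 5,
               dimnames = list(c("A", "B", "A", "C", "A"),
                               c("A", "B", "A", "C", "A")))
matr[upper.tri(matr, diag = TRUE)] <- NA
matr[outer(rownames(matr), colnames(matr), "==")] <- NA

rows <- asplit(matr, 1)   # one list element per row
cols <- asplit(matr, 2)   # one list element per column

length(rows)              # five pieces, not three groups
names(rows)               # duplicated names are kept as they are

# Pool the pieces named "A" from both margins and drop the NAs.
a_vals <- as.numeric(na.omit(c(
  unlist(rows[names(rows) == "A"], use.names = FALSE),
  unlist(cols[names(cols) == "A"], use.names = FALSE)
)))
a_vals
```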

CodePudding user response:

You can use split() on both the rows and the columns by swapping the row names of which(!is.na(matr), arr.ind = TRUE) for the matching column names. Then use mapply() to combine the two lists.

# Get the array indices (row, col) of the non-NA values in matr
ind <- which(!is.na(matr), arr.ind = TRUE)

# Create a list by splitting the values on their row names
list_1 <- split(x = matr[ind], f = row.names(ind))

# Then substitute the column name of each value for its row name
row.names(ind) <- colnames(matr)[unname(ind[, 2])]

# Create a second list by splitting the same values on their column names
list_2 <- split(x = matr[ind], f = row.names(ind))

# Combine the lists; SIMPLIFY = FALSE keeps the result a list
# even when all groups happen to have the same length
mapply(c, list_1, list_2, SIMPLIFY = FALSE)

Output of the mapply():

$A
[1] 0 6 3 0 0 1

$B
[1] 0.0 0.0 0.9 6.0

$C
[1] 0.0 0.9 1.0 3.0
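One caveat with combining two split() results element by element: it relies on both lists having the same names in the same order, which need not hold if some name only ever shows up on the row side or the column side. A hedged variation fixes the factor levels up front and uses Map(), which always returns a list. The names lev, by_row and by_col are introduced here for illustration.

```r
# Rebuild the masked matrix from the question.
matr <- matrix(c(2,0,3,0,5,0.7,1,0,0.9,6,11,9,0,1,0.5,2,0,1,0.3,3,6,1,0.31,0,0),
               nrow = 5, ncol = 5,
               dimnames = list(c("A", "B", "A", "C", "A"),
                               c("A", "B", "A", "C", "A")))
matr[upper.tri(matr, diag = TRUE)] <- NA
matr[outer(rownames(matr), colnames(matr), "==")] <- NA

ind <- which(!is.na(matr), arr.ind = TRUE)
lev <- unique(c(rownames(matr), colnames(matr)))   # shared set of group labels

# Split the same values once by row label and once by column label;
# fixed levels mean every label gets an entry, possibly an empty one.
by_row <- split(matr[ind], factor(rownames(matr)[ind[, "row"]], levels = lev))
by_col <- split(matr[ind], factor(colnames(matr)[ind[, "col"]], levels = lev))

res <- Map(c, by_row, by_col)
res
```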

CodePudding user response:

Not entirely sure what you want to achieve, but if it is a list of vectors, one per letter, containing all matrix values where the row and column letters coincide, you can do this:

library(dplyr) ## for convenient dataframe manipulation
df <- 
  cbind(
  expand.grid(row = dimnames(matr)[[1]],
              col = dimnames(matr)[[2]]),
  value = as.vector(matr)
)
#  > head(df)
#    row col value
# 1   A   A   2.0
# 2   B   A   0.0
# 3   A   A   3.0
# 4   C   A   0.0
# 5   A   A   5.0
# 6   A   B   0.7

Filter the above df for coinciding row and column letters, and summarise per letter:

df <- df |>
  filter(row == col) |>
  group_by(row) |>
  summarise(total = list(value))

Convert to a named list:

totals = setNames(df$total, df$row)

Output:

## > totals
## $A
## [1]  2.00  3.00  5.00 11.00  0.00  0.50  6.00  0.31  0.00
## 
## $B
## [1] 1
## 
## $C
## [1] 0.3

CodePudding user response:

A one-liner would be:

with(na.omit(as.data.frame.table(matr)), split(c(Freq, Freq), c(Var1, Var2)))

$A
[1] 0 6 3 0 0 1

$B
[1] 0.0 0.0 0.9 6.0

$C
[1] 0.0 0.9 1.0 3.0
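Note that the one-liner leans on c() combining two factors into a factor, which is the behaviour from R 4.1.0 onwards; in older versions c(Var1, Var2) yields integer codes, so the list would be named 1, 2, 3 instead. A sketch that sidesteps this by asking as.data.frame.table() for character labels (the variable names long and res are illustrative):

```r
# Rebuild the masked matrix from the question.
matr <- matrix(c(2,0,3,0,5,0.7,1,0,0.9,6,11,9,0,1,0.5,2,0,1,0.3,3,6,1,0.31,0,0),
               nrow = 5, ncol = 5,
               dimnames = list(c("A", "B", "A", "C", "A"),
                               c("A", "B", "A", "C", "A")))
matr[upper.tri(matr, diag = TRUE)] <- NA
matr[outer(rownames(matr), colnames(matr), "==")] <- NA

# Long format: one row per cell, with character row/column labels.
long <- na.omit(as.data.frame.table(matr, stringsAsFactors = FALSE))

# Each value is listed twice: once under its row label, once under its column label.
res <- split(c(long$Freq, long$Freq), c(long$Var1, long$Var2))
res
```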