Matrix: Summing columns and rows conditional on variable name-CodePudding

Let's say i have a simple 6x6 matrix like this one:

x <- matrix(1:36, nrow = 6, dimnames = list(c("AUS1","AUS2","AUS3", "AUT1", "AUT2", "AUT3"), c("AUS1","AUS2","AUS3", "AUT1", "AUT2", "AUT3")))

     AUS1 AUS2 AUS3 AUT1 AUT2 AUT3
AUS1    1    7   13   19   25   31
AUS2    2    8   14   20   26   32
AUS3    3    9   15   21   27   33
AUT1    4   10   16   22   28   34
AUT2    5   11   17   23   29   35
AUT3    6   12   18   24   30   36

The letters stand for a country (AUS as Australia) and the number behind for a sector. Now, i want to sum every column but on the condition that only values are taken from rows that do not come from the same country. For example, the sum of the first column (AUS1) should only contain values of the rows AUT1, AUT2 and AUT3. Same goes for the columns of AUS2 and AUS3. The sum of the column AUT1 should then only include values from the rows AUS1, AUS2, and AUS3.

Since my tables are much larger than this, i cannot simply select the individual rows.

I was thinking of a function that matches part of the column names with parts of the row names. If they contain the same three letters, the value is not included in the sum.

CodePudding user response：

Here is a base R way. It runs three loops but since the first two (lapply loops) are on row and column names, those two shouldn't take much processing time.
Then the real work is done in a Map loop, calling rowSums on the subsets of row and column names determined before.

x <- matrix(1:36, nrow = 6, dimnames = list(c("AUS1","AUS2","AUS3", "AUT1", "AUT2", "AUT3"), c("AUS1","AUS2","AUS3", "AUT1", "AUT2", "AUT3")))

rn <- unique(gsub("\\d", "", rownames(x)))
rows <- lapply(rn, grep, rownames(x))
cols <- lapply(rn, grep, colnames(x), invert = TRUE)

Map(\(r, c) rowSums(x[r, c]), rows, cols)
#> [[1]]
#> AUS1 AUS2 AUS3 
#>   75   78   81 
#> 
#> [[2]]
#> AUT1 AUT2 AUT3 
#>   30   33   36

^{Created on 2022-05-18 by the reprex package (v2.0.1)}

CodePudding user response：

using data.table

library(data.table)

dt <- data.table(as.data.frame.table(x))

dt[which(substr(Var1, 1, 3) != substr(Var2, 1, 3)), .(sum = sum(Freq)), by = Var2]

*use by = Var2 for "rowSums", but you can use by = Var1 if you need the "colSums"

output

#    Var2 sum
# 1: AUS1  15
# 2: AUS2  33
# 3: AUS3  51
# 4: AUT1  60
# 5: AUT2  78
# 6: AUT3  96

data as provided by OP

x <- matrix(1:36, nrow = 6, dimnames = list(c("AUS1","AUS2","AUS3", "AUT1", "AUT2", "AUT3"), c("AUS1","AUS2","AUS3", "AUT1", "AUT2", "AUT3")))

CodePudding user response：

Here is another base R way :

matched_sum <- function(dfr){
    matched_col <- function(col_id) {
        col_pattern <- gsub("[0-9]", "", colnames(dfr[col_id]))
        dfr[grepl(col_pattern, rownames(x)),col_id] <- NA
        return(dfr[col_id])
    }
    new_col <- lapply(1:ncol(dfr), matched_col)
    new_dfr <- do.call(cbind.data.frame, new_col)
    colSums(new_dfr, na.rm = TRUE)
}

# Your data frame. You can use as.data.frame(x) in case x is not a data frame 
x
     AUS1 AUS2 AUS3 AUT1 AUT2 AUT3
AUS1    1    7   13   19   25   31
AUS2    2    8   14   20   26   32
AUS3    3    9   15   21   27   33
AUT1    4   10   16   22   28   34
AUT2    5   11   17   23   29   35
AUT3    6   12   18   24   30   36

# Apply the function to x
matched_sum(x)

AUS1 AUS2 AUS3 AUT1 AUT2 AUT3 
  15   33   51   60   78   96

What the function does

col_pattern <- gsub("[0-9]", "", colnames(dfr[col_id])) finds a pattern in each column name. The pattern is any string other than numbers. For example : the pattern in "AUS1" is "AUS".
dfr[grepl(col_pattern, rownames(x)),col_id] <- NA assigns NA to any row in the column that has pattern found in the 1st step. For example, the first column after this step will become:

    AUS1
AUS1   NA
AUS2   NA
AUS3   NA
AUT1    4
AUT2    5
AUT3    6

lapply(1:ncol(dfr), matched_col) apply the 1st step and the 2nd step to each column in the data frame.
do.call(cbind.data.frame, new_col) binds all columns (that already has NA in the selected rows) to a data frame. For example, if the input is x that you provides, after this step it will become:

     AUS1 AUS2 AUS3 AUT1 AUT2 AUT3
AUS1   NA   NA   NA   19   25   31
AUS2   NA   NA   NA   20   26   32
AUS3   NA   NA   NA   21   27   33
AUT1    4   10   16   NA   NA   NA
AUT2    5   11   17   NA   NA   NA
AUT3    6   12   18   NA   NA   NA

colSums(new_dfr, na.rm = TRUE) sums all non-NA values in each column in the data frame created in the 4th step.

In case you want to keep the matrix structure for you data, you can use this:

matched_sum_mat <- function(mat){
    matched_col <- function(col_id) {
        col_pattern <- gsub("[0-9]", "", dimnames(mat)[[2]][col_id])
        mat[grepl(col_pattern, dimnames(mat)[[1]]),col_id] <- NA
        return(mat[,col_id])
    }
    new_col <- lapply(1:ncol(mat), matched_col)
    new_mat <- do.call(cbind, new_col)
    colnames(new_mat) <- colnames(mat)
    colSums(new_mat, na.rm = TRUE)
}

# Apply to x as a matrix

matched_sum_mat(x)

AUS1 AUS2 AUS3 AUT1 AUT2 AUT3 
  15   33   51   60   78   96