Let's say i have a simple 6x6 matrix like this one:
x <- matrix(1:36, nrow = 6, dimnames = list(c("AUS1","AUS2","AUS3", "AUT1", "AUT2", "AUT3"), c("AUS1","AUS2","AUS3", "AUT1", "AUT2", "AUT3")))
AUS1 AUS2 AUS3 AUT1 AUT2 AUT3
AUS1 1 7 13 19 25 31
AUS2 2 8 14 20 26 32
AUS3 3 9 15 21 27 33
AUT1 4 10 16 22 28 34
AUT2 5 11 17 23 29 35
AUT3 6 12 18 24 30 36
The letters stand for a country (AUS as Australia) and the number behind for a sector. Now, i want to sum every column but on the condition that only values are taken from rows that do not come from the same country. For example, the sum of the first column (AUS1) should only contain values of the rows AUT1, AUT2 and AUT3. Same goes for the columns of AUS2 and AUS3. The sum of the column AUT1 should then only include values from the rows AUS1, AUS2, and AUS3.
Since my tables are much larger than this, i cannot simply select the individual rows.
I was thinking of a function that matches part of the column names with parts of the row names. If they contain the same three letters, the value is not included in the sum.
CodePudding user response:
Here is a base R way. It runs three loops but since the first two (lapply
loops) are on row and column names, those two shouldn't take much processing time.
Then the real work is done in a Map
loop, calling rowSums
on the subsets of row and column names determined before.
x <- matrix(1:36, nrow = 6, dimnames = list(c("AUS1","AUS2","AUS3", "AUT1", "AUT2", "AUT3"), c("AUS1","AUS2","AUS3", "AUT1", "AUT2", "AUT3")))
rn <- unique(gsub("\\d", "", rownames(x)))
rows <- lapply(rn, grep, rownames(x))
cols <- lapply(rn, grep, colnames(x), invert = TRUE)
Map(\(r, c) rowSums(x[r, c]), rows, cols)
#> [[1]]
#> AUS1 AUS2 AUS3
#> 75 78 81
#>
#> [[2]]
#> AUT1 AUT2 AUT3
#> 30 33 36
Created on 2022-05-18 by the reprex package (v2.0.1)
CodePudding user response:
using data.table
library(data.table)
dt <- data.table(as.data.frame.table(x))
dt[which(substr(Var1, 1, 3) != substr(Var2, 1, 3)), .(sum = sum(Freq)), by = Var2]
*use by = Var2 for "rowSums", but you can use by = Var1 if you need the "colSums"
output
# Var2 sum
# 1: AUS1 15
# 2: AUS2 33
# 3: AUS3 51
# 4: AUT1 60
# 5: AUT2 78
# 6: AUT3 96
data as provided by OP
x <- matrix(1:36, nrow = 6, dimnames = list(c("AUS1","AUS2","AUS3", "AUT1", "AUT2", "AUT3"), c("AUS1","AUS2","AUS3", "AUT1", "AUT2", "AUT3")))
CodePudding user response:
Here is another base R way :
matched_sum <- function(dfr){
matched_col <- function(col_id) {
col_pattern <- gsub("[0-9]", "", colnames(dfr[col_id]))
dfr[grepl(col_pattern, rownames(x)),col_id] <- NA
return(dfr[col_id])
}
new_col <- lapply(1:ncol(dfr), matched_col)
new_dfr <- do.call(cbind.data.frame, new_col)
colSums(new_dfr, na.rm = TRUE)
}
# Your data frame. You can use as.data.frame(x) in case x is not a data frame
x
AUS1 AUS2 AUS3 AUT1 AUT2 AUT3
AUS1 1 7 13 19 25 31
AUS2 2 8 14 20 26 32
AUS3 3 9 15 21 27 33
AUT1 4 10 16 22 28 34
AUT2 5 11 17 23 29 35
AUT3 6 12 18 24 30 36
# Apply the function to x
matched_sum(x)
AUS1 AUS2 AUS3 AUT1 AUT2 AUT3
15 33 51 60 78 96
What the function does
col_pattern <- gsub("[0-9]", "", colnames(dfr[col_id]))
finds a pattern in each column name. The pattern is any string other than numbers. For example : the pattern in "AUS1" is "AUS".dfr[grepl(col_pattern, rownames(x)),col_id] <- NA
assigns NA to any row in the column that has pattern found in the 1st step. For example, the first column after this step will become:
AUS1
AUS1 NA
AUS2 NA
AUS3 NA
AUT1 4
AUT2 5
AUT3 6
lapply(1:ncol(dfr), matched_col)
apply the 1st step and the 2nd step to each column in the data frame.do.call(cbind.data.frame, new_col)
binds all columns (that already has NA in the selected rows) to a data frame. For example, if the input isx
that you provides, after this step it will become:
AUS1 AUS2 AUS3 AUT1 AUT2 AUT3
AUS1 NA NA NA 19 25 31
AUS2 NA NA NA 20 26 32
AUS3 NA NA NA 21 27 33
AUT1 4 10 16 NA NA NA
AUT2 5 11 17 NA NA NA
AUT3 6 12 18 NA NA NA
colSums(new_dfr, na.rm = TRUE)
sums all non-NA values in each column in the data frame created in the 4th step.
In case you want to keep the matrix structure for you data, you can use this:
matched_sum_mat <- function(mat){
matched_col <- function(col_id) {
col_pattern <- gsub("[0-9]", "", dimnames(mat)[[2]][col_id])
mat[grepl(col_pattern, dimnames(mat)[[1]]),col_id] <- NA
return(mat[,col_id])
}
new_col <- lapply(1:ncol(mat), matched_col)
new_mat <- do.call(cbind, new_col)
colnames(new_mat) <- colnames(mat)
colSums(new_mat, na.rm = TRUE)
}
# Apply to x as a matrix
matched_sum_mat(x)
AUS1 AUS2 AUS3 AUT1 AUT2 AUT3
15 33 51 60 78 96