Home > other >  How to sum values of cells by their column and row names to create a matrix in R?
How to sum values of cells by their column and row names to create a matrix in R?

Time:12-25

I have a data matrix where row and column names are similar. However, row names are repetitive. I want to sum cell values by a unique combination of row names and column names (for example, sum of all cell values having dimension of Row1 * Col1) and create a matrix in R. The new matrix will be sum of all cells by unique combination or Row and Colum names. Thank you

Example Dataset: enter image description here

CodePudding user response:

You can do as such. See the comments trough the code as well. Also you need the pckgs that G. Grothendieck gave in his/her answer.

    df <- data.frame(Row = c("r_1", "r_2",etc.),
                     Col = c("Col1", "Col2",etc.),
                     Value = c(1, 2, etc.))
    
    row_names <- unique(df$Row)
    col_names <- unique(df$Col)
#this will give you all possible combinations
    combinations <- expand.grid(Row = row_names, Col = col_names)
    result <- df %>% group_by(Row, Col) %>% summarize(Value = sum(Value))
    result <- left_join(combinations, result, by = c("Row" = "Row", "Col" = "Col"))
    result[is.na(result)] <- 0
    names(result)[1] <- "R"# choose the value that suits you best
    names(result)[2] <- "C"# choose the value that suits you best
   

CodePudding user response:

Here is a full tidyverse solution in the single pipe:

library(dplyr)
library(tidyr)
rn <- c("name1", "name2", "name3", 
"name1", "name2", "name3") # necessary as duplicate names are not allowed in data.frame 
# and will be dropped at type cast
matrix(1:18, ncol=3) %>%
`colnames<-`(c("name1", "name2", "name3")) %>% 
`rownames<-`(rn) %>% # here is the sample matrix 
# alike one from your question ready for analysis 
data.frame() %>%
mutate(rn = rn, .before=1) %>%
# restore the row names
# you can store them from your matrix in a variable before type cast 
pivot_longer(-rn) %>%
group_by(rn, name) %>%
summarise(value=sum(value)) %>%
pivot_wider()

It will result in:

# A tibble: 3 x 4
# Groups:   rn [3]
  rn    name1 name2 name3
  <chr> <int> <int> <int>
1 name1     5    17    29
2 name2     7    19    31
3 name3     9    21    33
  • Related