R - How to group and sum rows with multiple columns?

Time:01-27

This seems like something that should be really easy to do, but for some reason no method seems to work for me. I have a data frame with sample IDs on the rows and a long list of fungal species on the columns. One column, sampEcoReg, gives the region each sample comes from. I would like to group the rows by region and then sum each column within each region. Here is the code I have tried (and the errors it produces):

heatMapTable2 <- aggregate(x = heatMapTable[ , 2:ncol(heatMapTable)], by = heatMapTable[,1], FUN = sum)

Error in aggregate.data.frame(as.data.frame(x), ...) : 
  arguments must have same length

heatMapTable2 <- aggregate(x = heatMapTable[ , 2:ncol(heatMapTable)], by = heatMapTable$sampEcoReg, FUN = sum)

Error in aggregate.data.frame(as.data.frame(x), ...) : 
  'by' must be a list
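Both aggregate() calls trip over the `by` argument: it must be a list of grouping vectors, each with one entry per row of `x`, so a bare vector is rejected. A minimal sketch, assuming a genuine data frame with numeric species columns; the data frame `heat` and the columns `sp1`, `sp2` are hypothetical toy data, not the asker's table:

```r
# Hypothetical toy data: a region column plus one numeric column per species
heat <- data.frame(
  sampEcoReg = c("Lakes", "Lakes", "Fiord"),
  sp1 = c(1, 2, 3),
  sp2 = c(0, 5, 1)
)

# `by` must be a list of grouping vectors, each with one entry per row of `x`
byList <- aggregate(x = heat[, -1], by = list(sampEcoReg = heat$sampEcoReg),
                    FUN = sum)

# Formula interface: "." stands for every column except sampEcoReg
byFormula <- aggregate(. ~ sampEcoReg, data = heat, FUN = sum)
```

Both forms return one row per region, sorted by region. Note that the formula interface uses `na.action = na.omit` by default, so any row containing an NA is dropped before summing.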

library(dplyr)
heatMapTable[,2:ncol(heatMapTable)] %>%
group_by(heatMapTable$sampEcoReg) %>%
summarise_each(funs(sum))

Error in UseMethod("group_by") : 
  no applicable method for 'group_by' applied to an object of class "c('matrix', 'array', 'list')"
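The class in this error, c('matrix', 'array', 'list'), indicates that heatMapTable is a matrix whose cells are list elements rather than a data frame, and group_by() has no method for matrices. A quick way to confirm this, sketched on a hypothetical toy object `m` built to mimic that structure:

```r
# Hypothetical toy object mimicking the structure in the error: a list-matrix
m <- matrix(list("Lakes", "Lakes", "Fiord", 1, 2, 3, 0, 5, 1),
            nrow = 3, dimnames = list(NULL, c("sampEcoReg", "sp1", "sp2")))

is.data.frame(m)  # FALSE: dplyr verbs such as group_by() will not accept it
is.matrix(m)      # TRUE
typeof(m)         # "list": every cell is its own list element
```

Running the same three checks on heatMapTable shows whether it has the same shape.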

A CSV of the data frame I am trying to summarise can be found here.

Any help would be greatly appreciated. I've been struggling to figure this out for hours! Thank you!

For some reason R keeps treating heatMapTable as something other than a data frame, so I have to coerce it back. The following gives a different error that might shed more light:

heatMapTable <- as.data.frame(heatMapTable)
library(dplyr)
heatMapTable %>% 
    group_by(sampEcoReg) %>% 
    summarize_all(sum) %>%
    as.data.frame()

Error: Problem with `summarise()` column `NR_157889_Cortinarius_cremeolina`.
ℹ `NR_157889_Cortinarius_cremeolina = .Primitive("sum")(NR_157889_Cortinarius_cremeolina)`.
x invalid 'type' (list) of argument
ℹ The error occurred in group 1: sampEcoReg = "Lakes".
Run `rlang::last_error()` to see where the error occurred.
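The "invalid 'type' (list) of argument" error means the column reached sum() as a list: as.data.frame() turned the list-matrix into a data frame of list columns. One way around it, sketched on a hypothetical toy list-matrix `m` and assuming every cell holds exactly one value, is to unlist each column before grouping:

```r
library(dplyr)

# Hypothetical toy stand-in for heatMapTable: a 3 x 3 list-matrix
m <- matrix(list("Lakes", "Lakes", "Fiord", 1, 2, 3, 0, 5, 1),
            nrow = 3, dimnames = list(NULL, c("sampEcoReg", "sp1", "sp2")))

df <- as.data.frame(m)      # columns are still lists here, so sum() would fail
df[] <- lapply(df, unlist)  # flatten each list column to an atomic vector
                            # (assumes every cell holds exactly one value)

totals <- df %>%
  group_by(sampEcoReg) %>%
  summarise(across(everything(), sum)) %>%
  as.data.frame()
```

If some cells are empty (NULL), unlist() returns a shorter vector and the assignment fails, so those cells would need to be replaced with NA or 0 first.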

Answer:

Using group_by() %>% summarize_all() from dplyr:

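library(dplyr)
heatMapTable %>% 
    as.data.frame() %>%
    # flatten list columns first, otherwise sum() fails with "invalid 'type' (list)"
    mutate(across(everything(), unlist)) %>%
    group_by(sampEcoReg) %>% 
    summarize_all(sum) %>%   # superseded in dplyr 1.0+ by summarise(across(everything(), sum))
    as.data.frame()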

Answer:

We can use across() to sum every numeric column within each group.

library(tidyverse)
library(readxl)

#data obtained from https://otagouni-my.sharepoint.com/:x:/g/personal/lassa109_student_otago_ac_nz/EeGurryaRklEntap70ww8ggBe3yS07wZBqUCYtCCqml9XA?rtime=yqv_AETh2Ug
heatMapTable <- read_xlsx('heatMapTableEcoReg.xlsx')

heatMapTable %>% 
    group_by(sampEcoReg) %>% 
    summarise(across(where(is.numeric), sum))
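`where(is.numeric)` restricts the summing to numeric columns, so a character column other than the grouping column is dropped from the result instead of breaking sum(). A small sketch with hypothetical toy data, including a made-up `sampleID` column:

```r
library(dplyr)

# Hypothetical toy data: a character sample-ID column, the region column,
# and numeric species counts
heat <- data.frame(
  sampleID = c("S1", "S2", "S3"),
  sampEcoReg = c("Lakes", "Lakes", "Fiord"),
  sp1 = c(1, 2, 3),
  sp2 = c(0L, 5L, 1L)
)

totals <- heat %>%
  group_by(sampEcoReg) %>%
  # sampleID is character, so where(is.numeric) leaves it out of the result
  summarise(across(where(is.numeric), sum))
```

By contrast, summarize_all(sum) on the same grouped data would also try to sum sampleID and stop with "invalid 'type' (character) of argument". across() and where() require dplyr 1.0.0 or later.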



# A tibble: 21 × 2,778
   sampEcoReg     NR_157889_Corti… NR_172327_Corti… ASV2557_Cortina…
   <chr>                     <dbl>            <dbl>            <dbl>
 1 Aspiring                      0                0               15
 2 Canterbury Fo…                0                0                0
 3 Catlins                       0                0                0
 4 Fiord                         0                0                0
 5 Hawdon                        0                0                0
 6 Heron                         0                0                0
 7 Lakes                         0                0                0
 8 Lammerlaw                     0                0                0
 9 MacKenzie                     0                0                0
10 Mavora                        0                0                0
# … with 11 more rows, and 2,774 more variables:
#   ASV40605_Cortinarius_sp. <dbl>, MK838250 <dbl>,
#   ASV44745_Cortinarius_comptulus <dbl>,
#   ASV14648_Cortinarius_comptulus <dbl>,
#   ASV15995_Cortinarius_sp. <dbl>, ASV26274_Cortinarius_sp. <dbl>,
#   MW341317_Cortinarius_vernus <dbl>,
#   KX355517_Cortinarius_vernus <dbl>, …

Is that the desired output?
