How to group and sum using a loop in R? [duplicate]

I have a df in R similar to this one:

taxa <- c("bac", "bac", "bac", "bac", "bac", "bac", "arch", "arch", "arch")
ON1 <- c(2, 45, 34, 90, 0, 39, 12, 11, 5)
ON2 <- c(22, 67, 87, 90, 0, 0, 77, 21, 20)
ON3 <- c(46, 55, 1, 3, 0, 100, 88, 66, 9)
df <- data.frame(taxa, ON1, ON2, ON3)

I would like to group by "taxa" and then sum the numbers.

Option 1:

    s <- split(df, df$taxa)
    ON1 <- as.data.frame(lapply(s, function(x) {
      sum(x[, "ON1"])
    }))

Option 2:

    ON1 <- tapply(df$ON1, df$taxa, FUN=sum)
    ON1 <- as.data.frame(ON1)

Result for ON1: bac (210) and arch (28)

Both options do what I want, but I would like a loop so that I can do the same for ON2, ON3, etc. at once (I have many more columns).

Thanks!

CodePudding user response：

We can use aggregate

> aggregate(. ~ taxa, df, sum)
  taxa ON1 ON2 ON3
1 arch  28 118 163
2  bac 210 266 205
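
If an explicit loop is still wanted, Option 2 from the question can be repeated for each column. This is only a sketch: the names value_cols and result are made up for illustration, and it assumes every column other than taxa is numeric.

```r
# Sample data with the three ON columns shown in the question
taxa <- c("bac", "bac", "bac", "bac", "bac", "bac", "arch", "arch", "arch")
ON1 <- c(2, 45, 34, 90, 0, 39, 12, 11, 5)
ON2 <- c(22, 67, 87, 90, 0, 0, 77, 21, 20)
ON3 <- c(46, 55, 1, 3, 0, 100, 88, 66, 9)
df <- data.frame(taxa, ON1, ON2, ON3)

# Every column except the grouping column (hypothetical helper name)
value_cols <- setdiff(names(df), "taxa")

# tapply() returns its groups in sorted order, so start the result
# with the sorted unique taxa and add one summed column per iteration
result <- data.frame(taxa = sort(unique(df$taxa)))
for (col in value_cols) {
  result[[col]] <- as.vector(tapply(df[[col]], df$taxa, sum))
}
result
```

The aggregate call above gives the same sums in one line, so the loop is mainly useful when each column needs extra handling.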

CodePudding user response：

Instead of a loop, it's easier to use tidyverse functions. To do this, you "group" by your variable and summarize with the summary function being sum.

library(tidyverse)
df %>%
    group_by(taxa) %>%
    summarize(across(ON1:ON3, sum))
#> # A tibble: 2 × 4
#>   taxa    ON1   ON2   ON3
#>   <chr> <dbl> <dbl> <dbl>
#> 1 arch     28   118   163
#> 2 bac     210   266   205
Created on 2021-09-29 by the reprex package (v2.0.1)
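
With many more columns, spelling out the range ON1:ON3 may get awkward. One possible variant, assuming dplyr 1.0 or later and that every non-grouping column is numeric, is to select everything(), since across() leaves the grouping column out on its own. The name out is just for illustration.

```r
library(dplyr)

# Sample data with the three ON columns shown in the question
df <- data.frame(
  taxa = c("bac", "bac", "bac", "bac", "bac", "bac", "arch", "arch", "arch"),
  ON1 = c(2, 45, 34, 90, 0, 39, 12, 11, 5),
  ON2 = c(22, 67, 87, 90, 0, 0, 77, 21, 20),
  ON3 = c(46, 55, 1, 3, 0, 100, 88, 66, 9)
)

# everything() inside across() covers all non-grouping columns,
# so new ON* columns are summed without editing the call
out <- df %>%
  group_by(taxa) %>%
  summarize(across(everything(), sum))
out
```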

CodePudding user response：

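Use group_by and summarize_each from dplyr (summarize_each is deprecated in current dplyr releases in favor of across()):

library(dplyr)
df %>% group_by(taxa) %>% summarize_each(sum)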

Output:

taxa    ON1     ON2     ON3
<fct>   <dbl>   <dbl>   <dbl>
arch    28      118     163
bac     210     266     205

CodePudding user response：

Using data.table:

library(data.table)
setDT(df)[, lapply(.SD, sum), by = taxa]

   taxa ON1 ON2 ON3
1:  bac 210 266 205
2: arch  28 118 163
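
If the real data has columns that should not be summed, .SDcols can restrict .SD to a chosen subset. A rough sketch, assuming the columns of interest all start with "ON"; the names dt, cols and out are made up here. It uses as.data.table() to work on a copy, whereas setDT() converts df in place.

```r
library(data.table)

# Sample data with the three ON columns shown in the question
df <- data.frame(
  taxa = c("bac", "bac", "bac", "bac", "bac", "bac", "arch", "arch", "arch"),
  ON1 = c(2, 45, 34, 90, 0, 39, 12, 11, 5),
  ON2 = c(22, 67, 87, 90, 0, 0, 77, 21, 20),
  ON3 = c(46, 55, 1, 3, 0, 100, 88, 66, 9)
)

# Copy into a data.table instead of converting df by reference
dt <- as.data.table(df)

# Pick the columns to sum by name pattern (assumed "ON" prefix)
cols <- grep("^ON", names(dt), value = TRUE)

# .SD holds only the columns listed in .SDcols for each taxa group
out <- dt[, lapply(.SD, sum), by = taxa, .SDcols = cols]
out
```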