I have a df in R similar to this one:
taxa <- c("bac", "bac", "bac", "bac", "bac", "bac", "arch", "arch", "arch")
ON1 <- c(2, 45, 34, 90, 0, 39, 12, 11, 5)
ON2 <- c(22, 67, 87, 90, 0, 0, 77, 21, 20)
ON3 <- c(46, 55, 1, 3, 0, 100, 88, 66, 9)
df <- data.frame(taxa, ON1, ON2, ON3)
I would like to group by "taxa" and then sum each of the ON columns within each group.
- Option 1:
s <- split(df, df$taxa)
ON1 <- as.data.frame(lapply(s, function(x) {
  sum(x[, "ON1"])
}))
- Option 2:
ON1 <- tapply(df$ON1, df$taxa, FUN=sum)
ON1 <- as.data.frame(ON1)
Result for ON1: bac = 210 and arch = 28
Both Option 1 and Option 2 do what I want, but I would like to put this in a loop so that I can do the same for ON2, ON3, etc. in one go (I have many more columns).
Thanks!
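Since the question asks specifically for a loop, here is a minimal base-R sketch that applies Option 2 (tapply) to every non-grouping column inside a for loop. It assumes that every column other than taxa is numeric and should be summed; value_cols, groups and result are illustrative names, and ON4 is left out because the question never defines it.

```r
# Example data from the question (ON4 left out: it is never defined there)
taxa <- c("bac", "bac", "bac", "bac", "bac", "bac", "arch", "arch", "arch")
ON1 <- c(2, 45, 34, 90, 0, 39, 12, 11, 5)
ON2 <- c(22, 67, 87, 90, 0, 0, 77, 21, 20)
ON3 <- c(46, 55, 1, 3, 0, 100, 88, 66, 9)
df <- data.frame(taxa, ON1, ON2, ON3, stringsAsFactors = FALSE)

# Assumption: every column except "taxa" is a numeric column to be summed
value_cols <- setdiff(names(df), "taxa")

# tapply() orders its output by the sorted group levels,
# so build the taxa column of the result in the same order
groups <- levels(factor(df$taxa))
result <- data.frame(taxa = groups, stringsAsFactors = FALSE)

# Loop over the value columns, summing each one per taxa group (Option 2)
for (col in value_cols) {
  result[[col]] <- as.vector(tapply(df[[col]], df$taxa, FUN = sum))
}

print(result)
```

If you do not need an explicit loop, sapply(df[value_cols], function(col) tapply(col, df$taxa, FUN = sum)) returns the same sums as a matrix with one row per taxa and one column per ON column.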
Answer:
We can use aggregate from base R; the formula . ~ taxa sums every other column within each taxa group:
> aggregate(. ~ taxa, df, sum)
taxa ON1 ON2 ON3
1 arch 28 118 163
2 bac 210 266 205
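If the data frame also holds columns that should not be summed, the data frame method of aggregate() lets you choose the value columns by name. A minimal sketch, assuming the columns to sum are exactly those whose names start with "ON" (value_cols and res are illustrative names):

```r
# Example data from the question
taxa <- c("bac", "bac", "bac", "bac", "bac", "bac", "arch", "arch", "arch")
ON1 <- c(2, 45, 34, 90, 0, 39, 12, 11, 5)
ON2 <- c(22, 67, 87, 90, 0, 0, 77, 21, 20)
ON3 <- c(46, 55, 1, 3, 0, 100, 88, 66, 9)
df <- data.frame(taxa, ON1, ON2, ON3, stringsAsFactors = FALSE)

# Assumption: the columns to sum are the ones whose names start with "ON"
value_cols <- grep("^ON", names(df), value = TRUE)

# aggregate.data.frame: one sum per selected column per taxa group
res <- aggregate(df[value_cols], by = list(taxa = df$taxa), FUN = sum)
print(res)
```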
Answer:
Instead of a loop, it's easier to use tidyverse functions: group by your variable, then summarize each column with sum as the summary function.
library(tidyverse)
df %>%
  group_by(taxa) %>%
  summarize(across(ON1:ON3, sum))
#> # A tibble: 2 × 4
#> taxa ON1 ON2 ON3
#> <chr> <dbl> <dbl> <dbl>
#> 1 arch 28 118 163
#> 2 bac 210 266 205
Created on 2021-09-29 by the reprex package (v2.0.1)
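With many more columns, ON1:ON3 has to be edited by hand every time. A selection helper avoids that; the sketch below assumes dplyr 1.0.0 or later (for across()) and that all columns to sum have names starting with "ON".

```r
library(dplyr)

# Example data from the question
taxa <- c("bac", "bac", "bac", "bac", "bac", "bac", "arch", "arch", "arch")
ON1 <- c(2, 45, 34, 90, 0, 39, 12, 11, 5)
ON2 <- c(22, 67, 87, 90, 0, 0, 77, 21, 20)
ON3 <- c(46, 55, 1, 3, 0, 100, 88, 66, 9)
df <- data.frame(taxa, ON1, ON2, ON3, stringsAsFactors = FALSE)

# starts_with("ON") picks up every column whose name begins with "ON",
# however many there are
res <- df %>%
  group_by(taxa) %>%
  summarize(across(starts_with("ON"), sum))
print(res)
```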
Answer:
Use group_by and summarize_each (note that summarize_each() is deprecated in current versions of dplyr):
df %>% group_by(taxa) %>% summarize_each(sum)
Output:
taxa ON1 ON2 ON3
<fct> <dbl> <dbl> <dbl>
arch 28 118 163
bac 210 266 205
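Because summarize_each() is deprecated, here is a sketch of the same "sum every column" summary written with across(), assuming dplyr 1.0.0 or later. Inside summarize(), across(everything(), ...) skips the grouping column automatically, so only the ON columns are summed.

```r
library(dplyr)

# Example data from the question
taxa <- c("bac", "bac", "bac", "bac", "bac", "bac", "arch", "arch", "arch")
ON1 <- c(2, 45, 34, 90, 0, 39, 12, 11, 5)
ON2 <- c(22, 67, 87, 90, 0, 0, 77, 21, 20)
ON3 <- c(46, 55, 1, 3, 0, 100, 88, 66, 9)
df <- data.frame(taxa, ON1, ON2, ON3, stringsAsFactors = FALSE)

# Sum every non-grouping column within each taxa group
res <- df %>%
  group_by(taxa) %>%
  summarize(across(everything(), sum))
print(res)
```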
Answer:
With data.table (note that setDT() converts df to a data.table by reference):
library(data.table)
setDT(df)[, lapply(.SD, sum), by = taxa]
taxa ON1 ON2 ON3
1: bac 210 266 205
2: arch 28 118 163
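When only some columns should be summed, .SDcols restricts .SD to them. A minimal sketch, assuming the value columns are the ones whose names start with "ON" (dt, on_cols and res are illustrative names):

```r
library(data.table)

# Example data from the question, built directly as a data.table
dt <- data.table(
  taxa = c("bac", "bac", "bac", "bac", "bac", "bac", "arch", "arch", "arch"),
  ON1 = c(2, 45, 34, 90, 0, 39, 12, 11, 5),
  ON2 = c(22, 67, 87, 90, 0, 0, 77, 21, 20),
  ON3 = c(46, 55, 1, 3, 0, 100, 88, 66, 9)
)

# Assumption: the columns to sum are the ones whose names start with "ON"
on_cols <- grep("^ON", names(dt), value = TRUE)

# .SDcols limits .SD to on_cols; by = taxa keeps groups in order of first appearance
res <- dt[, lapply(.SD, sum), by = taxa, .SDcols = on_cols]
print(res)
```

Using keyby = taxa instead of by = taxa sorts the groups (arch before bac), matching the order of the aggregate and dplyr answers.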