R - sum multiple separate columns for each unique ID? Aggregate?

Time:01-22

My dataset features several blocks, each containing several plots. In each plot, three different lifeforms were marked as present/absent (i.e. 1/0):

Block Plot tree bush grass
1 1 0 1 0
1 2 1 1 1
1 3 1 1 1
2 1 0 0 1
2 2 0 0 1
2 3 1 0 1

I'm looking for code that will sum the counts for each distinct lifeform at the block level.

I would like an output that resembles this:

Block tree bush grass
1 2 3 2
2 1 0 3

I have tried this many ways but the only thing that comes close is: aggregate(df[,3:5], by = list(df$block), FUN = sum)

However, what this actually returns is:

Block tree bush grass
1 7 7 7
2 4 4 4

It appears to be summing all columns together instead of keeping the lifeforms separate.

I feel as though this should be so simple, as there are many queries online about similar processes, but nothing I try has worked.

CodePudding user response:

library(tidyverse)

df %>%  
  select(-Plot) %>% 
  pivot_longer(-Block) %>%  
  group_by(Block, name) %>% 
  summarise(sum = sum(value)) %>% 
  pivot_wider(names_from = name, values_from = sum)

# A tibble: 2 × 4
# Groups:   Block [2]
  Block  bush grass  tree
  <dbl> <dbl> <dbl> <dbl>
1     1     3     2     2
2     2     0     3     1

CodePudding user response:

You were close. Maybe just a typo? Your call uses df$block, but the column is named Block, and names in R are case-sensitive.

The data frame style

aggregate(df[,3:5], by = list(Block = df$Block), sum)
  Block tree bush grass
1     1    2    3     2
2     2    1    0     3

Or a formula style aggregate

aggregate(. ~ Block, df[,-2], sum)
  Block tree bush grass
1     1    2    3     2
2     2    1    0     3
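
Another base R option, offered only as a sketch: rowsum() sums each column separately within each group. It is not something the question mentioned. The data frame is rebuilt inline so the snippet runs on its own, and the name block_totals is just a placeholder chosen for illustration.

```r
# Same data as in the "Data" section of this answer
df <- data.frame(
  Block = c(1L, 1L, 1L, 2L, 2L, 2L),
  Plot  = c(1L, 2L, 3L, 1L, 2L, 3L),
  tree  = c(0L, 1L, 1L, 0L, 0L, 1L),
  bush  = c(1L, 1L, 1L, 0L, 0L, 0L),
  grass = c(0L, 1L, 1L, 1L, 1L, 1L)
)

# rowsum() adds up each column separately within each group;
# the group labels end up as row names rather than as a Block column
block_totals <- rowsum(df[, c("tree", "bush", "grass")], group = df$Block)
block_totals
```

Unlike aggregate(), the result has no Block column; if you need one, something like cbind(Block = rownames(block_totals), block_totals) would add it back (as character).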

With dplyr

library(dplyr)

df %>% 
  group_by(Block) %>% 
  summarize(across(tree:grass, sum))
# A tibble: 2 × 4
  Block  tree  bush grass
  <int> <int> <int> <int>
1     1     2     3     2
2     2     1     0     3

Data

df <- structure(list(Block = c(1L, 1L, 1L, 2L, 2L, 2L),
                     Plot  = c(1L, 2L, 3L, 1L, 2L, 3L),
                     tree  = c(0L, 1L, 1L, 0L, 0L, 1L),
                     bush  = c(1L, 1L, 1L, 0L, 0L, 0L),
                     grass = c(0L, 1L, 1L, 1L, 1L, 1L)),
                class = "data.frame", row.names = c(NA, -6L))