Home > Software engineering >  Summarize one variable/column over all possible values of other variables/columns
Summarize one variable/column over all possible values of other variables/columns

Time:01-28

I need to summarize one variable/column of a long table after aggregating (group_by()) by another variable/column, I need to have the summarized value by all values of other variables/columns.

Here is test data:

library(tidyverse)
set.seed(123)

Site <- str_c("S", 1:5)
Species <- str_c("Sps", 1:6)

print(Species_tbl <- bind_cols(Species = Species,
                        Exotic = rbinom(length(Species), 1, .3),
                        Migrant = rbinom(length(Species), 2, .3)))


Data_tbl <- expand.grid(Site = Site,
                        Species = Species) %>%
            left_join(Species_tbl)


Data_tbl$Presence <- rbinom(nrow(Data_tbl), 1, .5)

And here is my best effort:

print(Data_tbl %>%
  group_by(Site) %>%
  summarise(N_sp = sum(Presence),
            N_sp_Exo = sum(Presence[Exotic == 1]),
            N_sp_Nat = sum(Presence[Exotic == 0]),
            N_sp_M0 = sum(Presence[Migrant == 0]),
            N_sp_M1 = sum(Presence[Migrant == 1]),
            N_sp_M2 = sum(Presence[Migrant == 2])))

CodePudding user response:

You can get the data in long format for your columns of interest c(Exotic, Migrant) and take sum of Presence columns for each unique column names and it's values. This can be merged with sum of each Site.

library(dplyr)
library(tidyr)

data1 <- Data_tbl %>%
  group_by(Site) %>%
  summarise(N_sp = sum(Presence))

data2 <- Data_tbl %>%
  pivot_longer(cols = c(Exotic, Migrant)) %>%
  group_by(Site, name, value) %>%
  summarise(result = sum(Presence), .groups = "drop") %>%
  pivot_wider(names_from = c(name, value), values_from = result)

inner_join(data1, data2, by = 'Site')

#  Site   N_sp Exotic_0 Exotic_1 Migrant_0 Migrant_1 Migrant_2
#  <fct> <int>    <int>    <int>     <int>     <int>     <int>
#1 S1        4        2        2         1         2         1
#2 S2        3        2        1         0         2         1
#3 S3        2        1        1         0         2         0
#4 S4        4        2        2         1         3         0
#5 S5        4        1        3         1         2         1

The answer has been divided in two steps for ease of readability. If you would like to do this in a single chain without creating temporary variables that can be done as well.

  • Related