averaging values of rare species in ecological surveys

Time:06-06

I am analyzing fish biomass from a visual survey. Below is some mock data I made up.

date  site  species    n  mass
5/10  x     snapper-x  5  500
6/10  x     snapper-x  4  400
6/10  x     snapper-y  1  200
6/10  x     snapper-z  2  300
7/10  x     snapper-x  3  300
7/10  x     snapper-z  5  750

I'm trying to get the average count and biomass of each species per site using dplyr, but I'm running into trouble because rarer species do not show up in every survey. mean() divides each species' total by the number of surveys in which that species appears, e.g., snapper-x: sum/3 but snapper-z: sum/2, whereas I want every species to be divided by the total number of surveys I have done, which is 3 (5/10, 6/10, and 7/10).
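
To make the discrepancy concrete, here is a small base R sketch of the arithmetic for snapper-z using the mock data above. The names `z`, `n_surveys`, `mean_present`, and `mean_all` are illustrative only.

```r
# Mock data from the question
raw_biomass <- data.frame(
  date    = c("5/10", "6/10", "6/10", "6/10", "7/10", "7/10"),
  site    = "x",
  species = c("snapper-x", "snapper-x", "snapper-y",
              "snapper-z", "snapper-x", "snapper-z"),
  n       = c(5, 4, 1, 2, 3, 5),
  mass    = c(500, 400, 200, 300, 300, 750)
)

# Rows for the species that was missed on one survey date
z <- raw_biomass[raw_biomass$species == "snapper-z", ]

# Total number of surveys carried out (distinct dates): 3
n_surveys <- length(unique(raw_biomass$date))

mean_present <- mean(z$mass)             # (300 + 750) / 2 = 525
mean_all     <- sum(z$mass) / n_surveys  # (300 + 750) / 3 = 350
```

The second figure is the one wanted here: the survey on 5/10 found no snapper-z, so it should count as a zero rather than be skipped.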

avg_biomass <- raw_biomass %>%
  filter(grepl('snapper', species)) %>%
  group_by(date, site, species) %>%
  summarize(n_avg=mean(n), mass_avg=mean(mass))

I have also tried summarize(mass_avg=sum(mass)/n_distinct(date)), but it didn't work because the data were already grouped by date in the group_by() call above, so n_distinct(date) is always 1 within each group.

Alternatively, I could add new rows for the rarer species with n and mass set to 0, but I'm not sure which function I should use to achieve that.

CodePudding user response:

The main problem is missing rows: if a species does not show up at a given site on a given date, there is no row for it. You want a row with n = 0 and mass = 0 instead. Here is one way to accomplish that with data.table.

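library(data.table)
setDT(df)  # df is the survey data (raw_biomass in the question)
allSpecies <- unique(df$species)
kpi        <- c('n', 'mass')
# One row per species for every surveyed (date, site), with n and mass set to 0
result     <- df[, .(species=allSpecies), by=.(date, site)][, c(kpi):=0]
# Update join: overwrite the zeroes wherever the species was actually observed
result[df, c(kpi):=.(i.n, i.mass), on=.(date, site, species)]
# Average over all surveys, including the zero-filled ones
result[, lapply(.SD, mean), by=.(site, species), .SDcols=kpi]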
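
As a cross-check that needs no extra packages, the same zero-filling idea can be sketched in base R. This is only an illustration under the assumption that the data look like the mock data in the question; `surveys`, `grid`, `filled`, and `avg` are made-up names.

```r
# Mock data from the question
raw_biomass <- data.frame(
  date    = c("5/10", "6/10", "6/10", "6/10", "7/10", "7/10"),
  site    = "x",
  species = c("snapper-x", "snapper-x", "snapper-y",
              "snapper-z", "snapper-x", "snapper-z"),
  n       = c(5, 4, 1, 2, 3, 5),
  mass    = c(500, 400, 200, 300, 300, 750)
)

# Every (date, site) survey that actually took place
surveys <- unique(raw_biomass[, c("date", "site")])

# Cross join: each survey paired with every species (by = NULL gives the
# Cartesian product)
grid <- merge(surveys,
              data.frame(species = unique(raw_biomass$species)),
              by = NULL)

# Left join the observations; combinations that were never observed get NA
filled <- merge(grid, raw_biomass,
                by = c("date", "site", "species"), all.x = TRUE)

# Treat "not observed" as zero count and zero biomass
filled$n[is.na(filled$n)]       <- 0
filled$mass[is.na(filled$mass)] <- 0

# Average over all surveys, zero-filled rows included
avg <- aggregate(cbind(n, mass) ~ site + species, data = filled, FUN = mean)
```

With the mock data, `filled` has 9 rows (3 surveys × 3 species), and the snapper-z averages come out as 7/3 for n and 350 for mass.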

CodePudding user response:

Two possible tidyverse approaches: the first divides the sums of n and mass by the total number of survey dates; the second adds the missing date/species combinations as rows of zeroes:

library(tidyverse)

raw_biomass <- tribble(
  ~date, ~site, ~species, ~n, ~mass,
  "5/10", "x", "snapper-x", 5, 500,
  "6/10", "x", "snapper-x", 4, 400,
  "6/10", "x", "snapper-y", 1, 200,
  "6/10", "x", "snapper-z", 2, 300,
  "7/10", "x", "snapper-x", 3, 300,
  "7/10", "x", "snapper-z", 5, 750
)

# Divide by total surveys
raw_biomass %>%
  filter(grepl("snapper", species)) %>%
  mutate(n_survey = n_distinct(date)) %>%
  group_by(site, species) %>%
  summarize(n = sum(n) / first(n_survey), mass = sum(mass) / first(n_survey))

#> # A tibble: 3 × 4
#> # Groups:   site [1]
#>   site  species       n  mass
#>   <chr> <chr>     <dbl> <dbl>
#> 1 x     snapper-x 4     400  
#> 2 x     snapper-y 0.333  66.7
#> 3 x     snapper-z 2.33  350
  
# Add missing dates
raw_biomass %>%
  filter(str_detect(species, "snapper")) %>% # Tidyverse alternative to grepl
  complete(date, nesting(site, species), fill = list(n = 0, mass = 0)) %>%
  group_by(site, species) %>%
  summarize(n = mean(n), mass = mean(mass))

#> # A tibble: 3 × 4
#> # Groups:   site [1]
#>   site  species       n  mass
#>   <chr> <chr>     <dbl> <dbl>
#> 1 x     snapper-x 4     400  
#> 2 x     snapper-y 0.333  66.7
#> 3 x     snapper-z 2.33  350

Created on 2022-06-05 by the reprex package (v2.0.1)
