Home > Software engineering >  Aggregate and summarise dataset information with R
Aggregate and summarise dataset information with R

Time:04-01

I have a breeding productivity dataset:

df1
    # Nest.box Obs.type individual.number Clutch
    # 1 Nest1 Egg 1 First
    # 2 Nest1 Egg 2 First
    # 3 Nest1 Egg 3 First
    # 4 Nest2 Egg 1 First
    # 5 Nest2 Egg 2 First
    # 6 Nest2 Egg 1 First
    # 7 Nest1 Chick 1 First
    # 8 Nest1 Chick 2 First
    # 9 Nest2 Chick 1 First
    # 10 Nest2 Chick 2 First
    # 11 Nest2 Chick 1 Second
    # 12 Nest2 Chick 2 Second

I want to summarise these data by aggregating by Nest.box and Clutch (shows the max number of eggs, the max number of chicks by nest.box, by clutch)

The wanted output would be something like this:

output
        # Nest.box Clutch Eggs Chicks
        # 1 Nest1 First 3 2
        # 2 Nest2 First 2 2
        # 3 Nest2 Second NA 2

CodePudding user response:

Tidyverse approach would be

df %>% group_by(Nest.box, Clutch, Obs.type) %>% 
  summarise(max = max(individual.number)) %>% 
  pivot_wider(id_cols = 1:2, names_from = Obs.type, values_from = max)

# A tibble: 3 x 4
# Groups:   Nest.box, Clutch [3]
  Nest.box Clutch Chick   Egg
  <chr>    <chr>  <dbl> <dbl>
1 Nest1    First      2     3
2 Nest2    First      2     2
3 Nest2    Second     2    NA

CodePudding user response:

This can all be done within tidyr::pivot_wider() for data.frames:

library(tidyr)

df1 %>% 
  pivot_wider(
    id_cols = c(Nest.box, Clutch),
    names_from = Obs.type, 
    values_from = individual.number, 
    values_fn = max
  )
#> # A tibble: 3 × 4
#>   Nest.box Clutch   Egg Chick
#>   <chr>    <chr>  <dbl> <dbl>
#> 1 Nest1    First      3     2
#> 2 Nest2    First      2     2
#> 3 Nest2    Second    NA     2

Created on 2022-04-01 by the reprex package (v2.0.1)

Data

df1 <- 
  tibble::tribble(
    ~Nest.box, ~Obs.type, ~individual.number, ~Clutch,
    "Nest1", "Egg", 1, "First",
    "Nest1", "Egg", 2, "First",
    "Nest1", "Egg", 3, "First",
    "Nest2", "Egg", 1, "First",
    "Nest2", "Egg", 2, "First",
    "Nest2", "Egg", 1, "First",
    "Nest1", "Chick", 1, "First",
    "Nest1", "Chick", 2, "First",
    "Nest2", "Chick", 1, "First",
    "Nest2", "Chick", 2, "First",
    "Nest2", "Chick", 1, "Second",
    "Nest2", "Chick", 2, "Second"
  )


CodePudding user response:

data.table solution

library(data.table)
setDT(df)[, .(Eggs = uniqueN(individual.number[Obs.type == "Egg"]),
              Chicks = uniqueN(individual.number[Obs.type == "Chick"])), 
          by = .(Nest.box, Clutch)]

#    Nest.box Clutch Eggs Chicks
# 1:    Nest1  First    3      2
# 2:    Nest2  First    2      2
# 3:    Nest2 Second    0      2

edit

also a possibility: casting to wide format

dcast(setDT(df), Nest.box   Clutch ~ Obs.type, value.var = "individual.number", fun.aggregate = uniqueN)
#    Nest.box Clutch Chick Egg
# 1:    Nest1  First     2   3
# 2:    Nest2  First     2   2
# 3:    Nest2 Second     2   0
  • Related