How to sum values from one column based on specific conditions from other column in R?-CodePudding

I have a dataset that looks something like this:

df <- data.frame(plot = c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C"),
                 species = c("Fagus","Fagus","Quercus","Picea", "Abies","Fagus","Fagus","Quercus","Picea", "Abies","Fagus","Fagus","Quercus","Picea", "Abies"),
                 value =  sample(100, size = 15, replace = TRUE)) 

head(df)
  plot species value
1    A   Fagus    53
2    A   Fagus    48
3    A Quercus     5
4    A   Picea    25
5    A   Abies    12
6    B   Fagus    12

Now, I want to create a new data frame containing per plot values for share.conifers and share.broadleaves by basically summing the values with conditions applied for species. I thought about using case_when but I am not sure how to write the syntax:

df1 <- df %>% share.broadleaves = case_when(plot = plot & species = "Fagus" or species = "Quercus" ~ FUN="sum")

df1 <- df %>% share.conifers = case_when(plot = plot & species = "Abies" or species = "Picea" ~ FUN="sum")

I know this is not right, but I would like something like this.

CodePudding user response：

Using dplyr/tidyr:

First construct the group, do the calculation and then spread into columns.

library(dplyr)
library(tidyr)

df |>
  mutate(type = case_when(species %in% c("Fagus", "Quercus") ~ "broadleaves",
                          species %in% c("Abies", "Picea") ~ "conifers")) |>
  group_by(plot, type) |>
  summarise(share = sum(value)) |>
  ungroup() |>
  pivot_wider(values_from = "share", names_from = "type", names_prefix = "share.")

Output:

# A tibble: 3 × 3
  plot  share.broadleaves share.conifers
  <chr>             <int>          <int>
1 A                   159             77
2 B                    53             42
3 C                   204             63

I am not sure if you want to sum or get the share, but the code could easily be adapted to whatever goal you have.

CodePudding user response：

One way could just be summarizing by plot and species:

library(dplyr)
df |>
  group_by(plot, species) |>
  summarize(share = sum(value))

If you really want to get the share of a specific species per plot you could also do:

df |>
  group_by(plot) |>
  summarize(share_certain_species = sum(value[species %in% c("Fagus", "Quercus")]) / sum(value))

which gives:

# A tibble: 3 × 2
  plot  share_certain_species
  <chr>                 <dbl>
1 A                     0.546
2 B                     0.583
3 C                     0.480