I have a dataset that looks something like this:
df <- data.frame(plot = c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C"),
species = c("Fagus","Fagus","Quercus","Picea", "Abies","Fagus","Fagus","Quercus","Picea", "Abies","Fagus","Fagus","Quercus","Picea", "Abies"),
value = sample(100, size = 15, replace = TRUE))
head(df)
plot species value
1 A Fagus 53
2 A Fagus 48
3 A Quercus 5
4 A Picea 25
5 A Abies 12
6 B Fagus 12
Now, I want to create a new data frame containing per plot
values for share.conifers
and share.broadleaves
by basically summing the values
with conditions applied for species
. I thought about using case_when
but I am not sure how to write the syntax:
df1 <- df %>% share.broadleaves = case_when(plot = plot & species = "Fagus" or species = "Quercus" ~ FUN="sum")
df1 <- df %>% share.conifers = case_when(plot = plot & species = "Abies" or species = "Picea" ~ FUN="sum")
I know this is not right, but I would like something like this.
CodePudding user response:
Using dplyr
/tidyr
:
First construct the group, do the calculation and then spread into columns.
library(dplyr)
library(tidyr)
df |>
mutate(type = case_when(species %in% c("Fagus", "Quercus") ~ "broadleaves",
species %in% c("Abies", "Picea") ~ "conifers")) |>
group_by(plot, type) |>
summarise(share = sum(value)) |>
ungroup() |>
pivot_wider(values_from = "share", names_from = "type", names_prefix = "share.")
Output:
# A tibble: 3 × 3
plot share.broadleaves share.conifers
<chr> <int> <int>
1 A 159 77
2 B 53 42
3 C 204 63
I am not sure if you want to sum or get the share, but the code could easily be adapted to whatever goal you have.
CodePudding user response:
One way could just be summarizing by plot
and species
:
library(dplyr)
df |>
group_by(plot, species) |>
summarize(share = sum(value))
If you really want to get the share of a specific species per plot you could also do:
df |>
group_by(plot) |>
summarize(share_certain_species = sum(value[species %in% c("Fagus", "Quercus")]) / sum(value))
which gives:
# A tibble: 3 × 2
plot share_certain_species
<chr> <dbl>
1 A 0.546
2 B 0.583
3 C 0.480