I have data on plant species cover at site and plot level which looks like this:
SITE PLOT SPECIES AREA
1 1 A 0.3
1 1 B 25.5
1 1 C 1.0
1 2 A 0.3
1 2 C 0.3
1 2 D 0.3
2 1 B 17.9
2 1 C 131.2
2 2 A 37.3
2 2 C 0.3
2 3 A 5.3
2 3 D 0.3
I have successfully used the following code to obtain percentage values for species at various sites:
dfnew <- merge(df1, prop.table(xtabs(AREA ~ SPECIES + SITE, df1), 2)*100)
I am now trying to find the relative proportion of each species within each plot (as a percentage of all species in the plot), with a desired output like the one below:
SITE PLOT SPECIES AREA Plot-freq
1 1 A 0.3 1.06
1 1 B 25.5 95.39
1 1 C 1.0 3.56
1 2 A 0.3 33.33
1 2 C 0.3 33.33
1 2 D 0.3 33.33
2 1 B 17.9 12.02
2 1 C 131.2 87.98
2 2 A 37.3 99.25
2 2 C 0.3 0.75
2 3 A 5.3 94.94
2 3 D 0.3 5.06
I tried adding the PLOT variable to the original code but ended up with tiny values:
a <- merge(df1, prop.table(xtabs(AREA ~ SPECIES + PLOT + SITE, woods2), 2)*100)
I have been looking at similar questions, but most of those don't have similar data and none of the solutions seem to work for me. Any help much appreciated.
Data:
> dput(df1)
structure(list(SITE = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
PLOT = c(1, 1, 1, 2, 2, 2, 1, 1, 2, 2, 3, 3), SPECIES = c("A",
"B", "C", "A", "C", "D", "B", "C", "A", "C", "A", "D"), AREA = c(0.3,
25.5, 1, 0.3, 0.3, 0.3, 17.9, 131.2, 37.3, 0.3, 5.3, 0.3)), class = "data.frame", row.names = c(NA,
-12L))
CodePudding user response:
I'm not sure I completely understand your calculation, but I believe you can do this (it returns proportions between 0 and 1; multiply by 100 if you want percentages):
library(dplyr)
df1 %>% group_by(SITE, PLOT) %>% mutate(Plot_freq = AREA/sum(AREA))
Output:
SITE PLOT SPECIES AREA Plot_freq
<dbl> <dbl> <chr> <dbl> <dbl>
1 1 1 A 0.3 0.0112
2 1 1 B 25.5 0.951
3 1 1 C 1 0.0373
4 1 2 A 0.3 0.333
5 1 2 C 0.3 0.333
6 1 2 D 0.3 0.333
7 2 1 B 17.9 0.120
8 2 1 C 131. 0.880
9 2 2 A 37.3 0.992
10 2 2 C 0.3 0.00798
11 2 3 A 5.3 0.946
12 2 3 D 0.3 0.0536
CodePudding user response:
Very interesting to use merge with prop.table! I wasn't able to modify your approach either, though. However, to avoid dplyr you may want to use ave to calculate the plot sums, then pipe (|>) the result further to calculate the relative areas, like so:
transform(df1, Psum=ave(AREA, SITE, PLOT, FUN=sum)) |> transform(Plot_freq=AREA/Psum*100)
# SITE PLOT SPECIES AREA Psum Plot_freq
# 1 1 1 A 0.3 26.8 1.1194030
# 2 1 1 B 25.5 26.8 95.1492537
# 3 1 1 C 1.0 26.8 3.7313433
# 4 1 2 A 0.3 0.9 33.3333333
# 5 1 2 C 0.3 0.9 33.3333333
# 6 1 2 D 0.3 0.9 33.3333333
# 7 2 1 B 17.9 149.1 12.0053655
# 8 2 1 C 131.2 149.1 87.9946345
# 9 2 2 A 37.3 37.6 99.2021277
# 10 2 2 C 0.3 37.6 0.7978723
# 11 2 3 A 5.3 5.6 94.6428571
# 12 2 3 D 0.3 5.6 5.3571429
Note: the native pipe |> requires R >= 4.1.
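The merge/prop.table approach from the question can probably be made to work as well. The following is only a sketch, under the assumption that the tiny values came from margin = 2: on a SPECIES x PLOT x SITE table, that divides by the total for a plot number across all sites. Passing margin = c(2, 3) instead should normalise within each plot of each site. The names tab, pct, pct_long and res are purely illustrative.

```r
# Data from the question (same values as the dput output).
df1 <- data.frame(
  SITE    = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  PLOT    = c(1, 1, 1, 2, 2, 2, 1, 1, 2, 2, 3, 3),
  SPECIES = c("A", "B", "C", "A", "C", "D", "B", "C", "A", "C", "A", "D"),
  AREA    = c(0.3, 25.5, 1, 0.3, 0.3, 0.3, 17.9, 131.2, 37.3, 0.3, 5.3, 0.3)
)

# 3-d table of summed areas: SPECIES x PLOT x SITE.
tab <- xtabs(AREA ~ SPECIES + PLOT + SITE, df1)

# margin = c(2, 3): every PLOT-within-SITE slice sums to 100, so each value
# is a species' share of its own plot.
pct <- prop.table(tab, margin = c(2, 3)) * 100

# Long format. Combinations that never occur (e.g. site 1, plot 3) come out
# as NaN (0/0) and find no matching row in df1 in the merge below.
pct_long <- as.data.frame(pct, responseName = "Plot_freq")

# merge() joins on the shared columns SITE, PLOT and SPECIES.
res <- merge(df1, pct_long)
res <- res[order(res$SITE, res$PLOT, res$SPECIES), ]
res
```

If this behaves as expected, the Plot_freq column should agree with the ave() result above. Note that merge() may reorder rows, hence the explicit order() at the end.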