I am attempting to calculate a forestry biometric called top height for a dataset containing several forest stands, each with numerous plots. This biometric requires finding the largest-diameter trees representing 40 trees per acre in a plot or a stand, calculating the cumulative trees per acre they represent and their cumulative height, then dividing the cumulative height by the cumulative trees per acre. This requires a user-defined function, which I have created. My function accepts five arguments:
- `data`: a `data.frame` of tree biometric data
- `dbh`: the column representing the diameter at breast height for individual trees
- `ht`: the column representing the height for the individual trees
- `tpa`: the trees per acre each individual represents
- `n`: the number of trees per acre to consider in the calculation; by default this is 40 (a forest biometrics standard value in imperial units)
As part of my user-defined function, I need to order the trees within a plot or stand by descending `dbh`. I am attempting to use `dplyr::group_by() %>% summarize()` to perform this function on each plot and stand combination. However, when I pipe the data from `group_by()` to `summarize()`, the data do not get passed. R throws the following error:
Error in `summarize()`:
! Problem while computing `TOP_HT = topht(dbh = dbh, ht = ht, tpa =
tpa, n = 40)`.
ℹ The error occurred in group 1: groups = "A".
Caused by error:
! argument "data" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
The obvious answer would be to simply take out the `data` argument and define the function on the tree biometric arguments alone. However, this won't work, as I need to order all of the variables by descending `dbh`. Is there a way I can pass the grouped data to the `data` argument within my call to `summarize()`? Below is my reproducible example with fake data:
##Loading Necessary Package##
library(dplyr)
##Setting Random Number Seed for Reproducibility##
set.seed(55)
##Generating Some Fake Data##
groups<-c(rep("A", 5), rep("B", 5))
ht<-rnorm(10, 125, 20)
tpa<-rnorm(10, 150, 60)
dbh<-rnorm(10, 20, 2)
DF<-data.frame(groups=groups, dbh=dbh, ht=ht, tpa=tpa)
##Defining the topht function##
topht<-function(data, dbh=NULL, ht=NULL, tpa=NULL, n=40){ #function parameters
##evaluate function parameters in the data environment
tmp<-eval(substitute(dbh), envir = data)
odata<-data[base::order(tmp, decreasing=TRUE),]
ht<-eval(substitute(ht), envir=odata)
tpa<-eval(substitute(tpa), envir=odata)
#creating variables for cumulative trees per acre and cumulative height calculations#
cumtpa<-0
cumht<-0
#beginning a loop to calculate top height#
for(i in 1:nrow(odata)){#setting looping range
if(cumtpa < n){ #only run cumulative adding when cumulative trees per acre is less than n
cumtpa<-tpa[i]+cumtpa
cumht<-(ht[i]*tpa[i])+cumht
}#Close conditional
if(cumtpa==n){#End the loop if cumulative tpa = n
break
}#End Conditional
if(cumtpa > n){#Adjust final tree's weight when cumulative tpa exceeds n and end loop
delta <- cumtpa - n
cumtpa<-cumtpa-delta
cumht<-cumht-(delta*ht[i])
break
}#End Conditional
if(cumtpa>0){#Define calculation of top height when trees per acre > 0
topht<-cumht/cumtpa
}else{#Define complement of conditional
topht<-0
}#Close conditional
}#Close loop
return(topht)#Output top height
}#Close function
##Attempting to run top height function independently for groups A and B##
out<-as.data.frame(DF %>% group_by(groups) %>% summarize(TOP_HT=topht(dbh=dbh,ht=ht,tpa=tpa,n=40)))#Throws error
CodePudding user response:
I tried to repair your function and apply it to your data:
library(dplyr)
topht <- function(data, dbh = NULL, ht = NULL, tpa = NULL, n = 40){
##evaluate function parameters in the data environment
tmp <- data %>% pull({{ dbh }})
odata <- data[base::order(tmp, decreasing=TRUE),]
ht <- odata %>% pull({{ ht }})
tpa <- odata %>% pull({{ tpa }})
#creating variables for cumulative trees per acre and cumulative height calculations#
cumtpa <- 0
cumht <- 0
outcome <- 0
for(i in 1:nrow(odata)) {
if(cumtpa < n){
cumtpa <- tpa[i] + cumtpa
cumht <- (ht[i] * tpa[i]) + cumht
} else if(cumtpa == n){
break
} else {
delta <- cumtpa - n
cumtpa <- cumtpa - delta
cumht <- cumht - (delta*ht[i])
break
}
if(cumtpa > 0) {
outcome <- cumht / cumtpa
} else {
outcome <- 0
}
}
outcome
}
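The `{{ }}` calls in the function above forward a bare column name from the caller to `pull()`. A minimal sketch of that mechanism on its own is given here; the helper `col_mean` and the `toy` data frame are made up for illustration and are not part of the original function.

```r
library(dplyr)

# Hypothetical helper (illustration only): mean of one column,
# where the column is supplied as a bare name.
col_mean <- function(data, col) {
  # {{ col }} passes the caller's column name through to pull()
  values <- data %>% pull({{ col }})
  mean(values)
}

# Made-up toy data
toy <- data.frame(dbh = c(10, 20, 30), ht = c(50, 60, 70))

col_mean(toy, dbh)  # mean of the dbh column
col_mean(toy, ht)   # mean of the ht column
```

The same pattern is what lets `topht(., dbh = dbh, ht = ht, tpa = tpa)` refer to columns by name without quoting them.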
Now we apply this function to each group:
DF %>%
group_by(groups) %>%
group_modify(~ .x %>% summarize(TOP_HT = topht(., dbh = dbh, ht = ht, tpa = tpa, n = 40))) %>%
ungroup() %>%
as.data.frame()
We want to apply `topht` to each group, so we use `group_modify` (it's like `purrr`'s little sister). This returns
groups TOP_HT
1 A 88.75246
2 B 123.01531
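For readers unfamiliar with `group_modify`, the underlying idea (hand each group's data frame to a function, then combine the results) can be sketched in base R. This is only a rough analogue, not how dplyr implements it; `toy` and `max_ht` are hypothetical names used for illustration.

```r
# Made-up toy data: two groups with known heights
toy <- data.frame(
  groups = c("A", "A", "B", "B"),
  ht     = c(100, 120, 80, 90)
)

# Hypothetical per-group function: takes one group's data frame,
# returns a single number
max_ht <- function(data) max(data$ht)

# split() yields one data frame per group; sapply() calls max_ht on each
per_group <- sapply(split(toy, toy$groups), max_ht)

# Reassemble into a data frame with one row per group
result <- data.frame(groups = names(per_group), MAX_HT = unname(per_group))
result
```

Because each call receives a whole data frame, a function with a `data` argument fits this pattern directly, which is why `group_modify` works where a plain `summarize()` call did not.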
A few words of explanation:
- Since your function is named `topht`, you really should not use `topht` as a variable name (even inside this function). I changed it to `outcome`.
- `outcome` should be defined / initialised with some value. I chose `0`; `NA` or something else might also be possible.
- `return()` at the end of a function is unnecessary. Just use the variable name.
- To evaluate the function's arguments (like `dbh = dbh`) you need the curly-curly operator. As a reference: https://www.r-bloggers.com/2019/06/curly-curly-the-successor-of-bang-bang/
- Your first `if`-construction should be packed together into an `if-else if-else` construction.
- To improve readability, you can use some spacing (see http://adv-r.had.co.nz/Style.html).
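As a possible alternative, the ordering by `dbh` can be done on plain vectors, so a `data` argument may not be needed at all. The sketch below is not the function from the question or the answer; `topht_vec` is a hypothetical name, and it assumes `tpa` values are non-negative. It weights each tree by the share of its trees per acre that still fits under the cap `n`, following the definition of top height given in the question.

```r
# Hypothetical vectorized top height (a sketch; assumes tpa >= 0)
topht_vec <- function(dbh, ht, tpa, n = 40) {
  # Sort heights and trees per acre by descending diameter
  o   <- order(dbh, decreasing = TRUE)
  ht  <- ht[o]
  tpa <- tpa[o]

  # Trees per acre accumulated before each tree is added
  before <- cumsum(tpa) - tpa

  # Weight: the part of each tree's tpa that fits under the cap n
  w <- pmin(tpa, pmax(n - before, 0))

  # Weighted mean height of the largest n trees per acre (0 if no trees)
  if (sum(w) > 0) sum(w * ht) / sum(w) else 0
}

# Made-up example: the two largest trees contribute 25 and 15 trees per acre
topht_vec(dbh = c(10, 30, 20), ht = c(50, 100, 80), tpa = c(30, 25, 25))
# (25 * 100 + 15 * 80) / 40 = 92.5
```

Since `summarize()` hands each group's columns to the function as vectors, a function of this shape could be called as `summarize(TOP_HT = topht_vec(dbh, ht, tpa))` after `group_by(groups)`.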