Say I have a vector of ages of 100 trees. Then I age those trees up for 5, 10, 15, and 20 years into the future to create a matrix of tree ages for this year and four 5-year planning periods in the future.
But then, I decide to cut some of those trees (only 10 per planning period), documented in a matrix of T/F values where T is harvested and F is not (trees can't be harvested twice).
age.vec <- sample(x = 1:150, size = 100, replace = T) # create our trees
age.mat <- cbind(age.vec, age.vec 5, age.vec 10, age.vec 15, age.vec 20) # grow them up
x.mat <- matrix(data = F, nrow = 100, ncol = 5) # create the empty harvest matrix
x.mat[cbind(sample(1:100, size = 50), rep(1:5, each = 10))] <- T # 10 trees/year harvested
So then, the ages of trees that are harvested become zero in that year:
age.mat[x.mat] <- 0
I then would like to age the harvested trees up again for the following periods. E.g. if a tree were harvested in the first planning period, in the second planning period (5 years later), I want the age of the tree to be 5, then in the third planning period (10 years later), I want the age of the tree to be 10. I have successfully implemented this in the following for loop:
for (i in 2:5){ # we don't need to calculate over the first year
age.mat[,i]<-age.mat[,i-1] 5L # add 5 to previous year
age.mat[x.mat[,i],i] <- 0L # reset age of harvested trees to zero
}
This works, however, it is clunky and slow. Is there a way to implement this faster (i.e. without the for loop)? It also is implemented within a function, which means that using "apply" actually slows things down, so it needs to be vectorized directly. This is something I'm iterating over thousands of times so speed is of the essence!
Thank you!
CodePudding user response:
This looks to be about 12x faster, based on testing with rbenchmark
.
Here's an approach relying on the fact that harvesting a tree doesn't stop the passage of time, it just resets the clock. So we can think of a harvest as subtracting the harvest age from all future ages.
x.die <- x.mat * age.mat
x.dif <- t(apply(x.die, 1, cumsum))
age.mat2 <- age.mat - x.dif
x.die
, by multiplying the harvests by the ages, we get the age at each harvest. The next line calculates the cumulative sum of these across each row, and finally we subtract those from the original ages.
I assume your "trees can't be harvested twice" means we won't ever see two TRUEs in one row of x.mat
? My code won't work right if there were more than one harvest per tree location.
CodePudding user response:
You can use apply to work on each vector rowwise, then use some logic within the function to adjust the values.
Should be about 4 times faster
age.mat |>
apply(1, \(x) {
if(any(x == 0 & (which(x == 0) != length(x)))) {
x[which(x == 0):length(x)] <- (0:(length(x) - which(x == 0))) * 5
x
} else x
}) |> t()
[,1] [,2] [,3] [,4] [,5]
[1,] 101 0 5 10 15
[2,] 55 60 65 70 75
[3,] 23 28 33 0 5
[4,] 0 5 10 15 20
[5,] 23 28 33 0 5
[6,] 84 0 5 10 15
[7,] 52 57 62 0 5
[8,] 26 31 36 41 0
[9,] 114 119 124 129 0
[10,] 33 38 43 48 53
[11,] 144 149 154 159 164
[12,] 19 24 29 34 39
[13,] 43 48 53 58 63
[14,] 69 74 79 84 89
[15,] 98 103 108 113 118
[16,] 110 115 120 125 130
[17,] 8 13 18 23 28
[18,] 16 21 26 31 36
[19,] 1 6 11 16 21
[20,] 60 65 0 5 10