How to output the cumulative sum of a variable that undergo multiple % numeric reductions [R]-CodePudding

I'm looking to assess the importance of the cleaning frequency per day in the amount of feces that accumulates in the floor of a cattle barn. The problem is that I'm only able to correctly calculate the amount of feces after the first cleaning event (i.e, first reduction in feces), with subsequent calculations failing to consider that a cleaning event already happened.

For my attempt, I defined four variables and two constants in R:

Variables

"hour" (numeric; from 1 to 24)
"binclean" (binary; 0=no cleaning event, 1=cleaning event)
"noclean" (numeric; variable storing the cumulative amount of feces in the barn's floor when there is no cleaning)
"clean" (numeric; contains the amount of feces in the barn's floor after cleaning)

Constants

"totalfeceshour" (amount of feces produced every hour by 250 cattle = 239.5 kg)
"H" (amount of feces remaining in the environment after the floor is scraped = 0.05% of feces)

I created a matrix to store all the variables and perform calculations. What I've done is use the Reduce function (base R function) to get the feces accumulated without any cleaning and store it in the noclean variable. After this, I used a "for loop" with the ifelse() function to signal the occurrence of a cleaning event. Information about feces after cleaning is then stored in the "clean" variable. So far, I've been able to obtain the correct amount of feces on the floor only after the first cleaning event, but this information is not carried on to subsequent cells. For example, by the 7th hour of the day, there are 1,676 kg of feces in the floor. If a cleaning event occurs at the 8th hour, then the total amount of feces after cleaning is (1,676 239.5) * 0.05 = 95.8. If you run the code, you will see that the amount of feces by the 8th hour goes up to 2,155.5 (when it should be 95.8 239.5 = 335.3 after a cleaning event already happened). This is my code:

#Constants
totalfeceshour <- 239.5
H <- 1-0.95

#Variables
hour<- 1:24
cleanbin <- c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 
              0, 0, 0, 0, 0, 0, 1) #3 scrapes at hours 8, 16 and 
                                   #24. Each value represent an 
                                   #hour of the day.
noclean <- c(Reduce(" ", c(totalfeceshour, rep(totalfeceshour, 
             23)), accumulate = TRUE)) #Feces in the environment
clean <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)

#Matrix
mtrx <- matrix(data=c(hour, cleanbin, noclean, clean), ncol=4)
colnames(mtrx) <- c("hour", "cleanbin", "noclean", "clean")

#Loop
for(i in 1:length(mtrx[, 2])){
  mtrx[, 4][i] <- ifelse(mtrx[, 2][i] == 0, mtrx[, 3][i], mtrx[,3] 
   [i]*H)
}

mtrx

If you run the code, you could see that cleaning only reduced the amount of feces in cells in which cleanbin==1, but this information is not carried on to to subsequent cells. I've tried using nested "for" loops, "shift" function ("data.table" package), and other tools but haven't been able to solve this yet. I was able to create an Excel sheet that does the trick, but I am looking to achieve the same in R. I would deeply appreciate your assistance in coding this.

CodePudding user response：

Here's a tidyverse(-ish) solution. The most difficult part of the solution is that you can't use built-in solutions like group_map or ifelse as they stand because all of these options work atomically - that is, they operate on a column as whole, whereas you need to work iteratively, building up a solution row by row.

Because my workflow is different to yours, I've renamed some of your variables and dropped others.

First, create a data frame containing the basic information that defines the problem.

library(tidyverse)

d <- tibble(
       Hour=1:24,
       FecesProduced=239.5,
       CleaningEvent=Hour %in% c(8, 16, 24),
       CumulativeFeces=ifelse(Hour==1, FecesProduced, NA)
     ) 
d
# A tibble: 24 × 4
    Hour FecesProduced CleaningEvent CumulativeFeces
   <int>         <dbl> <lgl>                   <dbl>
 1     1          240. FALSE                    240.
 2     2          240. FALSE                     NA 
 3     3          240. FALSE                     NA 
 4     4          240. FALSE                     NA 
 5     5          240. FALSE                     NA 
 6     6          240. FALSE                     NA 
 7     7          240. FALSE                     NA 
 8     8          240. TRUE                      NA 
 9     9          240. FALSE                     NA 
10    10          240. FALSE                     NA 
# … with 14 more rows

Now step through rows 2 to 24, populating CumulativeFeces as required.

for(row in 2:nrow(d)) {
  d <- d %>% mutate(
               CumulativeFeces=ifelse(
                                 row_number() == row,
                                 ifelse(
                                   CleaningEvent,
                                   (FecesProduced   lag(CumulativeFeces)) * 0.05,
                                   FecesProduced   lag(CumulativeFeces)
                                  ),
                                 CumulativeFeces
                                )
             )
}

Now override the default behaviour when printing tibbles to ensure the accuracy of the solution is demonstrated.

d %>% mutate(CumulativeFeces=num(CumulativeFeces, digits=2))
# A tibble: 24 × 4
    Hour FecesProduced CleaningEvent CumulativeFeces
   <int>         <dbl> <lgl>               <num:.2!>
 1     1          240. FALSE                  239.50
 2     2          240. FALSE                  479.00
 3     3          240. FALSE                  718.50
 4     4          240. FALSE                  958.00
 5     5          240. FALSE                 1197.50
 6     6          240. FALSE                 1437.00
 7     7          240. FALSE                 1676.50
 8     8          240. TRUE                    95.80
 9     9          240. FALSE                  335.30
10    10          240. FALSE                  574.80
# … with 14 more rows