How to loop according to a multiple of a value (R)-CodePudding

I have a set of observations, with nest IDs and date of the observation. I have created a variable that gives the interval between each observation for each nest. I need to create a new variable, that for each nest, assigns a unique value for each observation that falls within an interval period of 145 days. Essentially, a shortened version of my data looks like this:

   Date       Nest_ID interval total_interval
  
 1 2015-02-22 a              0              0
 2 2015-05-26 a             93             93
 3 2015-07-08 a             43            136
 4 2015-12-07 a            152            288
 5 2016-01-29 a             53            341
 6 2016-04-02 a             64            405
 7 2016-07-06 a             95            500
 8 2016-11-05 a            122            622
 9 2017-06-29 b              0              0
10 2015-02-22 c              0              0

and I need it to look like this:

    Date       Nest_ID interval total_interval      Nest_period
   
 1 2015-02-22 a              0              0                 1
 2 2015-05-26 a             93             93                 1
 3 2015-07-08 a             43            136                 1
 4 2015-12-07 a            152            288                 2
 5 2016-01-29 a             53            341                 2
 6 2016-04-02 a             64            405                 2
 7 2016-07-06 a             95            500                 3
 8 2016-11-05 a            122            622                 3
 9 2017-06-29 b              0              0                 1
10 2015-02-22 c              0              0                 1

If I could get the total_interval count to restart whenever it reaches 145 that seems like the ideal solution, however I have not been able to figure that out yet.

Here's some code to get the sample data:

Date <- c("2015-02-22", "2015-05-26", "2015-07-08", "2015-12-07", "2016-01-29", "2016-04-02", "2016-07-06", "2016-11-05", "2017-06-29", "2015-02-22")

Nest_ID <- c("a", "a", "a", "a", "a", "a", "a", "a", "b", "c")

interval <- (c(0,93,43,152,53,64,95,122,0,0))

total_interval <- (c(0,93,136,288,341,405,500,622,0,0))

df <- data.frame(Date, Nest_ID, interval, total_interval)

Any help greatly appreciated! Thanks!

CodePudding user response：

It's very messy but I can't find out the way to do this simple.

df <- df %>%
  mutate(interval = as.numeric(interval),
         total_interval = as.numeric(total_interval))
groups <- unique(df$Nest_ID)
res <- c()
for (i in groups){
  df2 <- df %>%
    filter(Nest_ID %in% i)
  n <- 1
  for(i in 1:dim(df2)[1]){
    if (df2$total_interval[i] < 145) {
      res<-c(res,n)
    } else {
      n <- n   1
      df2$total_interval <- df2$total_interval - df2$total_interval[i]
      res<-c(res,n)
    }
    print(n)
  }
  
  
}
res
df$idx <- replace_na(res, 1)
df

result is like below. idx is Nest_period you wanted

         Date Nest_ID interval total_interval idx
1  2015-02-22       a        0              0   1
2  2015-05-26       a       93             93   1
3  2015-07-08       a       43            136   1
4  2015-12-07       a      152            288   2
5  2016-01-29       a       53            341   2
6  2016-04-02       a       64            405   2
7  2016-07-06       a       95            500   3
8  2016-11-05       a      122            622   3
9  2017-06-29       b        0              0   1
10 2015-02-22       c        0              0   1

CodePudding user response：

In this cases I normally use a simple for-loop:

nest_period=vector()    
for (i in seq_along(df)){
    nest_period[i]=df$total_interval[i]%/%145
}

df <- as.data.frame(cbind(Date, Nest_ID, interval, total_interval, nest_period))

You will treat all the data along your dataframe and assign for every index the module between total_interval and 145 to nest_period.

CodePudding user response：

I provide one solution based on the case of mod(total_interval, 145). I find the mapping between Nest_period and mod(total_interval, 145) which can be used to avoid the inefficient loop.

library(data.table)
setDT(df) #set df as data.table
df[, tmp := total_interval %/% 145, by = Nest_ID]
df[, tmp := fifelse( tmp > 0 & (tmp %% 2 == 0), tmp - 1, tmp)]
df[, Nest_period := fcase( tmp == 0, 1,
                           tmp %% 2 != 0, (tmp   1)/2   1 )]
df[, tmp := NULL]
df
#>           Date Nest_ID interval total_interval Nest_period
#>  1: 2015-02-22       a        0              0           1
#>  2: 2015-05-26       a       93             93           1
#>  3: 2015-07-08       a       43            136           1
#>  4: 2015-12-07       a      152            288           2
#>  5: 2016-01-29       a       53            341           2
#>  6: 2016-04-02       a       64            405           2
#>  7: 2016-07-06       a       95            500           3
#>  8: 2016-11-05       a      122            622           3
#>  9: 2017-06-29       b        0              0           1
#> 10: 2015-02-22       c        0              0           1

Data:

Date <- c("2015-02-22", "2015-05-26", "2015-07-08", "2015-12-07", "2016-01-29", "2016-04-02", "2016-07-06", "2016-11-05", "2017-06-29", "2015-02-22")

Nest_ID <- c("a", "a", "a", "a", "a", "a", "a", "a", "b", "c")

interval <- c(0,93,43,152,53,64,95,122,0,0)

total_interval <- c(0,93,136,288,341,405,500,622,0,0)

df <- data.table(Date, Nest_ID, interval, total_interval)

^{Created on 2021-09-16 by the reprex package (v2.0.1)}