I have a set of observations, with nest IDs and date of the observation. I have created a variable that gives the interval between each observation for each nest. I need to create a new variable, that for each nest, assigns a unique value for each observation that falls within an interval period of 145 days. Essentially, a shortened version of my data looks like this:
Date Nest_ID interval total_interval
1 2015-02-22 a 0 0
2 2015-05-26 a 93 93
3 2015-07-08 a 43 136
4 2015-12-07 a 152 288
5 2016-01-29 a 53 341
6 2016-04-02 a 64 405
7 2016-07-06 a 95 500
8 2016-11-05 a 122 622
9 2017-06-29 b 0 0
10 2015-02-22 c 0 0
and I need it to look like this:
Date Nest_ID interval total_interval Nest_period
1 2015-02-22 a 0 0 1
2 2015-05-26 a 93 93 1
3 2015-07-08 a 43 136 1
4 2015-12-07 a 152 288 2
5 2016-01-29 a 53 341 2
6 2016-04-02 a 64 405 2
7 2016-07-06 a 95 500 3
8 2016-11-05 a 122 622 3
9 2017-06-29 b 0 0 1
10 2015-02-22 c 0 0 1
If I could get the total_interval count to restart whenever it reaches 145 that seems like the ideal solution, however I have not been able to figure that out yet.
Here's some code to get the sample data:
Date <- c("2015-02-22", "2015-05-26", "2015-07-08", "2015-12-07", "2016-01-29", "2016-04-02", "2016-07-06", "2016-11-05", "2017-06-29", "2015-02-22")
Nest_ID <- c("a", "a", "a", "a", "a", "a", "a", "a", "b", "c")
interval <- (c(0,93,43,152,53,64,95,122,0,0))
total_interval <- (c(0,93,136,288,341,405,500,622,0,0))
df <- data.frame(Date, Nest_ID, interval, total_interval)
Any help greatly appreciated! Thanks!
CodePudding user response:
It's very messy but I can't find out the way to do this simple.
df <- df %>%
mutate(interval = as.numeric(interval),
total_interval = as.numeric(total_interval))
groups <- unique(df$Nest_ID)
res <- c()
for (i in groups){
df2 <- df %>%
filter(Nest_ID %in% i)
n <- 1
for(i in 1:dim(df2)[1]){
if (df2$total_interval[i] < 145) {
res<-c(res,n)
} else {
n <- n 1
df2$total_interval <- df2$total_interval - df2$total_interval[i]
res<-c(res,n)
}
print(n)
}
}
res
df$idx <- replace_na(res, 1)
df
result is like below. idx
is Nest_period
you wanted
Date Nest_ID interval total_interval idx
1 2015-02-22 a 0 0 1
2 2015-05-26 a 93 93 1
3 2015-07-08 a 43 136 1
4 2015-12-07 a 152 288 2
5 2016-01-29 a 53 341 2
6 2016-04-02 a 64 405 2
7 2016-07-06 a 95 500 3
8 2016-11-05 a 122 622 3
9 2017-06-29 b 0 0 1
10 2015-02-22 c 0 0 1
CodePudding user response:
In this cases I normally use a simple for-loop:
nest_period=vector()
for (i in seq_along(df)){
nest_period[i]=df$total_interval[i]%/%145
}
df <- as.data.frame(cbind(Date, Nest_ID, interval, total_interval, nest_period))
You will treat all the data along your dataframe and assign for every index the module between total_interval and 145 to nest_period.
CodePudding user response:
I provide one solution based on the case of mod(total_interval, 145)
.
I find the mapping between Nest_period
and mod(total_interval, 145)
which can be used to avoid the inefficient loop.
library(data.table)
setDT(df) #set df as data.table
df[, tmp := total_interval %/% 145, by = Nest_ID]
df[, tmp := fifelse( tmp > 0 & (tmp %% 2 == 0), tmp - 1, tmp)]
df[, Nest_period := fcase( tmp == 0, 1,
tmp %% 2 != 0, (tmp 1)/2 1 )]
df[, tmp := NULL]
df
#> Date Nest_ID interval total_interval Nest_period
#> 1: 2015-02-22 a 0 0 1
#> 2: 2015-05-26 a 93 93 1
#> 3: 2015-07-08 a 43 136 1
#> 4: 2015-12-07 a 152 288 2
#> 5: 2016-01-29 a 53 341 2
#> 6: 2016-04-02 a 64 405 2
#> 7: 2016-07-06 a 95 500 3
#> 8: 2016-11-05 a 122 622 3
#> 9: 2017-06-29 b 0 0 1
#> 10: 2015-02-22 c 0 0 1
Data:
Date <- c("2015-02-22", "2015-05-26", "2015-07-08", "2015-12-07", "2016-01-29", "2016-04-02", "2016-07-06", "2016-11-05", "2017-06-29", "2015-02-22")
Nest_ID <- c("a", "a", "a", "a", "a", "a", "a", "a", "b", "c")
interval <- c(0,93,43,152,53,64,95,122,0,0)
total_interval <- c(0,93,136,288,341,405,500,622,0,0)
df <- data.table(Date, Nest_ID, interval, total_interval)
Created on 2021-09-16 by the reprex package (v2.0.1)