I have a large data table with multiple columns, plus a custom function. The table looks something like this, and it contains eight different bird_ID values:
   GPS_ID bird_ID device_ID devicetype           timestamp       date
1:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02
2:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02
3:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02
4:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02
5:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02
6:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02
The custom function calculates the time difference between the timestamps of consecutive rows and assigns a number in a new column named Position.Burst.ID. If the difference is 5 seconds or more, the number sequence advances; otherwise the row keeps the previously assigned number.
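pbid <- function(data_table) {
  # Rows that start a new burst: the first row, plus any row 5 or more after the previous one
  newbout <- which(c(TRUE, diff(as.POSIXct(data_table$timestamp, tz = "UTC")) >= 5))
  # Repeat each burst number for as many rows as that burst spans
  boutind <- rep(seq_along(newbout), diff(c(newbout, (nrow(data_table) + 1))))
  data_table$Position.Burst.ID <- boutind
}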
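To make the run-length logic concrete, here is a minimal sketch on made-up timestamps (the values are purely illustrative, not from the real data). It also assumes the gaps should be measured in seconds, so it converts them explicitly instead of relying on the units that diff() picks automatically.

```r
# Toy timestamps, given as seconds since the epoch (hypothetical values)
ts <- as.POSIXct(c(0, 1, 2, 10, 11, 30), origin = "1970-01-01", tz = "UTC")

# Gaps between consecutive rows, explicitly converted to seconds
gaps <- as.numeric(diff(ts), units = "secs")

# A new burst starts on the first row and after any gap of 5 s or more
newbout <- which(c(TRUE, gaps >= 5))

# Repeat each burst number for as many rows as that burst spans
boutind <- rep(seq_along(newbout), diff(c(newbout, length(ts) + 1)))
print(boutind)
```

The gaps here are 1, 1, 8, 1 and 19 seconds, so bursts start at rows 1, 4 and 6.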
This function works great with one bird_ID.
   GPS_ID bird_ID device_ID devicetype           timestamp       date Position.Burst.ID
1:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02                 1
2:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02                 1
3:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02                 1
4:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02                 1
5:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02                 1
6:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02                 1
I wanted to group_by(bird_ID), so that the count starts again from 1 for each bird_ID:
data_table %>%
  group_by(bird_ID) %>%
  mutate(Position.Burst.ID = pbid(data_table))
That surely didn't work, because:
`Position.Burst.ID` must be size 419335 or 1, not 4592293.
Any ideas on how to approach this?
I have already tried wrapping the function in a loop, but that was also a dead end, and I would really like to avoid a for loop with this amount of data.
Answer:
Here's how I'd do it:
data_table %>%
  group_by(bird_ID) %>%
  mutate(Position.Burst.ID = cumsum(timestamp - lag(timestamp, default = timestamp[1]) >= 5) + 1)
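For anyone who wants to try the idea without dplyr, here is a rough base R sketch of the same per-group cumulative sum, using ave() in place of group_by() and mutate(). The toy data and the helper name burst_id are made up for illustration, and the sketch assumes the rows are already sorted by timestamp within each bird_ID.

```r
# Toy data: two birds, timestamps given as seconds since the epoch (hypothetical values)
toy <- data.frame(
  bird_ID   = c("A", "A", "A", "A", "B", "B", "B"),
  timestamp = as.POSIXct(c(0, 1, 10, 11, 0, 20, 21),
                         origin = "1970-01-01", tz = "UTC")
)

# Hypothetical helper: burst counter for one bird's timestamps (in seconds).
# It starts at 1 and advances after every gap of 5 s or more.
burst_id <- function(secs) cumsum(c(0, diff(secs)) >= 5) + 1

# ave() applies burst_id() within each bird_ID and returns the values in row order
toy$Position.Burst.ID <- ave(as.numeric(toy$timestamp), toy$bird_ID, FUN = burst_id)
print(toy)
```

Because the helper only ever sees one bird's timestamps, the counter restarts for each bird_ID. The original attempt passed the whole data_table to pbid() inside mutate(), so it returned one value per row of the full table rather than per row of the group, which is what the size error is complaining about.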