group_by( ) and mutate( ) do not match sizes

Time:10-21

I have a large data table with multiple columns and a custom function. The data table looks something like this, and there are eight different bird_ID values:

   GPS_ID bird_ID device_ID devicetype           timestamp       date
1:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02
2:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02
3:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02
4:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02
5:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02
6:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02

The custom function calculates the time difference between the timestamps of consecutive rows and assigns a number in a new column named Position.Burst.ID. If the difference is 5 seconds or more, the sequence advances to the next number; otherwise the row keeps the previously assigned number.
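To make the rule concrete, here is a small base R sketch on made-up timestamps for a single bird. It uses a cumulative sum instead of the which()/rep() construction in the function, but it should produce the same numbering. The toy values and the names `ts`, `gaps` and `burst_id` are for illustration only.

```r
# Toy timestamps for a single bird (made-up values, for illustration only)
ts <- as.POSIXct(c("2022-05-02 00:03:59", "2022-05-02 00:04:01",
                   "2022-05-02 00:04:10", "2022-05-02 00:04:12",
                   "2022-05-02 00:05:00"), tz = "UTC")

# Gap to the previous row in seconds (forcing units avoids auto-chosen minutes/hours)
gaps <- as.numeric(diff(ts), units = "secs")

# A gap of 5 s or more starts a new burst; the first row always starts burst 1
burst_id <- cumsum(c(TRUE, gaps >= 5))
print(burst_id)
```

Here the gaps are 2, 9, 2 and 48 seconds, so the rows fall into bursts 1, 1, 2, 2, 3.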

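pbid <- function(data_table) {
  # seconds between consecutive timestamps; force units so the 5 s threshold is meaningful
  gaps <- as.numeric(diff(as.POSIXct(data_table$timestamp, tz = "UTC")), units = "secs")
  # row indices where a new burst starts (the first row always starts one)
  newbout <- which(c(TRUE, gaps >= 5))
  # repeat each burst number for the length of its burst
  boutind <- rep(seq_along(newbout), diff(c(newbout, nrow(data_table) + 1)))
  boutind
}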

This function works great with one bird_ID.

   GPS_ID bird_ID device_ID devicetype           timestamp       date Position.Burst.ID   
1:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02                 1
2:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02                 1
3:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02                 1
4:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02                 1
5:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02                 1
6:     NA    350E    202927   ornitela 2022-05-02 00:03:59 2022-05-02                 1

I wanted to use group_by(bird_ID) so that the numbering starts again from 1 for each bird_ID:

data_table %>%
  group_by(bird_ID) %>%
  mutate(Position.Burst.ID = pbid(data_table))

That didn't work; it fails with:

`Position.Burst.ID` must be size 419335 or 1, not 4592293.

Any ideas on how to approach this?

I have already tried putting the function inside a loop, but that was also a dead end, and I would rather avoid a for loop with this amount of data.
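For comparison, a loop-free grouped computation is also possible in base R with `ave()`, which applies a function within each group and returns a vector aligned with the original rows. This is a minimal sketch, assuming `timestamp` is already POSIXct and rows are sorted by time within each bird; the two-bird data, the second bird ID and the helper name `burst_ids` are made up for illustration.

```r
# Made-up example with two birds; rows are assumed sorted by time within each bird
birds <- data.frame(
  bird_ID = c("350E", "350E", "350E", "351A", "351A"),
  timestamp = as.POSIXct(c("2022-05-02 00:03:59", "2022-05-02 00:04:01",
                           "2022-05-02 00:04:10", "2022-05-02 00:03:59",
                           "2022-05-02 00:04:30"), tz = "UTC")
)

# Hypothetical helper: burst numbering for one bird, given seconds since the epoch
burst_ids <- function(secs) cumsum(c(TRUE, diff(secs) >= 5))

# ave() runs burst_ids() separately for each bird_ID and returns a vector
# in the original row order, so it can be assigned straight back as a column
birds$Position.Burst.ID <- ave(as.numeric(birds$timestamp), birds$bird_ID, FUN = burst_ids)
print(birds)
```

The timestamps are converted to plain numbers first so that `ave()` works on a numeric vector rather than a POSIXct one; the numbering restarts at 1 for the second bird.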

Answer:

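The problem is that pbid(data_table) is evaluated on the whole table (all 4592293 rows) rather than on the rows of the current group, so mutate() gets back a vector that doesn't match the group size (419335 rows for that bird). Working directly on the grouped timestamp column avoids this. Here's how I'd do it: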

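library(dplyr)

data_table %>%
  group_by(bird_ID) %>%
  # every gap of 5 s or more bumps the running count; + 1 makes numbering start at 1
  mutate(Position.Burst.ID = cumsum(difftime(timestamp, lag(timestamp, default = timestamp[1]), units = "secs") >= 5) + 1)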
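If you would rather keep the original pbid() logic, one option is to make it take the timestamp column instead of the whole table, so that inside mutate() it only ever sees the current group's rows. This is a sketch; the name `pbid_vec` is hypothetical, and the commented pipeline assumes dplyr is loaded and that rows are sorted by time within each bird.

```r
# Hypothetical vector-based variant of pbid(): it receives only the
# timestamps of the rows it should number, not the whole table
pbid_vec <- function(timestamp) {
  gaps <- as.numeric(diff(as.POSIXct(timestamp, tz = "UTC")), units = "secs")
  newbout <- which(c(TRUE, gaps >= 5))  # rows where a burst starts
  # repeat each burst number for the length of its burst
  rep(seq_along(newbout), diff(c(newbout, length(timestamp) + 1)))
}

# Usage inside the grouped pipeline (requires dplyr):
# data_table %>%
#   group_by(bird_ID) %>%
#   mutate(Position.Burst.ID = pbid_vec(timestamp))

ts <- c("2022-05-02 00:03:59", "2022-05-02 00:04:01", "2022-05-02 00:04:10")
print(pbid_vec(ts))
```

Because `pbid_vec()` returns exactly one value per input timestamp, its output always matches the group size, which is what mutate() requires.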