Home > Enterprise >  Sum unique occurrences per night and create a new data frame in R
Sum unique occurrences per night and create a new data frame in R

Time:05-19

I have studied prey deliveries in a breeding owl and want to score the number of prey items delivered during the night to the nestlings. I define night as from 21 to 5. How could I make a new data frame with number of prey each night per location ID based upon these 24/7 observation dataset? In the new data frame, I wish to have the following columns: ID (A & B), No_prey_during_night (the sum of prey items), Time (date, e.g. 4/6 to 5/6), there will be a unique row per night per ID.

https://drive.google.com/file/d/1y5VCoNWZCmYbyWCktKfMSBqjOIaLeumQ/view?usp=sharing. I have done it in Excel so far, but very time demanding. I would be happy to get help with a simple script I could use in R.

CodePudding user response:

To take into account the fact that a night begins and ends on different dates, you could first assign all the morning hours to the prior day. The final label (the Time column in your question) then includes the next day. If the year of the data collection has a Feb 29, make sure the year is correct (I used 2022).

library(dplyr)
library(lubridate)

read.csv("Tot_prey_example.csv") %>% 
  mutate(time = make_datetime(year = 2022, month = Month, day = Day, hour = Hour),
         night_time = if_else(between(Hour, 0, 5), time - days(1), time),
         night_date = floor_date(night_time, unit = "day"),
         night = Hour <= 5 | Hour >= 21) %>% 
  filter(night) %>% 
  group_by(ID, night_date) %>% 
  summarise(No_prey_during_night = sum(n), .groups = "drop") %>% 
  mutate(next_day = night_date   days(1),
         Time = glue::glue("{day(night_date)}/{month(night_date)} to {day(next_day)}/{month(next_day)}")) %>% 
  select(ID, No_prey_during_night, Time)
#> # A tibble: 88 × 3
#>    ID    No_prey_during_night Time        
#>    <chr>                <int> <glue>      
#>  1 A                       12 4/6 to 5/6  
#>  2 A                       22 5/6 to 6/6  
#>  3 A                       20 6/6 to 7/6  
#>  4 A                       14 7/6 to 8/6  
#>  5 A                       14 8/6 to 9/6  
#>  6 A                       27 9/6 to 10/6 
#>  7 A                       22 10/6 to 11/6
#>  8 A                       18 11/6 to 12/6
#>  9 A                       22 12/6 to 13/6
#> 10 A                       25 13/6 to 14/6
#> # … with 78 more rows

Created on 2022-05-18 by the reprex package (v2.0.1)

CodePudding user response:

You can do something like this:

library(dplyr)
library(lubridate)

read.csv("Tot_prey_example.csv") %>% 

  # create initial datetime variable, `night`
  mutate(night = lubridate::make_datetime(2021, Month,Day,Hour)) %>% 

  # filter to nighttime hours
  filter(Hour>=21 | Hour<=5) %>% 

  # flip datetime variable to the next day if hour is >=21
  mutate(night = if_else(Hour>=21,night   60*60*24, night)) %>% 

  # now group by the date part of `night`
  group_by(ID,Night_No = as.Date(night)) %>% 

  # summarize the sum of prey
  summarize(
    No_prey_during_night = sum(n),
    No_deliveries_during_night = sum(PreyDelivery)
  ) %>% 

  # replace the Night_No with a character variable showing both dates
  mutate(Night_No = paste0(Night_No-1, "-", Night_No))

Output:

# A tibble: 88 × 4
# Groups:   ID [2]
   ID    Night_No              No_prey_during_night No_deliveries_during_night
   <chr> <chr>                                <int>                      <int>
 1 A     2021-06-04-2021-06-05                   12                          5
 2 A     2021-06-05-2021-06-06                   22                          6
 3 A     2021-06-06-2021-06-07                   20                          5
 4 A     2021-06-07-2021-06-08                   14                          6
 5 A     2021-06-08-2021-06-09                   14                          5
 6 A     2021-06-09-2021-06-10                   27                          5
 7 A     2021-06-10-2021-06-11                   22                          4
 8 A     2021-06-11-2021-06-12                   18                          6
 9 A     2021-06-12-2021-06-13                   22                          6
10 A     2021-06-13-2021-06-14                   25                          5
# … with 78 more rows
  • Related