Home > OS >  R - Calculating number of reports prior to an event in another dataframe
R - Calculating number of reports prior to an event in another dataframe

Time:10-19

I have the following two dataframes:

> Reaction_per_park_per_day_3

Park                Date
14st NE - Coventry  2019-05-08
14st NE - Coventry  2019-05-15
14st NE - Coventry  2019-08-09
14st NE - Coventry  2019-08-22
Airways Park        2018-11-27
Airways Park        2020-12-16
Airways Park        2020-12-24
Arbour Lake East    2017-01-02
Arbour Lake East    2017-01-03
Arbour Lake East    2017-01-07
Arbour Lake East    2017-01-08

> Reports_per_park_per_day_3

Park                Month
14st NE - Coventry  2019-05-16
14st NE - Coventry  2019-05-17
14st NE - Coventry  2019-08-14
Airways Park        2021-04-02
Arbour Lake East    2017-01-04
Arbour Lake East    2017-02-04

I would like to add a row to the Reports_per_park_per_day_3 dataframe (Number_AC), which would count the total number of events in the Reaction_per_park_per_Day_3 dataframe prior to each event in the Reports_per_park_per_day_3 within the same parks. I would therefore like the Reports_per_park_per_day_3 dataframe to look like this:

Park                Month        Number_AC
14st NE - Coventry  2019-05-16   2
14st NE - Coventry  2019-05-17   2
14st NE - Coventry  2019-08-14   3
Airways Park        2021-04-02   2
Arbour Lake East    2017-01-04   2  
Arbour Lake East    2017-02-04   4

I tried the following, but it did not work, as it gave me 0 events for the entire row:

> library(dplyr)

> Reports_per_park_per_day_3 <- Reports_per_park_per_day_3 %>%
  left_join( Reaction_per_park_per_day_3, by="Park" ) %>%
  filter( Date <= Month ) %>%
  group_by( Park, Month) %>%
  summarize(Number_AC = sum(Month <= Date & Month >= Date), .groups = "drop") %>%
  distinct

CodePudding user response:

By using merge you can do what you want to do. Keep in mind this solution isn't the fastest, and might cause memory problems if your data frames are large.

Reaction_per_park_per_day_3 %>%
  merge( Reports_per_park_per_day_3 , by=NULL ) %>%
  filter( Date <= Month,Park.x==Park.y ) %>%
  select(Park = Park.x,Month,Date) %>% 
  count(Park,Month,name = "Number_AC")

CodePudding user response:

You may do a full_join and for each combination of Park and Month count the number of values in Date less than Month value.

library(dplyr)

Reaction_per_park_per_day_3 %>%
  full_join(Reports_per_park_per_day_3, by = 'Park') %>%
  group_by(Park, Month) %>%
  summarise(Number_AC = sum(Date <= Month), .groups = 'drop')

#  Park            Month      Number_AC
#  <chr>           <chr>          <int>
#1 14stNE-Coventry 2019-05-16         2
#2 14stNE-Coventry 2019-05-17         2
#3 14stNE-Coventry 2019-08-14         3
#4 AirwaysPark     2021-04-02         3
#5 ArbourLakeEast  2017-01-04         2
#6 ArbourLakeEast  2017-02-04         4
  •  Tags:  
  • r
  • Related