I have the following two dataframes:
> Reaction_per_park_per_day_3
Park Date
14st NE - Coventry 2019-05-08
14st NE - Coventry 2019-05-15
14st NE - Coventry 2019-08-09
14st NE - Coventry 2019-08-22
Airways Park 2018-11-27
Airways Park 2020-12-16
Airways Park 2020-12-24
Arbour Lake East 2017-01-02
Arbour Lake East 2017-01-03
Arbour Lake East 2017-01-07
Arbour Lake East 2017-01-08
> Reports_per_park_per_day_3
Park Month
14st NE - Coventry 2019-05-16
14st NE - Coventry 2019-05-17
14st NE - Coventry 2019-08-14
Airways Park 2021-04-02
Arbour Lake East 2017-01-04
Arbour Lake East 2017-02-04
I would like to add a row to the Reports_per_park_per_day_3 dataframe (Number_AC), which would count the total number of events in the Reaction_per_park_per_Day_3 dataframe prior to each event in the Reports_per_park_per_day_3 within the same parks. I would therefore like the Reports_per_park_per_day_3 dataframe to look like this:
Park Month Number_AC
14st NE - Coventry 2019-05-16 2
14st NE - Coventry 2019-05-17 2
14st NE - Coventry 2019-08-14 3
Airways Park 2021-04-02 2
Arbour Lake East 2017-01-04 2
Arbour Lake East 2017-02-04 4
I tried the following, but it did not work, as it gave me 0 events for the entire row:
> library(dplyr)
> Reports_per_park_per_day_3 <- Reports_per_park_per_day_3 %>%
left_join( Reaction_per_park_per_day_3, by="Park" ) %>%
filter( Date <= Month ) %>%
group_by( Park, Month) %>%
summarize(Number_AC = sum(Month <= Date & Month >= Date), .groups = "drop") %>%
distinct
CodePudding user response:
By using merge you can do what you want to do. Keep in mind this solution isn't the fastest, and might cause memory problems if your data frames are large.
Reaction_per_park_per_day_3 %>%
merge( Reports_per_park_per_day_3 , by=NULL ) %>%
filter( Date <= Month,Park.x==Park.y ) %>%
select(Park = Park.x,Month,Date) %>%
count(Park,Month,name = "Number_AC")
CodePudding user response:
You may do a full_join
and for each combination of Park
and Month
count the number of values in Date
less than Month
value.
library(dplyr)
Reaction_per_park_per_day_3 %>%
full_join(Reports_per_park_per_day_3, by = 'Park') %>%
group_by(Park, Month) %>%
summarise(Number_AC = sum(Date <= Month), .groups = 'drop')
# Park Month Number_AC
# <chr> <chr> <int>
#1 14stNE-Coventry 2019-05-16 2
#2 14stNE-Coventry 2019-05-17 2
#3 14stNE-Coventry 2019-08-14 3
#4 AirwaysPark 2021-04-02 3
#5 ArbourLakeEast 2017-01-04 2
#6 ArbourLakeEast 2017-02-04 4