I have a dataset of random, sometimes infrequent, events that I want to count as a sum per week. Due to the randomness they are not linear so other examples I have tried so far are not applicable.
The data is similar to this:
df_date <- data.frame( Name = c("Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim","Jim",
"Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue","Sue"),
Dates = c("2010-1-1", "2010-1-2", "2010-01-5","2010-01-17","2010-01-20",
"2010-01-29","2010-02-6","2010-02-9","2010-02-16","2010-02-28",
"2010-1-1", "2010-1-2", "2010-01-5","2010-01-17","2010-01-20",
"2010-01-29","2010-02-6","2010-02-9","2010-02-16","2010-02-28"),
Event = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1) )
What I'm trying to do is create a new table that contains the sum of events per week in the calendar year.
In this case producing something like this:
Name Week Events
Jim 1 3
Sue 1 3
Jim 2 0
Sue x ... x
and so on...
CodePudding user response:
Update OP request for multiple years:
We could use isoweek
also from lubridate
instead of week
OR:
We could add the year as follows:
df_date %>%
as_tibble() %>%
mutate(Week = week(ymd(Dates))) %>%
mutate(Year = year(ymd(Dates))) %>%
count(Name, Year, Week)
We could use lubridate
s Week
function after transforming character Dates
to date format with lubridate
s ymd
function.
Then we can use count
which is the short for group_by(Name, Week) %>% summarise(Count = n())
:
library(dplyr)
library(lubridate)
df_date %>%
as_tibble() %>%
mutate(Week = week(ymd(Dates))) %>%
count(Name, Week)
Name Week n
<chr> <dbl> <int>
1 Jim 1 3
2 Jim 3 2
3 Jim 5 1
4 Jim 6 2
5 Jim 7 1
6 Jim 9 1
7 Sue 1 3
8 Sue 3 2
9 Sue 5 1
10 Sue 6 2
11 Sue 7 1
12 Sue 9 1
CodePudding user response:
Here is an approach that gets you each ISO week for each individual, with zeros when there are no events for that week for that individual:
get_dates_df <- function(d) {
data.frame(date = seq(min(d, na.rm=T),max(d,na.rm=T),1)) %>%
mutate(Year=year(date), Week=week(date)) %>%
distinct(Year, Week)
}
df_date = df_date %>% mutate(Dates=lubridate::ymd(Dates))
left_join(
full_join(distinct(df_date %>% select(Name)), get_dates_df(df_date$Dates), by=character()),
df_date %>%
group_by(Name,Year=year(Dates), Week=week(Dates)) %>%
summarize(Events = sum(Event), .groups="drop")
) %>%
mutate(Events=if_else(is.na(Events),0,Events))
Output:
Name Year Week Events
1 Jim 2010 1 3
2 Jim 2010 2 0
3 Jim 2010 3 2
4 Jim 2010 4 0
5 Jim 2010 5 1
6 Jim 2010 6 2
7 Jim 2010 7 1
8 Jim 2010 8 0
9 Jim 2010 9 1
10 Sue 2010 1 3
11 Sue 2010 2 0
12 Sue 2010 3 2
13 Sue 2010 4 0
14 Sue 2010 5 1
15 Sue 2010 6 2
16 Sue 2010 7 1
17 Sue 2010 8 0
18 Sue 2010 9 1