Relative noob here and would appreciate any help. Basically I'd like to create an output csv file with the frequency of each outbreak and the first onset date, the last onset date and total duration.
I have a dataset that looks something like this:
df <- data.frame(outbreak_name = c("A","A","A","A","B","B","C","C","C"), onset = as.Date(c("2021-1-11","2021-2-2","2021-2-3","2021-3-3","2021-5-5","2021-7-5","2021-4-5","2021-2-3","2021-12-4")))
I have been able to create the columns with the dates like this
summary_ob <- df %>%
  group_by(outbreak_name) %>%
  mutate(first_onset = min(onset)) %>%
  mutate(last_onset = max(onset)) %>%
  mutate(duration = last_onset - first_onset)
And I can create a frequency table with a simple count.
summary_freq <- df %>%
  group_by(outbreak_name) %>%
  summarize(cases = n())
What I cannot figure out is how to combine these into one table with one row per outbreak, so it would show that outbreak A has 4 cases, the first onset was xx, the last onset was xx, and the outbreak has lasted for xx days. I'd then like to export this with write.csv.
CodePudding user response:
library(dplyr)
df %>%
  group_by(outbreak_name) %>%
  summarize(
    cases = n(),
    first_onset = min(onset),
    last_onset = max(onset)
  ) %>%
  mutate(duration = last_onset - first_onset)
# A tibble: 3 x 5
  outbreak_name cases first_onset last_onset duration
  <chr>         <int> <date>      <date>     <drtn>
1 A                 4 2021-01-11  2021-03-03 51 days
2 B                 2 2021-05-05  2021-07-05 61 days
3 C                 3 2021-02-03  2021-12-04 304 days
Afterwards, assign the result to an object and export it with write_csv (from the readr package) or base R's write.csv.
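As a minimal sketch of the full workflow, the summary can be stored and then written out. The object name summary_ob is borrowed from the question, while the file name outbreak_summary.csv, the temporary directory, and the column name duration_days are assumptions made for illustration. The duration is converted to a plain number of days so the CSV column does not depend on how a difftime is printed.

```r
library(dplyr)

# Example data from the question, with every onset parsed as a Date
df <- data.frame(
  outbreak_name = c("A", "A", "A", "A", "B", "B", "C", "C", "C"),
  onset = as.Date(c("2021-01-11", "2021-02-02", "2021-02-03", "2021-03-03",
                    "2021-05-05", "2021-07-05", "2021-04-05", "2021-02-03",
                    "2021-12-04"))
)

# One row per outbreak: case count, first and last onset, duration in days
summary_ob <- df %>%
  group_by(outbreak_name) %>%
  summarise(
    cases = n(),
    first_onset = min(onset),
    last_onset = max(onset)
  ) %>%
  mutate(duration_days = as.numeric(last_onset - first_onset, units = "days"))

# Hypothetical output location; replace with your own path
out_file <- file.path(tempdir(), "outbreak_summary.csv")
write.csv(summary_ob, out_file, row.names = FALSE)
```

Setting row.names = FALSE keeps write.csv from adding an extra column of row numbers to the file.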
CodePudding user response:
We may do this with diff on the range of 'onset':
library(dplyr)
df %>%
  group_by(outbreak_name) %>%
  summarise(cases = n(), duration = diff(range(onset)))
-output
# A tibble: 3 x 3
  outbreak_name cases duration
  <chr>         <int> <drtn>
1 A                 4 51 days
2 B                 2 61 days
3 C                 3 304 days
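To see why diff on the range gives the duration, here is a small base R sketch using only outbreak A's dates from the question. The names onset_a, rng and span are made up for illustration. range() returns the earliest and latest date, so the two elements of rng can also be used as the first and last onset if those columns are needed.

```r
# Onset dates for outbreak A, taken from the question's example data
onset_a <- as.Date(c("2021-01-11", "2021-02-02", "2021-02-03", "2021-03-03"))

rng <- range(onset_a)  # length-2 Date vector: earliest, then latest onset
span <- diff(rng)      # latest minus earliest, returned as a difftime

first_onset <- rng[1]
last_onset <- rng[2]
```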