I am trying to aggregate data by week.
| Timestamp | District |
| -------- | -------------- |
| 2015-01-16 | Kreuzberg |
| 2015-01-10 | Charlottenburg |
| 2015-02-09 | Mitte |
| 2014-10-10 | Lichtenberg |
Each Timestamp represents an ill person. I want the districts as column headers and the timestamps aggregated into weeks, so that I can see how many people caught an infection in each district during each week.
| Week | Kreuzberg | Charlottenburg | Mitte | Lichtenberg |
| ---------- | --------- | -------------- | ----- | ----------- |
| 2015-01-16 | 1 | 0 | 0 | 0 |
| 2015-01-10 | 0 | 1 | 0 | 0 |
| 2015-02-09 | 0 | 0 | 1 | 0 |
| 2014-10-10 | 0 | 0 | 0 | 1 |
So far I have the following code, which doesn't give me the results that I need.
new_df <-
  df %>%
  drop_na(Timestamp) %>%
  mutate(week = floor_date(Timestamp, unit = "week")) %>%
  count(week, District)
Any suggestions?
Best, Daniel
CodePudding user response:
Building on your code, you could pipe the counts into pivot_wider() to turn each district into its own column:
library(dplyr)
library(tidyr)
library(lubridate)

df %>%
  drop_na(Timestamp) %>%
  mutate(week = floor_date(Timestamp, unit = "week")) %>%
  count(week, District) %>%
  pivot_wider(names_from = District, values_from = n, values_fill = 0)
This returns
# A tibble: 4 x 5
  week       Lichtenberg Charlottenburg Kreuzberg Mitte
  <date>           <int>          <int>     <int> <int>
1 2014-10-05           1              0         0     0
2 2015-01-04           0              1         0     0
3 2015-01-11           0              0         1     0
4 2015-02-08           0              0         0     1
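
Note that the week column holds the first day of each week, which lubridate takes to be Sunday by default. For a version you can run as-is, here is a minimal sketch that rebuilds the sample data from the question. It makes two assumptions that are not part of the original code: that Timestamp should be parsed as a Date, and that weeks should start on Monday (week_start = 1). The last step, which uses complete() to add rows for weeks without any cases, is optional and only a suggestion.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Sample data from the question; Timestamp is parsed to Date explicitly
# (assumption: the real column is a Date or can be converted to one)
df <- tibble(
  Timestamp = as.Date(c("2015-01-16", "2015-01-10", "2015-02-09", "2014-10-10")),
  District  = c("Kreuzberg", "Charlottenburg", "Mitte", "Lichtenberg")
)

weekly <- df %>%
  drop_na(Timestamp) %>%
  # week_start = 1 makes weeks begin on Monday (assumption; the default is Sunday)
  mutate(week = floor_date(Timestamp, unit = "week", week_start = 1)) %>%
  count(week, District) %>%
  pivot_wider(names_from = District, values_from = n, values_fill = 0)

# Optional: add a row for every week between the first and last one,
# so that weeks without any cases show up with 0 in each district
weekly_full <- weekly %>%
  complete(week = seq(min(week), max(week), by = "week")) %>%
  mutate(across(-week, ~ replace_na(.x, 0L)))

weekly
weekly_full
```

Here weekly has one row per week that contains at least one case, while weekly_full also lists the empty weeks in between. Drop the week_start argument if Sunday-based weeks are what you want.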