Aggregate weeks and modify headers retrieved from column

Time:04-06

I am trying to aggregate data by week.

| Timestamp     |     District   |
|   --------    | -------------- |
| 2015-01-16    | Kreuzberg      |
| 2015-01-10    | Charlottenburg |
| 2015-02-09    | Mitte          |
| 2014-10-10    | Lichtenberg    |

Each Timestamp represents one ill person. What I am trying to achieve is to get the districts as column headers and aggregate the timestamps into weeks. It is important to see how many people caught an infection in each district during each week.

| Week       | Kreuzberg | Charlottenburg | Mitte | Lichtenberg |
| ---------- | --------- | -------------- | ----- | ----------- |
| 2015-01-16 | 1         | 0              | 0     | 0           |
| 2015-01-10 | 0         | 1              | 0     | 0           |
| 2015-02-09 | 0         | 0              | 1     | 0           |
| 2014-10-10 | 0         | 0              | 0     | 1           |

So far I have the following code, which doesn't give me the result I need:

new_df <-
  df %>% 
  drop_na(Timestamp) %>%             
  mutate(week = floor_date(   
    Timestamp,
    unit = "week")) %>% 
  count(week, District) 

Any suggestions?

Best, Daniel

CodePudding user response:

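Building on your code, you can pipe the result of count() into pivot_wider(), which turns each district into its own column and fills missing week/district combinations with 0: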

library(dplyr)
library(tidyr)
library(lubridate)

df %>% 
  drop_na(Timestamp) %>%             
  mutate(week = floor_date(   
    Timestamp,
    unit = "week")) %>% 
  count(week, District) %>% 
  pivot_wider(names_from = District, values_from = n, values_fill = 0)

This returns

# A tibble: 4 x 5
  week       Lichtenberg Charlottenburg Kreuzberg Mitte
  <date>           <int>          <int>     <int> <int>
1 2014-10-05           1              0         0     0
2 2015-01-04           0              1         0     0
3 2015-01-11           0              0         1     0
4 2015-02-08           0              0         0     1
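
The week values in this output are all Sundays, because floor_date() starts weeks on Sunday by default (week_start = 7). If weeks should start on Monday instead, a minimal sketch could look like the following. It assumes that Timestamp is a Date column; the sample data frame is rebuilt from the question only to make the example self-contained, and the name weekly is just illustrative.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Sample data copied from the question; Timestamp is assumed to be of class Date
df <- tibble(
  Timestamp = as.Date(c("2015-01-16", "2015-01-10", "2015-02-09", "2014-10-10")),
  District  = c("Kreuzberg", "Charlottenburg", "Mitte", "Lichtenberg")
)

weekly <- df %>%
  drop_na(Timestamp) %>%
  # week_start = 1 floors each date to the Monday of its week
  mutate(week = floor_date(Timestamp, unit = "week", week_start = 1)) %>%
  # one row per week/district combination, with the number of cases in n
  count(week, District) %>%
  # one column per district; combinations without cases become 0
  pivot_wider(names_from = District, values_from = n, values_fill = 0)

weekly
```

With this sample data the week column should then hold Mondays (for example, 2015-01-16 falls into the week starting 2015-01-12). Weeks in which no district reported a case do not appear as rows at all, since count() only returns combinations that are present in the data.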