Creating a census table based on multiple variables in R-CodePudding

I'm new to R and really struggling with what feels like a straightforward problem (that I haven't been able to find answers for).

I have a relatively big data table that essentially includes -people -where they live -what they do -move-in dates -move-out dates. My goal is to derive a running weekly census table with each week as a row, and a column for each occupation and city, populated with the headcount at that time.

#MRE

library(tidyverse) 
library(lubridate)

data <- data.frame(
  first_names = c("joe", "sally", "bob", "frank", "susy"),
  move_in = as.Date(c("2020-01-01", "2021-01-04", "2020-04-01", "2018-12-20", "2019-10-12")),
  move_out = as.Date(c("2021-01-01", NA, "2021-10-01", NA, NA)),
  city = c("Denver", "Phoenix", "Austin", "Denver", "Seattle"),
  occupation = c("doctor", "doctor", "architect", "teacher", "teacher"))

#what I've tried :
  
cities = unique(data$city)[!is.na(unique(data$city))]
occupations = unique(data$occupation)[!is.na(unique(data$occupation))]
weeks <- (date = seq(from = as.Date("2020-12-27"), to = as.Date(today()), by="1 week"))

census <- matrix(data=NA, nrows=44, ncols=12) 

for (i in seq(cities)){
  for (j in seq(occupations)){
    count <- data %>% 
      filter(cities == i) %>%
      filter(occupations == j) %>% 
      sapply(weeks, function(x)
        sum(
          ((as.Date(data$move_in)) <= as.Date(x) &
           (as.Date(data$move_out)) > as.Date(x))|
          ((as.Date(data$move_in)) <= as.Date(x) &
           is.na(data$move_out))))
  
  census[j,x] <- count
}}

Any help is greatly appreciated!

CodePudding user response：

Here's a possible solution using some tidyverse verbs, since you loaded that package. We loop over the weeks you're interested in using the map_dfr function, and for each week we collect a subset of the people who are there using your logic statement above. Then, we can use group_by to skip the double outer loop and count them directly. Finally, we mutate a new column for week to keep them straight after they're bound together. Outside of the loop, we then pivot_wider to get the one-column-per-occupation and one-row-per-week format that you're looking for.

library(tidyverse)

data <- data.frame(
  first_names = c("joe", "sally", "bob", "frank", "susy"),
  move_in = as.Date(c("2020-01-01", "2021-01-04", "2020-04-01", "2018-12-20", "2019-10-12")),
  move_out = as.Date(c("2021-01-01", NA, "2021-10-01", NA, NA)),
  city = c("Denver", "Phoenix", "Austin", "Denver", "Seattle"),
  occupation = c("doctor", "doctor", "architect", "teacher", "teacher"))

# Avoid needing to load lubridate by using Sys.Date() instead of today()
weeks <- (date = seq(from = as.Date("2020-12-27"), to = as.Date(Sys.Date()), by="1 week"))

map_dfr(weeks, function(week_i){
  data %>%
    filter(move_in<week_i & move_out > week_i | move_in < week_i & is.na(move_out)) %>%
    group_by(city, occupation) %>%
    count() %>%
    mutate(week=week_i)
}) %>%
  pivot_wider(values_from = n, names_from = occupation, values_fill = 0)

which returns

# A tibble: 170 x 5
# Groups:   city [4]
   city    week       architect doctor teacher
   <chr>   <date>         <int>  <int>   <int>
 1 Austin  2020-12-27         1      0       0
 2 Denver  2020-12-27         0      1       1
 3 Seattle 2020-12-27         0      0       1
 4 Austin  2021-01-03         1      0       0
 5 Denver  2021-01-03         0      0       1
 6 Seattle 2021-01-03         0      0       1
 7 Austin  2021-01-10         1      0       0
 8 Denver  2021-01-10         0      0       1
 9 Phoenix 2021-01-10         0      1       0
10 Seattle 2021-01-10         0      0       1
# ... with 160 more rows

It looks like you're getting errors due to a couple typos. You're using the filter verb to ask for the cities column, but data only has a city column in the sample data set. Same for occupations vs occupation. Good to keep in mind for the future, but great first effort and nicely provided example!

CodePudding user response：

I used data.table. lubridate isn't needed, I used Sys.Date().
I made census a data.table also, instead of a matrix.
data.table::CJ is pretty much the same as expand.grid.
Then used mapply instead of for loops.
Finally, re-organized from long to wide, as I think that's what you wanted.
I left in all the city_occupation combinations - not sure if that was the intent.

library(data.table)
library(magrittr)

data <- data.frame(
  first_names = c("joe", "sally", "bob", "frank", "susy"),
  move_in = as.Date(c("2020-01-01", "2021-01-04", "2020-04-01", "2018-12-20", "2019-10-12")),
  move_out = as.Date(c("2021-01-01", NA, "2021-10-01", NA, NA)),
  city = c("Denver", "Phoenix", "Austin", "Denver", "Seattle"),
  occupation = c("doctor", "doctor", "architect", "teacher", "teacher"))
cities <- unique(data$city)[!is.na(unique(data$city))]
occupations <- unique(data$occupation)[!is.na(unique(data$occupation))]
weeks <- (date = seq(from = as.Date("2020-12-27"), to = Sys.Date(), by="1 week"))

data %>% setDT()
census <- CJ(week = weeks, city = cities, occupation = occupations) %>% 
  .[, count := mapply(function(wk, cty, occ) {
    data[city == cty & occupation == occ, 
         sum(move_in <= wk & (move_out > wk | is.na(move_out)))]
  }, week, city, occupation)]

census %<>% dcast(week ~ city   occupation, value.var = 'count')

Gives:

census
          week Austin_architect Austin_doctor Austin_teacher Denver_architect
 1: 2020-12-27                1             0              0                0
 2: 2021-01-03                1             0              0                0
 3: 2021-01-10                1             0              0                0
 4: 2021-01-17                1             0              0                0
 5: 2021-01-24                1             0              0                0
 6: 2021-01-31                1             0              0                0
 7: 2021-02-07                1             0              0                0
 8: 2021-02-14                1             0              0                0
 9: 2021-02-21                1             0              0                0
10: 2021-02-28                1             0              0                0
11: 2021-03-07                1             0              0                0
12: 2021-03-14                1             0              0                0
13: 2021-03-21                1             0              0                0
14: 2021-03-28                1             0              0                0
15: 2021-04-04                1             0              0                0
16: 2021-04-11                1             0              0                0
17: 2021-04-18                1             0              0                0
18: 2021-04-25                1             0              0                0
19: 2021-05-02                1             0              0                0
20: 2021-05-09                1             0              0                0
21: 2021-05-16                1             0              0                0
22: 2021-05-23                1             0              0                0
23: 2021-05-30                1             0              0                0
24: 2021-06-06                1             0              0                0
25: 2021-06-13                1             0              0                0
26: 2021-06-20                1             0              0                0
27: 2021-06-27                1             0              0                0
28: 2021-07-04                1             0              0                0
29: 2021-07-11                1             0              0                0
30: 2021-07-18                1             0              0                0
31: 2021-07-25                1             0              0                0
32: 2021-08-01                1             0              0                0
33: 2021-08-08                1             0              0                0
34: 2021-08-15                1             0              0                0
35: 2021-08-22                1             0              0                0
36: 2021-08-29                1             0              0                0
37: 2021-09-05                1             0              0                0
38: 2021-09-12                1             0              0                0
39: 2021-09-19                1             0              0                0
40: 2021-09-26                1             0              0                0
41: 2021-10-03                0             0              0                0
42: 2021-10-10                0             0              0                0
43: 2021-10-17                0             0              0                0
44: 2021-10-24                0             0              0                0
          week Austin_architect Austin_doctor Austin_teacher Denver_architect
    Denver_doctor Denver_teacher Phoenix_architect Phoenix_doctor
 1:             1              1                 0              0
 2:             0              1                 0              0
 3:             0              1                 0              1
 4:             0              1                 0              1
 5:             0              1                 0              1
 6:             0              1                 0              1
 7:             0              1                 0              1
 8:             0              1                 0              1
 9:             0              1                 0              1
10:             0              1                 0              1
11:             0              1                 0              1
12:             0              1                 0              1
13:             0              1                 0              1
14:             0              1                 0              1
15:             0              1                 0              1
16:             0              1                 0              1
17:             0              1                 0              1
18:             0              1                 0              1
19:             0              1                 0              1
20:             0              1                 0              1
21:             0              1                 0              1
22:             0              1                 0              1
23:             0              1                 0              1
24:             0              1                 0              1
25:             0              1                 0              1
26:             0              1                 0              1
27:             0              1                 0              1
28:             0              1                 0              1
29:             0              1                 0              1
30:             0              1                 0              1
31:             0              1                 0              1
32:             0              1                 0              1
33:             0              1                 0              1
34:             0              1                 0              1
35:             0              1                 0              1
36:             0              1                 0              1
37:             0              1                 0              1
38:             0              1                 0              1
39:             0              1                 0              1
40:             0              1                 0              1
41:             0              1                 0              1
42:             0              1                 0              1
43:             0              1                 0              1
44:             0              1                 0              1
    Denver_doctor Denver_teacher Phoenix_architect Phoenix_doctor
    Phoenix_teacher Seattle_architect Seattle_doctor Seattle_teacher
 1:               0                 0              0               1
 2:               0                 0              0               1
 3:               0                 0              0               1
 4:               0                 0              0               1
 5:               0                 0              0               1
 6:               0                 0              0               1
 7:               0                 0              0               1
 8:               0                 0              0               1
 9:               0                 0              0               1
10:               0                 0              0               1
11:               0                 0              0               1
12:               0                 0              0               1
13:               0                 0              0               1
14:               0                 0              0               1
15:               0                 0              0               1
16:               0                 0              0               1
17:               0                 0              0               1
18:               0                 0              0               1
19:               0                 0              0               1
20:               0                 0              0               1
21:               0                 0              0               1
22:               0                 0              0               1
23:               0                 0              0               1
24:               0                 0              0               1
25:               0                 0              0               1
26:               0                 0              0               1
27:               0                 0              0               1
28:               0                 0              0               1
29:               0                 0              0               1
30:               0                 0              0               1
31:               0                 0              0               1
32:               0                 0              0               1
33:               0                 0              0               1
34:               0                 0              0               1
35:               0                 0              0               1
36:               0                 0              0               1
37:               0                 0              0               1
38:               0                 0              0               1
39:               0                 0              0               1
40:               0                 0              0               1
41:               0                 0              0               1
42:               0                 0              0               1
43:               0                 0              0               1
44:               0                 0              0               1
    Phoenix_teacher Seattle_architect Seattle_doctor Seattle_teacher