Using a loop to make a codebook for select tidycensus variables through the years, trouble with cal

Time:10-27

I'm not great with loops, but I'm trying to get better at working through them. I am using tidycensus to pull a few selected variables across several years (the dummy data in the example below is representative). For a given set of selected variables (dv_acs), I want to extract their entries from the comprehensive codebook that load_variables returns for every year and then full_join them. In most cases the information should be identical across years, but I want the complete set so I can double-check it and note any discrepancies.

Here is the setup, which is working:

library(tidycensus)
library(dplyr)


#getting codebook for all ACS years for every single variable possible
for(x in c(2009:2020)) {
  filename <- paste0("v", x)
  assign(filename, (load_variables(x, "acs5", cache = TRUE)))
}


#selecting and recoding variables to pull in
dv_acs = c(
  hus          = "B25002_001", 
  husocc       = "B25002_002", 
  husvac       = "B25002_003"
)

This accomplishes what I want one year at a time, and I could just join the results piece by piece:

#creating a codebook a year at a time for variables I'm interested in
codebook <- v2009 %>%
  filter(name %in% dv_acs) %>%
  mutate(id = names(dv_acs), .before = 1)

colnames(codebook) = c("id", "name", "label_2009", "concept_2009")  

codebook2 <- v2010 %>%
  filter(name %in% dv_acs) %>%
  mutate(id = names(dv_acs), .before = 1)

colnames(codebook2) = c("id", "name", "label_2010", "concept_2010")  

codebook <- full_join(codebook, codebook2, by=c("id", "name"))

And here is where I try, and fail, to write a loop that builds the codebook for my specific variables across all years in one go:

#creating a loop to pull in and join a codebook for all years
for(x in c(2009:2010)){
  codebook <- data.frame(matrix(ncol = 2, nrow = 0)) #create a master file I can join the files to as they load in through the loop
  colnames(codebook) <- c("id", "name") #giving right label names
  filename <- paste0("v", x) #this is where I'm starting to have trouble; this saves as a value, and I can't then use it to call the dataframe
  temp <- filename %>% (name %in% dv_acs) %>%
    mutate(id = names(dv_acs), .before = 1)
  colnames(temp) <- c("id", "name", paste0("label_", x), paste0("concept_", x))
  codebook <- full_join(codebook, temp, by=c("id", "name"))
}

Reported error is: "Error in name %in% dv_acs : object 'name' not found"

CodePudding user response:

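In the loop, filename is only a character string such as "v2009", not the data frame it names, and the filter( call is missing, so name is looked up as an ordinary object and not found. More generally, it is better not to create separate objects in the global environment; the yearly tables could instead be stored in a list. Since the objects already exist here, their values can be retrieved into a named list with mget: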

library(stringr)
library(purrr)
library(dplyr)
out <- mget(str_c("v", 2009:2020)) %>%
  imap(~ {
    # year-specific column names built from the list name, e.g. "label2009"
    nm <- str_c(c("label", "concept"), str_remove(.y, "v"))

    .x %>%
      select(-any_of("geography")) %>%
      filter(name %in% dv_acs) %>%
      mutate(id = names(dv_acs), .before = 1) %>%
      rename_with(~ nm, c("label", "concept"))
  }) %>%
  reduce(full_join)

-output

> out
# A tibble: 3 × 26
  id    name  label…¹ conce…² label…³ conce…⁴ label…⁵ conce…⁶ label…⁷ conce…⁸ label…⁹ conce…˟ label…˟ conce…˟ label…˟ conce…˟ label…˟ conce…˟ label…˟ conce…˟ label…˟
  <chr> <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
1 hus   B250… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima…
2 huso… B250… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima…
3 husv… B250… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima…
# … with 5 more variables: concept2018 <chr>, label2019 <chr>, concept2019 <chr>, label2020 <chr>, concept2020 <chr>, and abbreviated variable names ¹​label2009,
#   ²​concept2009, ³​label2010, ⁴​concept2010, ⁵​label2011, ⁶​concept2011, ⁷​label2012, ⁸​concept2012, ⁹​label2013, ˟​concept2013, ˟​label2014, ˟​concept2014, ˟​label2015,
#   ˟​concept2015, ˟​label2016, ˟​concept2016, ˟​label2017, ˟​concept2017, ˟​label2018
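
For comparison, the original loop can also be repaired with only small changes. The following is a minimal sketch rather than part of the answer above: the tables v2009 and v2010 are small dummy stand-ins with made-up rows (an assumption, so that it runs without downloading anything), the master table is created once before the loop so it is not reset on each pass, get() looks up each data frame by its name, and the ids are attached by joining on name instead of by row position.

```r
library(dplyr)

# Dummy stand-ins for the tables load_variables() would return (hypothetical rows)
v2009 <- data.frame(
  name    = c("B25002_001", "B25002_002", "B25002_003", "B25003_001"),
  label   = c("Total", "Occupied", "Vacant", "Total"),
  concept = c("OCCUPANCY STATUS", "OCCUPANCY STATUS", "OCCUPANCY STATUS", "TENURE")
)
v2010 <- v2009

dv_acs <- c(hus = "B25002_001", husocc = "B25002_002", husvac = "B25002_003")

# Create the master table once, outside the loop
codebook <- data.frame(id = names(dv_acs), name = unname(dv_acs))

for (x in 2009:2010) {
  temp <- get(paste0("v", x)) %>%   # fetch the data frame whose name is "v<year>"
    filter(name %in% dv_acs) %>%
    select(name, label, concept)
  colnames(temp) <- c("name", paste0("label_", x), paste0("concept_", x))
  codebook <- full_join(codebook, temp, by = "name")
}

colnames(codebook)
```

With real data, the dummy tables would be replaced by the objects created in the setup loop and the range extended to 2009:2020.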

If we want everything in a list without first creating objects in the global environment, load_variables can be called directly inside map:

out <- map(2009:2020, ~ {
  # year-specific column names, e.g. "label_2009", "concept_2009"
  nm <- str_c(c("label", "concept"), "_", .x)

  load_variables(.x, "acs5") %>%
    select(-any_of("geography")) %>%
    filter(name %in% dv_acs) %>%
    mutate(id = names(dv_acs), .before = 1) %>%
    rename_with(~ nm, c("label", "concept"))
}) %>%
  reduce(full_join)

-output

> out
# A tibble: 3 × 26
  id    name  label…¹ conce…² label…³ conce…⁴ label…⁵ conce…⁶ label…⁷ conce…⁸ label…⁹ conce…˟ label…˟ conce…˟ label…˟ conce…˟ label…˟ conce…˟ label…˟ conce…˟ label…˟
  <chr> <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
1 hus   B250… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima…
2 huso… B250… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima…
3 husv… B250… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima… OCCUPA… Estima…
# … with 5 more variables: concept_2018 <chr>, label_2019 <chr>, concept_2019 <chr>, label_2020 <chr>, concept_2020 <chr>, and abbreviated variable names
#   ¹​label_2009, ²​concept_2009, ³​label_2010, ⁴​concept_2010, ⁵​label_2011, ⁶​concept_2011, ⁷​label_2012, ⁸​concept_2012, ⁹​label_2013, ˟​concept_2013, ˟​label_2014,
#   ˟​concept_2014, ˟​label_2015, ˟​concept_2015, ˟​label_2016, ˟​concept_2016, ˟​label_2017, ˟​concept_2017, ˟​label_2018
# ℹ Use `colnames()` to see all variable names
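
Since the stated goal is to note discrepancies between years, a long layout (one row per variable and year) may be easier to check than the wide table. The sketch below is only an illustration under assumptions: the two yearly tables are made-up dummy data in which one label is changed on purpose, and the ids are looked up with match() so they do not depend on row order.

```r
library(dplyr)

dv_acs <- c(hus = "B25002_001", husocc = "B25002_002", husvac = "B25002_003")

# Hypothetical per-year tables; the 2010 label of B25002_003 differs on purpose
years <- list(
  `2009` = data.frame(name = unname(dv_acs),
                      label = c("Total", "Occupied", "Vacant"),
                      concept = "OCCUPANCY STATUS"),
  `2010` = data.frame(name = unname(dv_acs),
                      label = c("Total", "Occupied", "Vacant units"),
                      concept = "OCCUPANCY STATUS")
)

# Stack the years and attach each id by matching on the variable code
long <- bind_rows(years, .id = "year") %>%
  mutate(id = names(dv_acs)[match(name, dv_acs)], .before = 1)

# Keep only variables whose label or concept is not identical in every year
discrepancies <- long %>%
  group_by(id, name) %>%
  summarise(n_labels = n_distinct(label),
            n_concepts = n_distinct(concept),
            .groups = "drop") %>%
  filter(n_labels > 1 | n_concepts > 1)

discrepancies
```

With real data, the list could be built from the load_variables results for each year, named by year, after filtering to the codes in dv_acs.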