I am creating a codebook for the tidycensus variables I am pulling for all ACS5 years from 2009 to 2020. To flag any differences in variables across years, I am trying to add a check column at the end. However, there seems to be a change in the label naming convention in 2018, which I'd like to ignore in favor of identifying real problems.

library(tidycensus)
library(tidyverse)

# selecting and naming the variables to pull in
dv_acs = c(
  same1 = "B25002_001",
  same2 = "B25002_002",
  diff1 = "C24010_039"
)

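# looping over years to pull in and join a codebook for all years
out <- map(2009:2020, ~ {
  nm <- str_c(c("label", "concept"), "_", .x)
  load_variables(.x, "acs5") %>%
    select(-any_of("geography")) %>%
    filter(name %in% dv_acs) %>%
    # look up each id by variable code so the ids do not depend on row order
    mutate(id = names(dv_acs)[match(name, dv_acs)], .before = 1) %>%
    rename_with(~ nm, c("label", "concept"))
}) %>%
  # assumption: join the yearly tables on id and name, one row per variable
  reduce(full_join, by = c("id", "name"))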

# putting in checks
out <- out %>%
  rowwise() %>%
  mutate(label_flag = n_distinct(unlist(across(starts_with('label'),
                                               ~ as.character(.x)))) == 1) %>%
  ungroup()

Okay, from above, the first two variables (same1, same2) would get a TRUE value in the label_flag column if it worked how I want it to, but because a ":" is introduced into the string in later years, it comes up FALSE. For comparison, diff1 has a truly different value between the 2009 and later labels (it goes from "Estimate!!Total!Female" to "Estimate!!Total:!!Female:!!Management, business, science, and arts occupations:"), so it should show up as FALSE in the label_flag column.

I don't know if I should introduce something using grepl or work in a string distance somehow, and would appreciate any solutions ya got.

We may use pmap to loop over the rows of the columns whose names start with 'label', remove every ":" from the values, and then use n_distinct to check whether only a single unique value remains. Since the check returns TRUE/FALSE, pmap_lgl is the matching variant.

out <- out %>%
    mutate(label_flag = pmap_lgl(across(starts_with('label')),
       ~ n_distinct(str_remove_all(c(...), ":")) == 1))
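
To try the idea without pulling anything from the Census API, here is a minimal sketch on a made-up table. The tibble `toy` and its two label columns are hypothetical stand-ins for the joined codebook, and the label strings are only illustrative. It uses `pmap_lgl` so the flag comes back as TRUE/FALSE.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical stand-in for the joined codebook (not real ACS output)
toy <- tibble(
  id         = c("same1", "diff1"),
  label_2017 = c("Estimate!!Total", "Estimate!!Total!!Female"),
  label_2019 = c("Estimate!!Total:", "Estimate!!Total:!!Female:!!Management")
)

# For each row: collect the label values, drop every ":",
# and flag the row TRUE when a single distinct label remains
toy <- toy %>%
  mutate(label_flag = pmap_lgl(
    across(starts_with("label")),
    ~ n_distinct(str_remove_all(c(...), ":")) == 1
  ))

toy$label_flag
```

Here same1 differs only by the trailing ":", so it is flagged TRUE, while diff1 still differs after the colons are removed and is flagged FALSE. If other cosmetic changes turn up between years, the pattern passed to `str_remove_all` is the place to widen.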
