Join and group_by tidy eval issue

Time:10-13

I have put together the following function. It works up until the last part (marked with a comment in the code), where it has to join the objects together, and I don't know how to get that step to work. I believe my main problem is converting the colName argument into a string for the "by =" argument of the join function. As for the group_by call, I'm not sure whether what I have put in the curly brackets will work. If anyone could help, that would be great!

   emp_turnover_fun <- function(data, colName, year = "2015") {
  
  # Convert colName to symbol or check if symbol
  colName <- ensym(colName)
  
  # Terminations by year and variable in df
  term_test <- data %>%
    filter(year(DateofTermination) == year) %>%
    count(!!(colName)) %>%
    clean_names()
  
  # Start employees by var and year
  fun_year_job <- paste(year, "-01-01", sep = "")
  start_test <- data %>%
    select(DateofHire, DateofTermination, !!(colName)) %>%
    filter(
      DateofHire <= fun_year_job,
      DateofTermination > fun_year_job | is.na(DateofTermination)
    ) %>%
    count(!!(colName))
  
  # End employees by year and var
  year_pos <- year %>% as.character()
  year_num_plus_pos <- as.character(as.numeric(year_pos) + 1)
  fun_year2_pos <- paste(year_num_plus_pos, "-01-01", sep = "")
  
  end_test <- data %>%
    select(DateofHire, DateofTermination, !!(colName)) %>%
    filter(
      DateofHire <= fun_year2_pos,
      DateofTermination > fun_year2_pos | is.na(DateofTermination)
    ) %>%
    count(!!(colName))
  #### PROBLEM BEGINS HERE
  join_turnover_year <- full_join(start_test, end_test, by = str(colName)) %>%
    full_join(y = term_test, by = str(colName)) %>%
    setNames(c(str(colName), "Start_Headcount", "End_Headcount", "Terminations")) %>%
    group_by({{colName}}) %>%
    summarise(Turnover = ((Terminations) / (Start_Headcount + End_Headcount)) * 100)
  
  return(join_turnover_year)
}

CodePudding user response:

The issue is the use of str, which prints the structure of an object and returns NULL; it does not convert anything to a string. If colName is passed as a string, it needs no wrapping at all, but inside the function it is converted to a symbol with ensym. So, either store the input in a separate object before converting it to a symbol (this assumes it is a string), or convert the symbol back to a string with as_string from rlang:

 emp_turnover_fun <- function(data, colName, year = "2015") {
  
  # Convert colName to symbol or check if symbol
  colName <- ensym(colName)
  colName_str <- rlang::as_string(colName) ## converted to string

  
  # Terminations by year and variable in df
  term_test <- data %>%
    filter(year(DateofTermination) == year) %>%
    count(!!(colName)) %>%
    clean_names()
  
  # Start employees by var and year
  fun_year_job <- paste(year, "-01-01", sep = "")
  start_test <- data %>%
    select(DateofHire, DateofTermination, !!(colName)) %>%
    filter(
      DateofHire <= fun_year_job,
      DateofTermination > fun_year_job | is.na(DateofTermination)
    ) %>%
    count(!!(colName))
  
  # End employees by year and var
  year_pos <- year %>% as.character()
  year_num_plus_pos <- as.character(as.numeric(year_pos) + 1)
  fun_year2_pos <- paste(year_num_plus_pos, "-01-01", sep = "")
  
  end_test <- data %>%
    select(DateofHire, DateofTermination, !!(colName)) %>%
    filter(
      DateofHire <= fun_year2_pos,
      DateofTermination > fun_year2_pos | is.na(DateofTermination)
    ) %>%
    count(!!(colName))
  
  join_turnover_year <- full_join(start_test, end_test, 
             by = colName_str) %>% # use the string
    full_join(y = term_test, by = colName_str) %>% # use the string
    setNames(c(colName_str, "Start_Headcount", "End_Headcount", 
             "Terminations")) %>% # here as well
    group_by({{colName}}) %>%
    summarise(Turnover = ((Terminations) / (Start_Headcount + End_Headcount)) * 100)
  
  return(join_turnover_year)
}
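
The same pattern can be sketched on toy data. This is only an illustration under stated assumptions: the helper count_and_join and the tibbles a and b are hypothetical and not part of the original function. The symbol is used inside count, and the string is used for by =.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: count rows per group in two data frames,
# then join the counts on the grouping column.
count_and_join <- function(a, b, colName) {
  colName     <- ensym(colName)       # symbol, for use with !!
  colName_str <- as_string(colName)   # string, for use with by =

  left  <- count(a, !!colName, name = "n_a")
  right <- count(b, !!colName, name = "n_b")

  full_join(left, right, by = colName_str)
}

# Toy data (made up for this example)
a <- tibble(Department = c("IT", "IT", "Sales"))
b <- tibble(Department = c("IT", "HR"))

res <- count_and_join(a, b, Department)
names(res)
# "Department" "n_a" "n_b"
```

Giving each count column its own name through the name argument of count avoids the n.x / n.y suffixes, so a later setNames call is not needed in this sketch.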

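Using as_string is safer than taking the input directly as a string. ensym works with both unquoted and quoted values, so if the column is passed unquoted, using the argument itself as a string does not work (it would need something like deparse(substitute(colName))). Converting to a symbol first and then back to a string with as_string handles both cases. One thing to check: clean_names() may rename the grouping column in term_test (for example, Department would become department), in which case the second join would not find colName_str in that table.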
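
To see that quoted and unquoted input end up as the same string, here is a minimal sketch; col_as_string is a hypothetical helper written only for this illustration.

```r
library(rlang)

# Hypothetical helper: accept a column name quoted or unquoted
# and return it as a string.
col_as_string <- function(colName) {
  colName <- ensym(colName)  # works for both Department and "Department"
  as_string(colName)         # convert the symbol back to a string
}

col_as_string(Department)
# "Department"
col_as_string("Department")
# "Department"
```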
