R - How to use a function variable to reference an object for a left join in R

Time:06-14

I'm trying to join two data sets in R using a left join inside a function. My main data frame, GE_GC, is passed to the function as "df", and I want to join a data frame called GE_GC_Teacher_Names onto it. However, I would like the "GE_GC" part of that object name to be dynamic, because I have multiple data sets, each with its own set of teacher names that needs to be joined. For example, if the "df" passed to my function were EX_EF, the function would join the EX_EF_Teacher_Names data frame onto the EX_EF data frame.


Q2_Table <- function(df){
  
  
  df %>% select(contains("Q2_")) %>% 
  gather(var,value) %>% 
  group_by(var) %>%
  summarise(
    Mean = round(mean(as.numeric(value), na.rm = TRUE), 2),
    Responses = length(value[!is.na(value)]),
    "Very Dissatisfied" = paste0(length(value[value == "1" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "1" & !is.na(value)])/Responses*100), ")"),
    "Dissatisfied" = paste0(length(value[value == "2" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "2" & !is.na(value)])/Responses*100), ")"),
    "Neutral" = paste0(length(value[value == "3" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "3" & !is.na(value)])/Responses*100), ")"),
    "Satisfied" = paste0(length(value[value == "4" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "4" & !is.na(value)])/Responses*100), ")"),
    "Very Satisfied" = paste0(length(value[value == "5" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "5" & !is.na(value)])/Responses*100), ")")
  ) %>%
    left_join(
      as.name(paste0((deparse(substitute(df))),"_Teacher_Names")), # Attempt to name the teacher df dynamically (this is the part that fails)
      by = 'var'
    ) %>%
      rename("Teacher" = teacher) %>%
      select(-var, -value) %>%
      relocate(Teacher, .before = Mean)
}

Q2_Output <- Q2_Table(GE_GC)

When I run this function I get the following error, even though a matching column called "var" is present in both the GE_GC and GE_GC_Teacher_Names data frames:

Error in auto_copy():
! x and y must share the same src.
i set copy = TRUE (may be slow).
Run rlang::last_error() to see where the error occurred.

The following code works fine when I type in the teacher data frame name manually:

Q2_Table <- function(df){
  
  
  df %>% select(contains("Q2_")) %>% 
  gather(var,value) %>% 
  group_by(var) %>%
  summarise(
    Mean = round(mean(as.numeric(value), na.rm = TRUE), 2),
    Responses = length(value[!is.na(value)]),
    "Very Dissatisfied" = paste0(length(value[value == "1" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "1" & !is.na(value)])/Responses*100), ")"),
    "Dissatisfied" = paste0(length(value[value == "2" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "2" & !is.na(value)])/Responses*100), ")"),
    "Neutral" = paste0(length(value[value == "3" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "3" & !is.na(value)])/Responses*100), ")"),
    "Satisfied" = paste0(length(value[value == "4" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "4" & !is.na(value)])/Responses*100), ")"),
    "Very Satisfied" = paste0(length(value[value == "5" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "5" & !is.na(value)])/Responses*100), ")")
  ) %>%
    left_join(
      GE_GC_Teacher_Names, # Works with the teacher df name typed in manually, but I want it to be dynamic
      by = 'var'
    ) %>%
      rename("Teacher" = teacher) %>%
      select(-var, -value) %>%
      relocate(Teacher, .before = Mean)
}

Q2_Output <- Q2_Table(GE_GC)

So this section is the problem:

left_join(
      as.name(paste0((deparse(substitute(df))),"_Teacher_Names")), # Attempt to name the teacher df dynamically (this is the part that fails)
      by = 'var'
    )

Any help would be appreciated, thank you.

CodePudding user response:

I suggest you simplify this a little by making the Teacher frame an argument to the function. This does two things:

  1. Simplifies your logic, since you no longer rely so heavily on the name of an object (or on the assumption that other variables exist); and
  2. Ensures the function is more functional, in that its output is derived exclusively from the arguments passed to it: no inference, no guessing.

library(dplyr)  # select, group_by, summarise, left_join, rename, relocate
library(tidyr)  # pivot_longer

Q2_Table <- function(df, tchr) {
  df %>%
    select(contains("Q2_")) %>%
    pivot_longer(everything(), names_to = "var") %>%
    group_by(var) %>%
    summarise(
      Mean = round(mean(as.numeric(value), na.rm = TRUE), 2),
      Responses = length(value[!is.na(value)]),
      "Very Dissatisfied" = paste0(length(value[value == "1" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "1" & !is.na(value)])/Responses*100), ")"),
      "Dissatisfied" = paste0(length(value[value == "2" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "2" & !is.na(value)])/Responses*100), ")"),
      "Neutral" = paste0(length(value[value == "3" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "3" & !is.na(value)])/Responses*100), ")"),
      "Satisfied" = paste0(length(value[value == "4" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "4" & !is.na(value)])/Responses*100), ")"),
      "Very Satisfied" = paste0(length(value[value == "5" & !is.na(value)]), " (", sprintf("%1.0f%%", length(value[value == "5" & !is.na(value)])/Responses*100), ")")
    ) %>%
    left_join(tchr, by = 'var') %>%
    rename("Teacher" = teacher) %>%
    select(-var, -value) %>%
    relocate(Teacher, .before = Mean)
}

Q2_Table(GE_GC, GE_GC_Teacher_Names)
# # A tibble: 1 x 8
#   Teacher  Mean Responses `Very Dissatisfied` Dissatisfied Neutral Satisfied `Very Satisfied`
#     <int> <dbl>     <int> <chr>               <chr>        <chr>   <chr>     <chr>           
# 1       1   6.5         6 0 (0%)              0 (0%)       0 (0%)  0 (0%)    0 (0%)          

(Notice that I also shifted from gather to pivot_longer. While it adds nothing here, if you use it more and in more-complicated situations, using this newer function will pay off.)
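
As for why the original attempt fails: as.name() only builds a symbol (a name), not the data frame that name refers to, so left_join() is handed something that is not a table. If you really do want the name-based lookup, a minimal sketch using base R's get() might look like the following. The helper name teacher_frame_for and the toy values are made up for illustration, and it assumes the "_Teacher_Names" object exists in the calling environment.

```r
# Toy stand-ins for the real objects (values are made up for illustration).
GE_GC <- data.frame(Q2_1 = c("4", "5", "3"))
GE_GC_Teacher_Names <- data.frame(var = "Q2_1", teacher = "Smith")

# Hypothetical helper: resolve "<df name>_Teacher_Names" to the actual object.
teacher_frame_for <- function(df, envir = parent.frame()) {
  df_name   <- deparse(substitute(df))             # e.g. "GE_GC"
  tchr_name <- paste0(df_name, "_Teacher_Names")   # e.g. "GE_GC_Teacher_Names"
  get(tchr_name, envir = envir)                    # look the object up by name
}

sym  <- as.name("GE_GC_Teacher_Names")  # a symbol: just a name, no data
tchr <- teacher_frame_for(GE_GC)        # the data frame itself

is.data.frame(sym)   # FALSE
is.data.frame(tchr)  # TRUE
```

The frame returned by get() could then be passed to left_join(..., by = 'var') in place of the as.name() call. Passing the teacher frame as an explicit argument is still the more robust choice, because get() stops with an error as soon as an object with the expected name is missing.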


Data

GE_GC <- structure(list(Q2_1 = c("6", "7", "6", "6", "7", "7")), row.names = c(NA, -6L), class = "data.frame")
GE_GC_Teacher_Names <- structure(list(var = c("Q2_1", "Q2_2", "Q2_3", "Q2_4"), value = c("Example", "Example", "Example", "Example"), teacher = 1:4), row.names = c(NA, -4L), class = "data.frame")
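
Since the question mentions several data sets, here is one possible way to apply a two-argument function to every pair at once, by keeping the survey frames and teacher frames in two named lists. This is only a sketch in base R: count_with_teacher is a hypothetical, much-reduced stand-in for Q2_Table(df, tchr) so that the block runs on its own, and the list contents are invented.

```r
# Hypothetical stand-in for Q2_Table(df, tchr) so this block runs on its own;
# it only counts non-missing responses per question and attaches the teacher.
count_with_teacher <- function(df, tchr) {
  q2 <- df[grepl("^Q2_", names(df))]               # keep only the Q2_ columns
  counts <- data.frame(
    var = names(q2),
    Responses = vapply(q2, function(x) sum(!is.na(x)), integer(1)),
    row.names = NULL
  )
  merge(counts, tchr, by = "var", all.x = TRUE)    # base-R left join on 'var'
}

# Invented example data, one entry per data set.
surveys <- list(
  GE_GC = data.frame(Q2_1 = c("4", "5", NA)),
  EX_EF = data.frame(Q2_1 = c("2", "3", "3"))
)
teachers <- list(
  GE_GC = data.frame(var = "Q2_1", teacher = "Smith"),
  EX_EF = data.frame(var = "Q2_1", teacher = "Jones")
)

# Index teachers by the survey names so each data set gets its own teacher frame.
results <- Map(count_with_teacher, surveys, teachers[names(surveys)])
results$EX_EF
```

With the real function, the same Map() call should work with Q2_Table in place of the stand-in. The pairing is then explicit in the two lists rather than inferred from object names.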