Passing column name to dplyr in function and for loop

Time:08-17

I have two datasets, o1 and o1_logical, and I am trying to filter, join, and manipulate some of their columns using the code below. For each column specified in cols, I want to first filter o1_logical, inner join the result with o1, split the o1 column, and print the number of rows. I can do this without the for loop by specifying each column name manually, but I want to do it more efficiently using functions and a for loop:

cols = c('i', 'a', 'e', 'g','l', 'm', 's', 't') #some of the column names in 2 datasets

num_of_gs <- function(s) {
  o1_logical %>% 
    filter(s == 1) %>% 
    dplyr::select(name) %>% #name is another column in the datasets
    inner_join(o1) %>% 
    dplyr::select(s) %>%
    mutate(s = strsplit(as.character(s), ", ")) %>% 
    unnest(cols = c(s)) %>% 
    nrow() %>%
    print()
}

num_os <- function(s) {
  o1_logical %>% 
    filter(s == 1) %>% 
    dplyr::select(name) %>% 
    nrow()%>%
    print()
}


for (i in cols){
  num_of_gs(i)
  num_os(i)
}

I don't get any errors, but the outputs are all 0. I assume I am not passing the column names to dplyr correctly. I went through previous posts and tried {{s}}, .[[s]], and enquo(), but none of them worked. Can anyone help?

Thanks!

Answer:

The cause is most likely the line filter(s == 1). When it runs, s holds a string, so s == 1 is equivalent to 'i' == 1, which is FALSE for every row, and filter therefore returns no rows.

When a column name is stored as a string, you need some additional code to tell R to use the contents of the string as the column name.

For example:

df %>%
  filter(my_col == 2)

# is not the same as

col_name = "my_col"
df %>% filter(col_name == 2)
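
To make the difference concrete, here is a minimal sketch on a toy data frame. The data and the names df, n_by_column, and n_by_string are made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical toy data, not the asker's datasets
df <- tibble(my_col = c(1, 2, 2), other = c("a", "b", "c"))

# Column reference: keeps the rows where my_col equals 2
n_by_column <- df %>% filter(my_col == 2) %>% nrow()

# String comparison: "my_col" == 2 is a single FALSE that is
# recycled over every row, so nothing is kept
col_name <- "my_col"
n_by_string <- df %>% filter(col_name == 2) %>% nrow()

print(c(n_by_column, n_by_string))
```

The second call raises no error, which matches the symptom in the question: the code runs, but every count is 0.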

The vignette Programming with dplyr contains several options for this. Here are a couple to consider.

# all of these are equivalent

# 1
df %>% filter(my_col == 2)

# 2
col_name = "my_col"
df %>% filter(!!sym(col_name) == 2)

# 3
col_name = "my_col"
df %>% filter(.data[[col_name]] == 2)

Very similar ideas apply to mutate. The select function is more flexible and can work with strings directly (though it may give warnings). For select, wrapping the variable in all_of is recommended. Details are covered in the Programming with dplyr vignette.
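
Applied to the two functions from the question, these ideas could look roughly like the sketch below. The o1 and o1_logical tibbles here are small hypothetical stand-ins, since the real data is not shown, and the explicit by = "name" assumes that name is the only join key.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-ins: o1 holds comma-separated strings,
# o1_logical holds 0/1 flags for the same columns
o1 <- tibble(
  name = c("x", "y", "z"),
  i = c("p, q", "r", NA),
  a = c("u", "v, w, k", "m")
)
o1_logical <- tibble(
  name = c("x", "y", "z"),
  i = c(1, 1, 0),
  a = c(0, 1, 1)
)

num_of_gs <- function(s) {
  o1_logical %>%
    filter(.data[[s]] == 1) %>%          # s is a string, so use .data[[s]]
    select(name) %>%
    inner_join(o1, by = "name") %>%
    select(all_of(s)) %>%                # all_of() for string column names
    mutate(!!s := strsplit(as.character(.data[[s]]), ", ")) %>%
    unnest(cols = all_of(s)) %>%
    nrow()
}

num_os <- function(s) {
  o1_logical %>%
    filter(.data[[s]] == 1) %>%
    nrow()
}

for (col in c("i", "a")) {
  print(num_of_gs(col))
  print(num_os(col))
}
```

In this sketch the functions return the counts and the loop prints them, rather than printing inside the functions as in the question. That is only a design choice, and it makes the counts easier to reuse later.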
