I have two datasets, o1 and o1_logical, and I am trying to filter/join/manipulate some of their columns using the code below. For each column specified in cols, I want to first filter o1_logical, inner join with o1, split the o1 column, and print the number of rows. I can do this without the for loop by specifying each column name manually, one by one, but I want to do it efficiently using functions and a for loop:
cols = c('i', 'a', 'e', 'g','l', 'm', 's', 't') #some of the column names in 2 datasets
num_of_gs <- function(s) {
o1_logical %>%
filter(s == 1) %>%
dplyr::select(name) %>% #name is another column in the datasets
inner_join(o1) %>%
dplyr::select(s) %>%
mutate(s = strsplit(as.character(s), ", ")) %>%
unnest(cols = c(s)) %>%
nrow() %>%
print()
}
num_os <- function(s) {
o1_logical %>%
filter(s == 1) %>%
dplyr::select(name) %>%
nrow()%>%
print()
}
for (i in cols){
num_of_gs(i)
num_os(i)
}
I don't get any errors, but the outputs are all 0. I assume I cannot pass the column names to dplyr this way. I went through previous posts and tried {{s}}, .[[s]] and the enquo() function, but none of them worked. Can anyone help?
Thanks!
CodePudding user response:
The cause is probably the line filter(s == 1). When this runs, s contains a string, so s == 1 is equivalent to 'i' == 1, which returns FALSE for all rows.
Where you store a column name as a string, you need some additional code to tell R to use the contents of the string as the column name.
For example:
df %>%
filter(my_col == 2)
# is not the same as
col_name = "my_col"
df %>% filter(col_name == 2)
The vignette Programming with dplyr contains several options for this. Here are a couple to consider.
# all of these are equivalent
# 1
df %>% filter(my_col == 2)
# 2
col_name = "my_col"
df %>% filter(!!sym(col_name) == 2)
# 3
col_name = "my_col"
df %>% filter(.data[[col_name]] == 2)
Very similar ideas apply with mutate. The select function tends to be more flexible and can work with strings (though it may give warnings). For select, it is recommended that you use all_of. Details covering these can be found on the Programming with dplyr page.
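Applied to the functions in the question, this could look roughly like the sketch below. The toy o1 and o1_logical tibbles are invented for illustration (the real data was not posted), the join key is assumed to be name, and the functions return the count so the loop does the printing.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-ins for the asker's data (assumption: both share a `name` key)
o1 <- tibble(
  name = c("x", "y", "z"),
  i    = c("a1, a2", "b1", "c1"),
  a    = c("d1", "e1, e2, e3", "f1")
)
o1_logical <- tibble(
  name = c("x", "y", "z"),
  i    = c(1, 1, 0),
  a    = c(0, 1, 1)
)

num_of_gs <- function(s) {
  o1_logical %>%
    filter(.data[[s]] == 1) %>%          # use the column whose name is stored in s
    select(name) %>%
    inner_join(o1, by = "name") %>%
    select(all_of(s)) %>%                # all_of() for string column names in select
    mutate(across(all_of(s), ~ strsplit(as.character(.x), ", "))) %>%
    unnest(cols = all_of(s)) %>%         # one row per split element
    nrow()
}

num_os <- function(s) {
  o1_logical %>%
    filter(.data[[s]] == 1) %>%
    nrow()
}

cols <- c("i", "a")
for (col in cols) {
  print(num_of_gs(col))
  print(num_os(col))
}
```

Here across(all_of(s), ...) is used in mutate so that the column named by the string is overwritten in place; !!sym(s) on both sides of the assignment would be another option.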