I'd like to group_by across several variables in dtplyr within an lapply loop, but I find that I can't use the same syntax as in dplyr after calling lazy_dt().
library(dplyr)
mycolumns <- c("Wind", "Month", "Ozone", "Solar.R")
columnpairs <- as.data.frame(combn(mycolumns, 2))
# V1 V2 V3 V4 V5 V6
# 1 Wind Wind Wind Month Month Ozone
# 2 Month Ozone Solar.R Ozone Solar.R Solar.R
result_dplyr <- lapply(columnpairs, function(x) {
  airquality %>%
    select(all_of(x)) %>%
    group_by(across(all_of(x))) %>%
    filter(n() > 1)
})
$V1
# A tibble: 105 x 2
# Groups: Wind, Month [40]
Wind Month
<dbl> <int>
1 7.4 5
2 8 5
3 11.5 5
4 14.9 5
5 8.6 5
6 8.6 5
7 9.7 5
8 11.5 5
9 12 5
10 11.5 5
# ... with 95 more rows
Using the same syntax, I encounter an error after calling lazy_dt with dtplyr.
library(dtplyr)
airq <- lazy_dt(airquality)
lapply(columnpairs, function(x) {
  airq %>%
    select(all_of(x)) %>%
    group_by(across(all_of(x))) %>%
    filter(n() > 1)
})
Error in `all_of()`:
! object 'x' not found
Any idea?
EDIT: issue created at https://github.com/tidyverse/dtplyr/issues/383
CodePudding user response:
It seems that the group_by method for dtplyr (group_by.dtplyr_step) is causing the issue.
> methods('group_by')
[1] group_by.data.frame* group_by.data.table* group_by.dtplyr_step*
Not sure if it is a bug or not.
> traceback()
...
6: group_by.dtplyr_step(., across(all_of(.x))) ###
5: group_by(., across(all_of(.x)))
4: filter(., n() > 1)
3: airq %>% select(all_of(.x)) %>% group_by(across(all_of(.x))) %>%
filter(n() > 1)
2: .f(.x[[i]], ...)
1: map(columnpairs, ~airq %>% select(all_of(.x)) %>% group_by(across(all_of(.x))) %>%
filter(n() > 1))
Here are two methods that work:
- Using the superseded group_by_at
- Converting the column names to symbols with syms and then splicing them in with !!!
Using group_by_at
library(dtplyr)
library(purrr)
library(dplyr)
map(columnpairs, ~ airq %>%
select(all_of(.x)) %>%
group_by_at(all_of(.x)) %>%
filter(n() > 1))
$V1
Source: local data table [105 x 2]
Groups: Wind, Month
Call:
_DT2 <- `_DT1`[, .(Wind, Month)]
`_DT2`[`_DT2`[, .I[.N > 1], by = .(Wind, Month)]$V1]
Wind Month
<dbl> <int>
1 7.4 5
2 7.4 5
3 8 5
4 8 5
5 11.5 5
6 11.5 5
# … with 99 more rows
...
Converting to symbols and splicing them in
map(columnpairs, ~ airq %>%
select(all_of(.x)) %>%
group_by(!!! rlang::syms(.x)) %>%
filter(n() > 1))
$V1
Source: local data table [105 x 2]
Groups: Wind, Month
Call:
_DT20 <- `_DT1`[, .(Wind, Month)]
`_DT20`[`_DT20`[, .I[.N > 1], by = .(Wind, Month)]$V1]
Wind Month
<dbl> <int>
1 7.4 5
2 7.4 5
3 8 5
4 8 5
5 11.5 5
6 11.5 5
# … with 99 more rows
# Use as.data.table()/as.data.frame()/as_tibble() to access results
$V2
...
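Since the question uses lapply rather than map, the second workaround can also be sketched in that form. This is only a minimal sketch under the same setup as the question: keep_repeated is a hypothetical helper name (not part of dplyr or dtplyr), and the as_tibble() call at the end is an added step to force evaluation of the lazy query.

```r
library(dplyr)
library(dtplyr)

# Column pairs, built as in the question
mycolumns <- c("Wind", "Month", "Ozone", "Solar.R")
columnpairs <- as.data.frame(combn(mycolumns, 2), stringsAsFactors = FALSE)

airq <- lazy_dt(airquality)

# keep_repeated() is a hypothetical helper, introduced here for illustration.
# It keeps the rows whose combination of values in `cols` occurs more than once.
keep_repeated <- function(cols) {
  airq %>%
    select(all_of(cols)) %>%
    group_by(!!!rlang::syms(cols)) %>%  # splice the column names in as symbols
    filter(n() > 1) %>%
    ungroup() %>%
    as_tibble()                         # evaluate the lazy query and collect the result
}

result_dtplyr <- lapply(columnpairs, keep_repeated)
```

If this behaves like the map version above, result_dtplyr$V1 should have the same 105 rows, although the row order may differ from the dplyr result because data.table returns the rows group by group.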