replicate `expand.grid()` behavior with data.frames using tidyr/data.table

Time:06-04

I am trying to speed up the `base::expand.grid()` function, and I came across this excellent answer: How to speed up `expand.grid()` in R?. However, the behavior I need relies on passing a data.frame to `base::expand.grid()`, and unfortunately the suggested (faster) functions behave differently when they receive a data.frame. For instance, this is the behavior I need:

x  <- c(.3,.6)
df <- as.data.frame(rbind(x, 1 - x))
df
##   V1  V2
## x 0.3 0.6
##   0.7 0.4
 
(base::expand.grid(df))
##   V1  V2
## 1 0.3 0.6
## 2 0.7 0.6
## 3 0.3 0.4
## 4 0.7 0.4
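For context: when `expand.grid()` receives a single data.frame, it treats it as a list of vectors (one per column), so the call above is equivalent to passing each column as a separate argument. The minimal sketch below checks that equivalence on the same `df`; note that in the base result the first column varies fastest.

```r
# Rebuild the example data frame from the question
x  <- c(.3, .6)
df <- as.data.frame(rbind(x, 1 - x))

# A single data-frame argument is unpacked column by column ...
g1 <- expand.grid(df)
# ... so it yields the same grid as naming each column explicitly
g2 <- expand.grid(V1 = df$V1, V2 = df$V2)

print(all.equal(g1, g2))  # TRUE
# The first column cycles fastest through its values
print(g1$V1)              # 0.3 0.7 0.3 0.7
```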

However, this is what I get from the faster functions:

library(tidyr)
library(data.table)
(tidyr::expand_grid(df))
## # A tibble: 2 × 2
##      V1    V2
##   <dbl> <dbl>
## 1   0.3   0.6
## 2   0.7   0.4

(tidyr::crossing(df))
## # A tibble: 2 × 2
##      V1    V2
##   <dbl> <dbl>
## 1   0.3   0.6
## 2   0.7   0.4

(as_tibble(data.table::CJ(df, sorted = FALSE)))
## # A tibble: 2 × 1
##   df$``   $``
##   <dbl> <dbl>
## 1   0.3   0.6
## 2   0.7   0.4
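A likely explanation, as far as I can tell from the tidyr documentation: `expand_grid()` can expand data frames as well as vectors, and a data-frame input is treated as one unit whose rows are crossed with the other inputs (its columns are then spliced into the result). With only one input there is nothing to cross, so the two rows come back unchanged. The sketch below, assuming tidyr is installed and using a hypothetical extra input `z`, makes this visible.

```r
x  <- c(.3, .6)
df <- as.data.frame(rbind(x, 1 - x))

# One data-frame input: its 2 rows are returned as-is (nothing to cross)
one <- tidyr::expand_grid(df)
# Add a second input: the 2 rows of df are crossed with the 3 values of z
two <- tidyr::expand_grid(df, z = 1:3)

print(nrow(one))  # 2
print(nrow(two))  # 6
```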

How can I tweak these functions so that they reproduce the behavior of `base::expand.grid()` when it receives a data.frame, without losing the performance gains?

Thank you in advance!


BTW: I am already aware of the existence of:

Answer:

Try `do.call()`, which passes each column of the data frame to the function as a separate argument:

> do.call(tidyr::expand_grid, df)
# A tibble: 4 x 2
     V1    V2
  <dbl> <dbl>
1   0.3   0.6
2   0.3   0.4
3   0.7   0.6
4   0.7   0.4

> do.call(tidyr::crossing, df)
# A tibble: 4 x 2
     V1    V2
  <dbl> <dbl>
1   0.3   0.4
2   0.3   0.6
3   0.7   0.4
4   0.7   0.6

> do.call(data.table::CJ, df)
    V1  V2
1: 0.3 0.4
2: 0.3 0.6
3: 0.7 0.4
4: 0.7 0.6
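`do.call()` fixes the shape of the result but not the row order: as the outputs above show, `expand_grid()` and `CJ()` vary the first column slowest, whereas `expand.grid()` varies it fastest (and `crossing()`, like `CJ()` with its default `sorted = TRUE`, also sorts). If the exact base ordering matters, one option is to pass the columns in reverse order and then restore the original column order. This is a sketch, assuming tidyr and data.table are installed; the helper names `expand_grid_base_order()` and `cj_base_order()` are hypothetical.

```r
x  <- c(.3, .6)
df <- as.data.frame(rbind(x, 1 - x))

# Hypothetical helper: cross the columns of `d` in expand.grid() row order.
# expand_grid() varies its FIRST input slowest, so feeding the columns in
# reverse makes the original first column vary fastest, as in base R.
expand_grid_base_order <- function(d) {
  res <- do.call(tidyr::expand_grid, rev(d))
  res[names(d)]  # restore the original column order
}

# Same idea with data.table::CJ(); sorted = FALSE keeps each input's value order
cj_base_order <- function(d) {
  res <- do.call(data.table::CJ, c(as.list(rev(d)), sorted = FALSE))
  data.table::setcolorder(res, names(d))  # reorders columns by reference
  res
}

base_res <- expand.grid(df)
tidy_res <- expand_grid_base_order(df)
dt_res   <- cj_base_order(df)
```

Reversing only builds a new list of the same column vectors, so it should add little overhead, but that is worth confirming with a benchmark on realistic inputs.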

Answer:

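Try `tidyr::expand()`:

# Refer to the columns by name; with df[,1] and df[,2] the result columns
# would be named after those expressions instead of V1 and V2
tidyr::expand(df, V1, V2)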
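One caveat, based on the documented behavior of `expand()` and `crossing()`: both keep only unique combinations and sort them, whereas `expand.grid()` keeps every combination. The results therefore diverge as soon as a column contains repeated values. A minimal sketch with a hypothetical input `x2 <- c(.5, .6)`, whose first column is 0.5, 0.5:

```r
x2  <- c(.5, .6)                          # hypothetical input with a repeated value
df2 <- as.data.frame(rbind(x2, 1 - x2))   # V1 = 0.5, 0.5; V2 = 0.6, 0.4

n_base   <- nrow(expand.grid(df2))                  # every 2 x 2 combination
n_grid   <- nrow(do.call(tidyr::expand_grid, df2))  # keeps duplicates
n_expand <- nrow(tidyr::expand(df2, V1, V2))        # unique values only

print(c(base = n_base, expand_grid = n_grid, expand = n_expand))
# base 4, expand_grid 4, expand 2
```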