How to use rbindlist(data) instead of do.call(rbind, data) in this case

Time:02-18

library(dplyr)
library(data.table)
library(stringr)

test = c('a1b1', 'a2b2', 'a3b3')
result = rbind(c(1,1),
               c(2,2),
               c(3,3))
result
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    3
test2<-do.call(rbind,test %>% str_split('a'))
test3<-do.call(rbind,test2 %>% .[,2] %>% str_split('b'))
test3
     [,1] [,2]
[1,] "1"  "1" 
[2,] "2"  "2" 
[3,] "3"  "3" 
Is do.call(rbind, data) not equivalent to rbindlist(data)? data.table::rbindlist does not work here. If I want to use rbindlist, what can I do?
rbindlist(test %>% str_split('a'))
Error in rbindlist(test %>% str_split("a")) : 
  Item 1 of input is not a data.frame, data.table or list

CodePudding user response:

If you use data.table's tstrsplit rather than str_split, the pieces come back as columns (one vector per piece position) rather than as one vector per input string, so you can pass the result straight to as.data.table instead of rbinding rows together.

test = c('a1b1', 'a2b2', 'a3b3')

library(data.table)
as.data.table(tstrsplit(tstrsplit(test, 'a')[[2]], 'b'))
#>        V1     V2
#>    <char> <char>
#> 1:      1      1
#> 2:      2      2
#> 3:      3      3

Created on 2022-02-17 by the reprex package (v2.0.1)
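
To make the row-versus-column difference concrete, here is a minimal sketch comparing base strsplit with tstrsplit on the same input. The variable names (by_row, by_col, as_int) are only illustrative, and the type.convert = TRUE call assumes a data.table version recent enough to support that argument; it is shown because the desired result in the question is numeric.

```r
library(data.table)

test <- c('a1b1', 'a2b2', 'a3b3')

# strsplit: one character vector per input string (row-wise pieces).
# Each string starts with 'a', so the first piece is an empty string.
by_row <- strsplit(test, 'a', fixed = TRUE)

# tstrsplit: the same pieces transposed, one vector per piece position,
# which is already the column layout a data.table expects.
by_col <- tstrsplit(test, 'a', fixed = TRUE)

length(by_row)  # one element per input string
length(by_col)  # one element per piece position

# Optional: let tstrsplit convert the pieces to integers
# (assumes type.convert is available in the installed data.table).
as_int <- as.data.table(tstrsplit(by_col[[2]], 'b', fixed = TRUE,
                                  type.convert = TRUE))
```

Because by_col is already a list of equal-length columns, no rbind step is needed at all.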

The tstrsplit approach is also much faster: roughly 0.14 seconds versus about 19 seconds in the benchmark below, which uses a vector of 100,000 elements.

test = c('a1b1', 'a2b2', 'a3b3')

library(data.table)
library(stringr)
library(bench)

test <- sample(test, 1e5, TRUE)

mark(
tstrsplit = 
  as.data.table(tstrsplit(tstrsplit(test, 'a')[[2]], 'b'))
,
str_split = {
  test2 <- rbindlist(test %>% str_split("a") %>% lapply(., function(x)
  as.data.table(t(x))))
  
  rbindlist(as.matrix(test2) %>% .[,2] %>% str_split("b") %>% lapply(., function(x)
  as.data.table(t(x))))
}
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 tstrsplit   134.8ms  138.7ms    7.16      9.54MB     1.79
#> 2 str_split     18.8s    18.8s    0.0532    3.11GB     2.66


CodePudding user response:

If you want to keep an approach similar to your original one but with rbindlist, you could do something like the code below. Essentially, you add a step that turns each item in the list into a data.table; each character vector has to be transposed first so that it becomes a single row rather than a single column.

library(dplyr)
library(data.table)
library(stringr)

test2 <- rbindlist(test %>% str_split("a") %>% lapply(., function(x)
  as.data.table(t(x))))

test3 <- rbindlist(as.matrix(test2) %>% .[,2] %>% str_split("b") %>% lapply(., function(x)
  as.data.table(t(x))))

Output

test3

   V1 V2
1:  1  1
2:  2  2
3:  3  3
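
As a lighter variant of the same idea, the sketch below shows why the original rbindlist call fails and two ways around it without building one data.table per element. It uses base strsplit instead of str_split to stay dependency-free, and the names pieces, dt1 and dt2 are only illustrative. This is one possible approach under those assumptions, not necessarily what either answer intended.

```r
library(data.table)

test <- c('a1b1', 'a2b2', 'a3b3')

# A list of character vectors. rbindlist(pieces) would fail with
# "Item 1 of input is not a data.frame, data.table or list"
# because each item is an atomic vector, not a list.
pieces <- strsplit(test, 'a', fixed = TRUE)

# Option 1: turn each vector into a list, so each piece becomes
# one column value in that row.
dt1 <- rbindlist(lapply(pieces, as.list))

# Option 2: keep do.call(rbind, ...) and convert the matrix once.
dt2 <- as.data.table(do.call(rbind, pieces))

# Second split on the remaining column, using the same pattern.
dt3 <- rbindlist(lapply(strsplit(dt1$V2, 'b', fixed = TRUE), as.list))
```

Both options give character columns named V1 and V2; converting each element with as.list is much cheaper than as.data.table(t(x)), although tstrsplit avoids the per-element work entirely.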