Fastest way to subset sub-elements of lists in R

Time:10-27

Say I have a list of lists where each sub-list is a movie:

movies <- list(list("Jurassic Park", "Steven Spielberg", "Action"),
               list("Avatar", "James Cameron", "Action"),
               list("Schindler's List", "Steven Spielberg", "Biography")
           )

What is the best/fastest way (preferably without dependencies, but tidyverse would be fine) to subset that list based on the sub-list elements? That is, if director is always the second element in the sub-list, what's the fastest way to get a vector of the names of movies that Spielberg directed?

Hoping to do this across very large lists many times.

Thanks in advance!!

CodePudding user response:

sapply(movies, `[[`, 2)
# [1] "Steven Spielberg" "James Cameron"    "Steven Spielberg"

Benchmark on the three-movie list above: `[[` is the fastest of the three, narrowly ahead of getElement.

library(purrr)  # provides map_chr() and pluck()

bench::mark(purrr = map_chr(movies, pluck, 2),
            getElement = sapply(movies, getElement, 2),
            `[[` = sapply(movies, `[[`, 2))

  expression      min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
1 purrr        21.7µs  28.2µs    31773.        0B     6.36  9998     2
2 getElement   16.6µs  18.6µs    45652.        0B     4.57  9999     1
3 [[           14.9µs  17.2µs    47417.        0B     4.74  9999     1
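The one-liner above returns the directors; the question asks for the titles of the movies Spielberg directed. A minimal sketch of that last step, assuming the title is always the first element and the director the second, and swapping sapply() for vapply() so that a non-character or missing entry raises an error instead of silently changing the result type:

```r
movies <- list(list("Jurassic Park", "Steven Spielberg", "Action"),
               list("Avatar", "James Cameron", "Action"),
               list("Schindler's List", "Steven Spielberg", "Biography"))

# vapply() checks that every extracted element is a length-1 character vector
titles    <- vapply(movies, `[[`, character(1), 1)  # assumed position of the title
directors <- vapply(movies, `[[`, character(1), 2)  # assumed position of the director

# Keep only the titles whose director matches
spielberg_titles <- titles[directors == "Steven Spielberg"]
spielberg_titles
# [1] "Jurassic Park"    "Schindler's List"
```

The names titles, directors and spielberg_titles are only illustrative.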

CodePudding user response:

Dependency free and readable:

sapply(movies, getElement, 2)
# [1] "Steven Spielberg" "James Cameron"    "Steven Spielberg"

Fast but not readable and assumes each sublist is length 3:

unlist(movies)[-1L:(length(movies) * 3L-2L) %% 3L == 0L]
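The modulo expression above builds the sequence -1, 0, 1, ... and keeps the positions where it is divisible by 3, which are positions 2, 5, 8, ... of the flattened vector. A sketch of what is probably an easier-to-read equivalent, under the same assumption that every sub-list has exactly 3 elements (checked explicitly here); it has not been benchmarked against the original:

```r
movies <- list(list("Jurassic Park", "Steven Spielberg", "Action"),
               list("Avatar", "James Cameron", "Action"),
               list("Schindler's List", "Steven Spielberg", "Biography"))

# The flattening trick is only valid if every sub-list has exactly 3 elements
stopifnot(all(lengths(movies) == 3L))

flat <- unlist(movies, use.names = FALSE)

# Every third element, starting at position 2, is a director
directors_seq <- flat[seq(2L, length(flat), by = 3L)]

# The original modulo-based index selects the same positions
directors_mod <- unlist(movies)[-1L:(length(movies) * 3L - 2L) %% 3L == 0L]

directors_seq
# [1] "Steven Spielberg" "James Cameron"    "Steven Spielberg"
```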

Benchmark with 100k sublists:

movies <- movies[sample(1:3, size = 100000, replace = TRUE)]
bench::mark(purrr = map_chr(movies, pluck, 2), 
            getElement = sapply(movies, getElement, 2),
            `[[` = sapply(movies, `[[`, 2),
            unlist(movies)[-1L:(length(movies) * 3L-2L) %% 3L == 0L])

  expression                                                    min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time 
  <bch:expr>                                               <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> 
1 purrr                                                       230ms 233.58ms      4.19   781.3KB    14.0      3    10      715ms
2 getElement                                                 71.9ms  77.07ms     12.8     3.29MB    14.7      7     8      545ms
3 [[                                                         27.8ms  29.35ms     32.4     3.29MB     9.53    17     5      525ms
4 unlist(movies)[-1L:(length(movies) * 3L - 2L)%%3L == 0L]    7.5ms   8.39ms     81.7     8.01MB    27.2     45    15      551ms

A small function, based on the comment below, that returns the movie titles and includes the filtering:

return_movies <- function(movie_list, title_position, comparison_position, comparison_string) {
  # Work on the list passed as the argument rather than the global `movies`
  sapply(movie_list, getElement, title_position)[
        sapply(movie_list, getElement, comparison_position) == comparison_string
         ]
}

return_movies(movies, 1, 2, "Steven Spielberg")
# [1] "Jurassic Park"    "Schindler's List"
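Since the question mentions running this many times over very large lists, one option is to flatten the list once and reuse the result for every query. A rough sketch, assuming every sub-list has exactly 3 character elements; the column names title, director and genre are labels made up for this example and are not part of the original data:

```r
movies <- list(list("Jurassic Park", "Steven Spielberg", "Action"),
               list("Avatar", "James Cameron", "Action"),
               list("Schindler's List", "Steven Spielberg", "Biography"))

# Only valid when every sub-list has the same length (3 here)
stopifnot(all(lengths(movies) == 3L))

# Flatten once into a character matrix with one row per movie
movie_matrix <- matrix(unlist(movies, use.names = FALSE),
                       ncol = 3L, byrow = TRUE,
                       dimnames = list(NULL, c("title", "director", "genre")))

# Each later query is then a plain vectorised comparison
by_spielberg <- movie_matrix[movie_matrix[, "director"] == "Steven Spielberg", "title"]
by_cameron   <- movie_matrix[movie_matrix[, "director"] == "James Cameron", "title"]

by_spielberg
# [1] "Jurassic Park"    "Schindler's List"
```

The conversion cost is paid once, so it only helps when the same list is queried repeatedly.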

CodePudding user response:

library(purrr)

map_chr(movies, pluck, 2)
#> [1] "Steven Spielberg" "James Cameron"    "Steven Spielberg"