Say I have a list of lists where each sub-list is a movie:
movies <- list(list("Jurassic Park", "Steven Spielberg", "Action"),
list("Avatar", "James Cameron", "Action"),
list("Schindler's List", "Steven Spielberg", "Biography")
)
What is the best/fastest way (preferably without dependencies, but tidyverse would be fine) to subset that list based on the sub-list elements? That is, if director is always the second element in the sub-list, what's the fastest way to get a vector of the names of movies that Spielberg directed?
Hoping to do this across very large lists many times.
Thanks in advance!!
CodePudding user response:
sapply(movies, `[[`, 2)
# [1] "Steven Spielberg" "James Cameron" "Steven Spielberg"
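To get the movie titles rather than the directors, the same extraction can be run twice and one result used to filter the other. This is a minimal sketch, assuming the title is always element 1 and the director element 2. `vapply` is swapped in for `sapply` only to guarantee a character vector, and the variable names are illustrative.

```r
movies <- list(list("Jurassic Park", "Steven Spielberg", "Action"),
               list("Avatar", "James Cameron", "Action"),
               list("Schindler's List", "Steven Spielberg", "Biography"))

# Pull one position out of every sub-list as a character vector
titles    <- vapply(movies, `[[`, character(1), 1)
directors <- vapply(movies, `[[`, character(1), 2)

# Keep only the titles whose director matches
spielberg_titles <- titles[directors == "Steven Spielberg"]
spielberg_titles
# [1] "Jurassic Park"    "Schindler's List"
```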
Benchmark on the three-movie list: sapply() with `[[` is the fastest of the three.
library(purrr)  # needed for map_chr() and pluck()
bench::mark(purrr = map_chr(movies, pluck, 2),
            getElement = sapply(movies, getElement, 2),
            `[[` = sapply(movies, `[[`, 2))
expression min median itr/s…¹ mem_a…² gc/se…³ n_itr n_gc
1 purrr 21.7µs 28.2µs 31773. 0B 6.36 9998 2
2 getElement 16.6µs 18.6µs 45652. 0B 4.57 9999 1
3 [[ 14.9µs 17.2µs 47417. 0B 4.74 9999 1
CodePudding user response:
Dependency free and readable:
sapply(movies, getElement, 2)
# [1] "Steven Spielberg" "James Cameron" "Steven Spielberg"
Fast but not readable and assumes each sublist is length 3:
unlist(movies)[-1L:(length(movies) * 3L-2L) %% 3L == 0L]
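The index arithmetic above picks positions 2, 5, 8, ... of the flattened vector. A possibly more readable sketch of the same idea reshapes the flattened vector into a 3-row matrix, under the same assumption that every sub-list has exactly three character elements. The name `m` is illustrative, and no timing claim is made for this variant.

```r
movies <- list(list("Jurassic Park", "Steven Spielberg", "Action"),
               list("Avatar", "James Cameron", "Action"),
               list("Schindler's List", "Steven Spielberg", "Biography"))

# Flatten, then reshape so that each column is one movie
# (row 1 = title, row 2 = director, row 3 = genre)
m <- matrix(unlist(movies, use.names = FALSE), nrow = 3L)

directors <- m[2L, ]
spielberg_titles <- m[1L, directors == "Steven Spielberg"]
spielberg_titles
# [1] "Jurassic Park"    "Schindler's List"
```

Because the matrix holds every field, the same object can be reused for other filters (for example on genre in row 3) without flattening the list again.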
Benchmark with 100k sublists:
movies <- movies[sample(1:3, size = 100000, replace = TRUE)]
bench::mark(purrr = map_chr(movies, pluck, 2),
getElement = sapply(movies, getElement, 2),
`[[` = sapply(movies, `[[`, 2),
unlist(movies)[-1L:(length(movies) * 3L-2L) %% 3L == 0L])
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 purrr 230ms 233.58ms 4.19 781.3KB 14.0 3 10 715ms
2 getElement 71.9ms 77.07ms 12.8 3.29MB 14.7 7 8 545ms
3 [[ 27.8ms 29.35ms 32.4 3.29MB 9.53 17 5 525ms
4 unlist(movies)[-1L:(length(movies) * 3L - 2L)%%3L == 0L] 7.5ms 8.39ms 81.7 8.01MB 27.2 45 15 551ms
A small function, based on the comment below, that returns the movie titles matching a filter:
return_movies <- function(list, title_position, comparison_position, comparison_string) {
  # Index into the `list` argument, not the global `movies`
  sapply(list, getElement, title_position)[
    sapply(list, getElement, comparison_position) == comparison_string
  ]
}
# With the original three-movie list:
return_movies(movies, 1, 2, "Steven Spielberg")
# [1] "Jurassic Park" "Schindler's List"
CodePudding user response:
library(purrr)
map_chr(movies, pluck, 2)
#> [1] "Steven Spielberg" "James Cameron" "Steven Spielberg"