A simple approach to approximating the maximum likelihood of a model given some data is grid approximation. For example, in R, we can generate a grid of parameter values and then evaluate the likelihood of each value given some data (example from Statistical Rethinking by McElreath):
p_grid <- seq(from=0, to=1, length.out=1000)
likelihood <- dbinom(6, size=9, prob=p_grid)
Here, likelihood is an array of 1000 values, and I assume this is an efficient way to get such an array.
I am new to Julia (and not so good at R), so my approach to doing the same as above relies on comprehension syntax:
using Distributions
p_grid = collect(LinRange(0, 1, 1000))
likelihood = [pdf(Binomial(9, p), 6) for p in p_grid]
which is not only clunky but somehow seems inefficient because a new Binomial gets constructed 1000 times. Is there a better, perhaps vectorized, approach to accomplishing the same task?
CodePudding user response:
In languages like R or Python, people often use the term "vectorization" to mean "avoid for loops in the language". I say "in the language" because there are still for loops; it's just that they're now in C instead of R/Python.
In Julia, there's nothing to worry about: loops and comprehensions compile to fast native code. You'll still sometimes hear "vectorization", but Julia folks tend to use this in the original sense of hardware vectorization. More on that here.
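If you'd still like something that reads like R's vectorized one-liner, Julia's broadcast "dot" syntax gives you an equivalent form. A minimal sketch (it produces the same 1000-element array as your comprehension, assuming p_grid is defined as above):
julia> likelihood = pdf.(Binomial.(9, p_grid), 6);
The dots broadcast Binomial over p_grid and then pdf over the resulting distributions. This is a syntactic convenience rather than a performance trick; the compiler lowers it to essentially the same loop as the comprehension.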
As for your code, I think it's fine. To be sure, let's benchmark!
julia> using BenchmarkTools
julia> @btime [pdf(Binomial(9, p), 6) for p in $p_grid]
111.352 μs (1 allocation: 7.94 KiB)
Another way you could write this is using map:
julia> @btime map($p_grid) do p
           pdf(Binomial(9, p), 6)
       end;
111.623 μs (1 allocation: 7.94 KiB)
To check for construction overhead, you could make lower-level calls to StatsFuns, like this:
julia> using StatsFuns
julia> @btime map($p_grid) do p
           binompdf(9, p, 6)
       end;
109.809 μs (1 allocation: 7.94 KiB)
It looks like there is some difference, but it's pretty minor, maybe around 2% of the overall cost.
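Part of the reason the gap is so small: a Binomial is just a small immutable struct (an integer count and a floating-point probability), so constructing one is essentially free. A quick sanity check you can run yourself:
julia> using Distributions

julia> isbits(Binomial(9, 0.5))  # a plain immutable bits type, so construction is cheap
true
So I wouldn't worry about the 1000 constructions in your comprehension; the cost is dominated by the pdf evaluations themselves.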