I'm adding types to some functions, mainly for speed and clarity. I have a few questions about this, and I'm asking them all here since I've seen other posts do the same. The questions are bolded for easier recognition.
There's a function I'd like to add types to that counts the number of negative values in a vector. I'd like it to be compatible with Int and Float, just like an untyped function is. While convenient, Chris Rackauckas advises against it; see the first comment of How to declare array type that can have int and floats:
Do you want Ints and Floats at the same time? That's not advised due to performance reasons. But if you want to, you can use Union{Int64,Float64} as the type.
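(To be explicit about what I mean, here is a small illustration; the names v_int, v_float, and v_mixed are just placeholders. Passing v_int or v_float to one and the same function is what I want; v_mixed is the case the quoted advice warns about:)
# One function called with vectors of a single element type each time:
v_int   = [1, -2, 5]        # Vector{Int64}
v_float = [1.3, -2.0, 5.0]  # Vector{Float64}

# A single vector whose element type mixes Ints and Floats:
v_mixed = Union{Int64,Float64}[1, -2.0, 5]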
If a polymorphic function is necessary for some reason, then the better method appears to be to add types to the arguments by means of a Union. For this particular case, however, with Number and Real being supertypes of Int64 and Float64, I thought these would be slightly faster than the untyped version.
This, however, doesn't seem to be the case, at least in my simple tests. I tested a function without argument types, one typed as Any, one typed with Union{Int64,Float64}, two typed with the supertypes Number and Real, and one restricted to Float64.
notypes
median time: 1.006 ms (0.00% GC)
mean time: 1.152 ms (12.60% GC)
any
median time: 1.054 ms (0.00% GC)
mean time: 1.197 ms (12.05% GC)
union
median time: 1.062 ms (0.00% GC)
mean time: 1.210 ms (12.09% GC)
number
median time: 1.110 ms (0.00% GC)
mean time: 1.315 ms (14.18% GC)
real
median time: 1.015 ms (0.00% GC)
mean time: 1.168 ms (12.70% GC)
float
median time: 728.131 μs (0.00% GC)
mean time: 829.031 μs (11.53% GC)
Clearly, there is no significant difference between the untyped function and the ones typed as Any and Number. The Real function appears to have a very slight improvement, and the Float64 function is the only one with a noticeable improvement.
Is a function typed for only Int and Float generally faster than one without types? Union{Int64,Float64} is a supertype of only two types, unlike Number and Real, which span many more. This is why I thought the Union function would be faster than the rest (except the Float version). Are Unions as slow as untyped functions in general?
Is there any performance improvement from typing a function with supertypes such as Number and Real, compared to leaving it untyped?
Is it best Julian practice to define a function for every possible argument type? In this case, Int64 and Float64.
For convenience's sake rather than performance's, is there a shorter way to define Int-Float functions? Perhaps by defining a new type at the beginning? I tried doing this:
abstract type IF64 <: Union{Int64,Float64} end
but got an error
invalid subtyping in definition of IF64
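(The closest working alternative I could find is a plain constant alias; I don't know whether this is considered good practice, hence the question. A sketch, with a made-up function name:)
# A const alias for the Union; methods can then dispatch on vectors of either element type.
const IF64 = Union{Int64,Float64}

count_neg_alias(vec::Vector{<:IF64}) = count(x -> x < 0, vec)

count_neg_alias([1, -2, 5])        # accepts Vector{Int64}
count_neg_alias([1.3, -2.0, 5.0])  # accepts Vector{Float64}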
Also, what does the N mean in the function's definition? (Such as where {T<:Union{Int64,Float64},N} in countMPvec_arrUnion's definition given below.)
Lastly, why does the Float function take less memory than the rest? (609.06 KiB vs. 921.56 KiB; this is the output of the benchmarks, code given below.)
all other functions
memory estimate: 921.56 KiB
allocs estimate: 38979
float
memory estimate: 609.06 KiB
allocs estimate: 28979
Full code of the tests:
#= -------------------
The different versions of the function
-------------------
=#
function countMPvec_notypes(vec)
    nneg = 0
    lvec = length(vec)
    for i in 1:lvec
        if (vec[i] < 0)
            nneg += 1
        end
    end
    npos = lvec - nneg
    return (nneg, npos)
end
function countMPvec_any(vec::Any)
    nneg = 0
    lvec = length(vec)
    for i in 1:lvec
        if (vec[i] < 0)
            nneg += 1
        end
    end
    npos = lvec - nneg
    return (nneg, npos)
end
# Works with Int and Float !!!
function countMPvec_arrUnion(vec::Array{T,1}) where {T<:Union{Int64,Float64},N}
    nneg = 0
    lvec = length(vec)
    for i in 1:lvec
        if (vec[i] < 0)
            nneg += 1
        end
    end
    npos = lvec - nneg
    return (nneg, npos)
end
# Works with Int and Float !!!
function countMPvec_arrNumber(vec::Array{T,1}) where {T<:Number,N}
    nneg = 0
    lvec = length(vec)
    for i in 1:lvec
        if (vec[i] < 0)
            nneg += 1
        end
    end
    npos = lvec - nneg
    return (nneg, npos)
end
# Works with Int and Float !!!
function countMPvec_arrReal(vec::Array{T,1}) where {T<:Real,N}
    nneg = 0
    lvec = length(vec)
    for i in 1:lvec
        if (vec[i] < 0)
            nneg += 1
        end
    end
    npos = lvec - nneg
    return (nneg, npos)
end
function countMPvec_arrFloat(vec::Array{Float64,1})
    nneg = 0
    lvec = length(vec)
    for i in 1:lvec
        if (vec[i] < 0)
            nneg += 1
        end
    end
    npos = lvec - nneg
    return (nneg, npos)
end
#= -------------------
Functions for benchmark
-------------------
=#
nums = [1.3; -2; 5]
nitertest = 10000
function test_notypes()
    for i in 1:nitertest
        countMPvec_notypes(nums)
    end
end
function test_any()
    for i in 1:nitertest
        countMPvec_any(nums)
    end
end
function test_arrUnion()
    for i in 1:nitertest
        countMPvec_arrUnion(nums)
    end
end
function test_arrNumber()
    for i in 1:nitertest
        countMPvec_arrNumber(nums)
    end
end
function test_arrReal()
    for i in 1:nitertest
        countMPvec_arrReal(nums)
    end
end
function test_arrFloat()
    for i in 1:nitertest
        countMPvec_arrFloat(nums)
    end
end
Then there are the benchmark cells. Each of these should be run separately; the full output of every cell is given at the end:
import BenchmarkTools
println("notypes")
@BenchmarkTools.benchmark test_notypes()
println("any")
@BenchmarkTools.benchmark test_any()
println("union")
@BenchmarkTools.benchmark test_arrUnion()
println("number")
@BenchmarkTools.benchmark test_arrNumber()
println("real")
@BenchmarkTools.benchmark test_arrReal()
println("float")
@BenchmarkTools.benchmark test_arrFloat()
#= -------------------
Output of the benchmarks
-------------------
=#
notypes
BenchmarkTools.Trial:
memory estimate: 921.56 KiB
allocs estimate: 38979
--------------
minimum time: 855.070 μs (0.00% GC)
median time: 1.006 ms (0.00% GC)
mean time: 1.152 ms (12.60% GC)
maximum time: 31.515 ms (91.59% GC)
--------------
samples: 4314
evals/sample: 1
any
BenchmarkTools.Trial:
memory estimate: 921.56 KiB
allocs estimate: 38979
--------------
minimum time: 905.317 μs (0.00% GC)
median time: 1.054 ms (0.00% GC)
mean time: 1.197 ms (12.05% GC)
maximum time: 30.355 ms (96.33% GC)
--------------
samples: 4152
evals/sample: 1
union
BenchmarkTools.Trial:
memory estimate: 921.56 KiB
allocs estimate: 38979
--------------
minimum time: 914.563 μs (0.00% GC)
median time: 1.062 ms (0.00% GC)
mean time: 1.210 ms (12.09% GC)
maximum time: 32.472 ms (90.09% GC)
--------------
samples: 4111
evals/sample: 1
number
BenchmarkTools.Trial:
memory estimate: 921.56 KiB
allocs estimate: 38979
--------------
minimum time: 926.189 μs (0.00% GC)
median time: 1.110 ms (0.00% GC)
mean time: 1.315 ms (14.18% GC)
maximum time: 42.545 ms (97.21% GC)
--------------
samples: 3788
evals/sample: 1
real
BenchmarkTools.Trial:
memory estimate: 921.56 KiB
allocs estimate: 38979
--------------
minimum time: 863.699 μs (0.00% GC)
median time: 1.015 ms (0.00% GC)
mean time: 1.168 ms (12.70% GC)
maximum time: 31.847 ms (96.50% GC)
--------------
samples: 4257
evals/sample: 1
float
BenchmarkTools.Trial:
memory estimate: 609.06 KiB
allocs estimate: 28979
--------------
minimum time: 625.845 μs (0.00% GC)
median time: 728.131 μs (0.00% GC)
mean time: 829.031 μs (11.53% GC)
maximum time: 30.811 ms (97.38% GC)
--------------
samples: 5989
evals/sample: 1
CodePudding user response:
There's a single answer to all your performance-related questions: type annotation improves performance for structures but generally not for functions. The post you read from Chris concerned the element type of an array (a structure), even though it was phrased in terms of what types a function would accept. You should interpret it in that context.
In Julia you should type-annotate function arguments to control dispatch, not for performance. Why? Because the compiler will specialize each function for every concrete type you pass into it, in a sense automatically generating all those specializations you were thinking about writing out by hand. So the only reason to add a type annotation is to control which method gets called:
is_this_a_number(x::Number) = true
is_this_a_number(x) = false
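A quick way to convince yourself of that specialization (a sketch, not taken from the question; the function name is made up):
# No annotation at all, yet Julia compiles a separate specialized
# method instance for each concrete argument type it sees.
count_neg(vec) = count(x -> x < 0, vec)

count_neg([1.3, -2.0, 5.0])   # runs a specialization for Vector{Float64}
count_neg([1, -2, 5])         # runs a specialization for Vector{Int64}

# Inspect the inferred, fully concrete code for one specialization
# (requires `using InteractiveUtils` outside the REPL):
# @code_typed count_neg([1.3, -2.0, 5.0])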
Structures are a different story. You can read more about type-annotation in structures in the Performance tips section of the manual.
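To give a rough idea of the struct case (a made-up sketch, not from the manual): an untyped field forces dynamic lookups on every access, while a concretely typed or parametric field does not.
struct SlowBox            # field type defaults to Any, so every access is type-unstable
    x
end

struct FastBox{T<:Real}   # parametric field: concrete once T is known
    x::T
end

double(b) = b.x + b.x
double(SlowBox(1.5))      # works, but the compiler cannot infer the type of b.x
double(FastBox(1.5))      # b.x is known to be Float64, so this compiles to tight code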
Finally, keep in mind that some things have changed since the early days of that question. In particular, union splitting allows good performance in many circumstances with small Unions of concrete types (check each member with isconcretetype if you're unsure).
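For example (a sketch; whether it pays off in your real code is something to measure):
# Both members of the Union are concrete, so union splitting can apply:
isconcretetype(Int64), isconcretetype(Float64)   # (true, true)

# A vector with a small Union element type; loops over it stay reasonably
# fast because the compiler branches over the two concrete possibilities.
v = Union{Int64,Float64}[1, -2.5, 3]
count(x -> x < 0, v)                              # returns 1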
what does the N mean in the function's definition
The number of dimensions of the array. (In your definitions the dimension is already fixed to 1 by Array{T,1}, so the extra N parameter is never actually used.)
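For instance (a sketch with a made-up name), N is only bound when it appears in the argument type, as in Array{T,N}:
# N captures the number of dimensions of the array the method dispatches on.
ndims_of(A::Array{T,N}) where {T,N} = N

ndims_of([1.0, -2.0])   # 1 (a Vector)
ndims_of([1 2; 3 4])    # 2 (a Matrix)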