I wonder why operating on Float64 values is faster than operating on Float16:
julia> rnd64 = rand(Float64, 1000);
julia> rnd16 = rand(Float16, 1000);
julia> @benchmark rnd64.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.800 μs … 662.140 μs ┊ GC (min … max): 0.00% … 99.37%
Time (median): 2.180 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.457 μs ± 13.176 μs ┊ GC (mean ± σ): 12.34% ± 3.89%
▁██▄▂▂▆▆▄▂▁ ▂▆▄▁ ▂▂▂▁ ▂
████████████████▇▇▆▆▇▆▅▇██▆▆▅▅▆▄▄▁▁▃▃▁▁▄▁▃▄▁▃▁▄▃▁▁▆▇██████▇ █
1.8 μs Histogram: log(frequency) by time 10.6 μs <
Memory estimate: 8.02 KiB, allocs estimate: 5.
julia> @benchmark rnd16.^2
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
Range (min … max): 5.117 μs … 587.133 μs ┊ GC (min … max): 0.00% … 98.61%
Time (median): 5.383 μs ┊ GC (median): 0.00%
Time (mean ± σ): 5.716 μs ± 9.987 μs ┊ GC (mean ± σ): 3.01% ± 1.71%
▃▅█▇▅▄▄▆▇▅▄▁ ▁ ▂
▄██████████████▇▆▇▆▆▇▆▇▅█▇████▇█▇▇▆▅▆▄▇▇▆█▇██▇█▇▇▇▆▇▇▆▆▆▆▄▄ █
5.12 μs Histogram: log(frequency) by time 7.48 μs <
Memory estimate: 2.14 KiB, allocs estimate: 5.
Maybe you wonder why I expected the opposite: because Float16 values have less floating-point precision:
julia> rnd16[1]
Float16(0.627)
julia> rnd64[1]
0.4375452455597999
Shouldn't calculations with less precision be faster? If not, why would anyone use Float16 at all? They might as well use Float128!
CodePudding user response:
As you can see, the effect you are expecting is present for Float32:
julia> rnd64 = rand(Float64, 1000);
julia> rnd32 = rand(Float32, 1000);
julia> rnd16 = rand(Float16, 1000);
julia> @btime $rnd64.^2;
616.495 ns (1 allocation: 7.94 KiB)
julia> @btime $rnd32.^2;
330.769 ns (1 allocation: 4.06 KiB) # faster!!
julia> @btime $rnd16.^2;
2.067 μs (1 allocation: 2.06 KiB) # slower!!
Float64 and Float32 have hardware support on most platforms, but Float16 does not, and must therefore be implemented in software.
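One way to see this software fallback is to inspect the native code Julia generates for a scalar operation. This is only a minimal sketch (the square helper is just an illustration, and the exact output depends on your CPU and Julia version):
julia> square(x) = x * x;
julia> @code_native square(1.0)          # Float64: essentially one hardware multiply (mulsd/vmulsd on x86-64)
julia> @code_native square(Float16(1.0)) # Float16: the multiply is typically wrapped in conversions to and from Float32 on CPUs without native FP16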
Note also that you should use variable interpolation ($) when micro-benchmarking. Without interpolation, the benchmarked expression reads rnd32 as a non-constant global, so dynamic dispatch and extra allocations get measured too. The difference is significant here, not least in terms of allocations:
julia> @btime $rnd32.^2;
336.187 ns (1 allocation: 4.06 KiB)
julia> @btime rnd32.^2;
930.000 ns (5 allocations: 4.14 KiB)
CodePudding user response:
The short answer is that you probably shouldn't use Float16 unless you are on a GPU or an Apple Silicon CPU, because (as of 2022) other processors don't have hardware support for Float16.
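If the appeal of Float16 is mainly the memory savings, one common workaround (a sketch, not taken from the answers above) is to keep the data stored as Float16 but widen to Float32 for the actual arithmetic:
julia> rnd16 = rand(Float16, 1000);
julia> squared16 = Float16.(Float32.(rnd16) .^ 2);  # store as Float16, compute in hardware-supported Float32
Whether this pays off depends on the workload, since the conversions themselves are not free.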