I trained a neural network with approximately 26,000 parameters, and I intend to use it on mobile phones for real-time inference. I'm wondering if there is a way to estimate the run time of a neural network given the size of the network and the target device.
CodePudding user response:
You need to estimate the number of floating point operations (FLOPs) that running your model requires. For example, multiplying two N x N matrices counts as 2N^3 FLOPs, because one multiply-add counts as 2 FLOPs. (There are software packages that can count this for you in PyTorch, e.g. fvcore or ptflops.)
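For a small fully connected network you can also do the count by hand. Here is a minimal sketch; the layer sizes are made-up assumptions, chosen only to land near the parameter count in the question:

```python
import torch
import torch.nn as nn

# Hypothetical model for illustration only; the layer sizes are assumptions
# chosen to give roughly the parameter count in the question (~22k here).
model = nn.Sequential(
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

def linear_flops(net: nn.Module) -> int:
    """FLOPs for one forward pass, counting only nn.Linear layers.

    A Linear(in, out) layer does in*out multiply-adds; at 2 FLOPs per
    multiply-add that is 2*in*out FLOPs, plus out additions for the bias.
    """
    total = 0
    for m in net.modules():
        if isinstance(m, nn.Linear):
            total += 2 * m.in_features * m.out_features + m.out_features
    return total

print("parameters:", sum(p.numel() for p in model.parameters()))  # 22026
print("FLOPs per inference:", linear_flops(model))                # 43786
```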
Then you need to know the capabilities of your target device: how many floating point operations can it perform per second (FLOPS)? That peak throughput provides an upper bound on how fast your code can run. Whether your code reaches this theoretical limit is unclear, but it gives you something to shoot for. The smaller the workload (small tensors), the more likely you are to fall short of the limit, because per-call overhead and memory traffic dominate over the arithmetic.
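Putting the two together, a rough latency estimate is FLOPs divided by sustained throughput. Everything below is an assumption for illustration; the 10 GFLOPS figure and the 10% efficiency factor are placeholders you should replace with numbers for your actual phone:

```python
flops_per_inference = 43_786   # from the count above
peak_flops_per_sec  = 10e9     # assumed: ~10 GFLOPS sustained on a mobile core
efficiency          = 0.10     # assumed: 10% of peak, plausible for tiny tensors

latency_s = flops_per_inference / (peak_flops_per_sec * efficiency)
print(f"estimated latency: {latency_s * 1e6:.0f} us per inference")
# ~44 us -> comfortably real-time even at the assumed 10% efficiency
```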
If you quantize your model for deployment, you need to adjust the calculation accordingly: use the device's throughput for the relevant integer operations (e.g. int8 ops per second) instead of its floating point throughput.
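For a quick sanity check of the quantized path, PyTorch's dynamic quantization converts the Linear layers to int8 weights. A sketch, reusing the hypothetical `model` from above (timings on your development machine only hint at what a phone will do; measure on the real device before trusting them):

```python
import torch
import torch.utils.benchmark as benchmark

# Dynamic int8 quantization: weights stored as int8, activations quantized
# on the fly. A cheap way to see whether the integer path helps this model.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 32)  # input shape matches the sketch model above

for name, m in [("float32", model), ("int8 dynamic", quantized)]:
    timer = benchmark.Timer(stmt="m(x)", globals={"m": m, "x": x})
    print(name, timer.timeit(500))
```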