I installed PyInstaller in the hope that I could improve my Python code's performance by turning all my Python scripts into binaries. To test this, I made a simple script named test_compiler.py:
#!/usr/bin/python3
import time
start_time = time.time()
c = 0
while c < 10000000:
    c = c + 1
print(time.time() - start_time)
Executing this script usually printed a float between 0.49 and 0.55. Then, in the folder containing this script, I ran:
pyinstaller test_compiler.py
which generated a few files and folders. I then ran:
cd ./dist/test_compiler/
./test_compiler
And doing this always prints a float between 0.6 and 0.65. I know this question probably sounds naive to someone with more knowledge of the matter, but shouldn't turning my code into a binary improve its performance? Did I completely misinterpret PyInstaller's objectives, thinking they involved performance when they are actually more about privacy and deployment?
More importantly: is there any similar approach that could help me with performance?
CodePudding user response:
shouldn't turning my code into a binary improve its performance?
No. At least not with PyInstaller. The goal of this tool, according to its documentation, is to "bundle a Python application and all its dependencies into a single package". What it basically does is wrap a Python interpreter, your source and its dependencies into a binary package. The code is not compiled (by default).
Did I completely misinterpret PyInstaller's objectives, thinking they involved performance when they are actually more about privacy and deployment?
Yes. The documentation does not mention that the code is compiled or can be executed faster. It states that the user does not need a Python interpreter installed, but that is because the interpreter is included in the executable bundle.
More importantly: is there any similar approach that could help me with performance?
If your goal is faster performance, Cython can help, and it is supported by PyInstaller. That being said, you need to add (type) annotations to the code to really make it faster (otherwise the performance gain will be small). Whether the annotated code is still Python code is debatable, though. A minimal sketch follows.
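For illustration, here is a hedged sketch of what a typed Cython version of your benchmark could look like (the file name test_compiler_cy.pyx and the build command are my assumptions, not part of your question):
# test_compiler_cy.pyx -- hypothetical Cython version of the benchmark.
# The cdef declaration types c as a C integer, so Cython can compile
# the loop down to plain C arithmetic instead of Python object operations.
import time

def main():
    cdef long c = 0
    start_time = time.time()
    while c < 10000000:
        c = c + 1
    print(time.time() - start_time)
It could be built in place with, e.g., cythonize -i test_compiler_cy.pyx and then imported and called from a regular Python script.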
JIT-based interpreters like PyPy can execute this code significantly faster, but PyPy does not support all modules (packages like Numpy are supported, but using them is tedious with PyPy and the performance may not be the same). One reason is that it exhibits slightly different behaviour (e.g. it uses a GC that changes when objects are deleted, sometimes requiring modified code; see the sketch below). Pyston pursued a similar goal, AFAIK with better compatibility, but it is not as fast as PyPy (partly because it uses a less ambitious method).
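To illustrate that GC point, a hedged sketch (the function names are made up): on CPython, reference counting closes a file as soon as the last reference to it goes away, while on PyPy the file object may stay alive until the next GC cycle, so code that relies on immediate finalization has to be adapted:
def fragile_write(path):
    f = open(path, "w")
    f.write("hello")
    f = None  # CPython closes the file here; PyPy may keep it open until a GC cycle

def portable_write(path):
    # A with-block closes the file deterministically on both interpreters.
    with open(path, "w") as f:
        f.write("hello")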
Embedded JITs like Numba can achieve the performance of annotated Cython code (in fact, Numba can be faster for Numpy code) and faster performance than PyPy, but Numba mainly targets Numpy-based code and does not support generic Python code (e.g. packages like time, pandas or even scipy). A sketch of a Numba version follows.
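For reference, here is a hedged sketch of your benchmark adapted for Numba (assuming the numba package is installed; njit is Numba's nopython-mode decorator):
import time
from numba import njit

@njit
def count(n):
    # Compiled to native code on the first call; the loop then runs at C-like speed.
    c = 0
    while c < n:
        c = c + 1
    return c

count(1)  # warm-up call: triggers the JIT compilation, so it is excluded from the timing
start_time = time.time()
count(10000000)
print(time.time() - start_time)
Note that the timing deliberately happens outside the jitted function, since the time module is not supported inside it.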
And doing this always prints a float between 0.6 and 0.65
The performance difference is likely to come from a different environment, and especially from the number of global variables (and their hashes). In general, you can get slightly faster code by putting the loop into a function, so that c becomes a local variable and global fetches are avoided (in both versions).
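A minimal sketch of that change (the same script with the loop moved into a function so c is a fast local):
#!/usr/bin/python3
import time

def main():
    # c is now a local variable: lookups use the fast locals array
    # instead of dictionary-based global fetches.
    start_time = time.time()
    c = 0
    while c < 10000000:
        c = c + 1
    print(time.time() - start_time)

main()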