I have written a Python program and run it with both PyPy and CPython, inserting some timing prints to measure the performance difference. In some cases the speedup is 10x, in others there is no change. Can anyone explain whether there are rules to follow when writing a program in order to exploit the potential speedup offered by PyPy? E.g. syntax to avoid, data structures to prefer over others...
I found speedups between 1.5x and 12x.
CodePudding user response:
PyPy tends to be fast on numerically-intensive code with hot loops over (small) integers and floats, as it can use native machine types directly instead of variable-sized dynamic integer/float objects. It will still be slower than natively-compiled C/C++ code because it has to check types at runtime and compile the code at runtime.
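As a rough illustration, here is a made-up micro-benchmark of the kind of code where PyPy shines: a tight loop over floats that the JIT can specialize to native arithmetic (the function and parameters are mine, not from the question; timings will vary by machine):

```python
import time

def integrate(n):
    # Midpoint-rule approximation of pi: a hot loop over small floats
    # and ints that PyPy's tracing JIT can compile to native code.
    total = 0.0
    step = 1.0 / n
    for i in range(n):
        x = (i + 0.5) * step
        total += 4.0 / (1.0 + x * x)
    return total * step

start = time.perf_counter()
pi_approx = integrate(2_000_000)
elapsed = time.perf_counter() - start
print(f"pi ~= {pi_approx:.6f} in {elapsed:.3f}s")
```

On loops like this, the speedup over CPython is typically large; on CPython the same loop pays dynamic-dispatch and boxing costs on every iteration.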
PyPy does not like (large) dynamic code. It uses a tracing just-in-time (JIT) compiler that tracks which paths through the code are executed most often and compiles those paths at runtime. When many different paths are executed and they change dynamically, the overhead of the JIT can be significant, and in the worst case PyPy may choose not to compile any path at all. The catch is that PyPy's fallback interpreter is slower than CPython's (due to the machinery needed to trace and compile code at runtime). Some dynamic features like frame introspection are supported, but they are slow (since code is not expected to use them heavily).
PyPy is not fast for short-running scripts: the JIT has to compile the code at runtime, and the overhead of the JIT, or of falling back to the interpreter (slower than CPython's), can exceed the cost of simply interpreting the code with CPython.
PyPy uses a tracing Garbage Collector (GC), as opposed to CPython, which uses reference counting (plus a cycle detector). GCs can be faster at allocating/freeing many objects (especially small temporary ones), but they need to track live objects in order to find the dead ones and free them. This means code dealing with a huge number of references and frequent object allocations can actually be slower. This includes dynamic graph-based data structures and trees, for example.
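A hypothetical sketch of that allocation pattern (class and function names are mine): a tree of many small linked objects, which the GC must trace in full to find garbage:

```python
import time

class Node:
    # A small heap object; trees of these are exactly the
    # reference-heavy workload the answer describes.
    __slots__ = ("value", "children")

    def __init__(self, value):
        self.value = value
        self.children = []

def build_tree(depth, fanout):
    # Allocates (fanout**(depth+1) - 1)/(fanout - 1) small linked
    # objects; a tracing GC has to walk all live nodes.
    root = Node(0)
    frontier = [root]
    for _ in range(depth):
        nxt = []
        for node in frontier:
            for i in range(fanout):
                child = Node(i)
                node.children.append(child)
                nxt.append(child)
        frontier = nxt
    return root

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children)

t0 = time.perf_counter()
root = build_tree(depth=14, fanout=2)  # 2**15 - 1 = 32767 nodes
print(count_nodes(root), f"{time.perf_counter() - t0:.3f}s")
```

Whether PyPy or CPython wins here depends on the mix of allocation, mutation, and lifetime of the objects, so it is worth timing on your own workload.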
The C binding APIs (C extensions and ctypes, but not CFFI) tend to be slower than on CPython (mainly because the C API was designed for CPython in the first place). This means glue code calling a lot of wrapped C functions will actually be slower with PyPy. A lot of work has been done recently to significantly improve PyPy's performance in this case, but AFAIK PyPy is still slower. Example use cases are operations on large NumPy arrays (for small ones, JIT compilers like Numba are often better), as well as the csv and pickle modules.
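To make "glue code" concrete, here is a made-up ctypes micro-benchmark (the function names are mine, and it assumes a POSIX libm is findable): each element pays one crossing of the Python-to-C boundary, which is the overhead that tends to be higher on PyPy:

```python
import ctypes
import ctypes.util
import math
import time

# Load the C math library; fall back to a common soname if
# find_library fails (assumption: a POSIX-like system).
libm_path = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(libm_path)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

def sqrt_via_ctypes(values):
    # One C call per element: binding overhead dominates.
    return [libm.sqrt(v) for v in values]

def sqrt_in_python(values):
    # math.sqrt in a pure-Python loop: PyPy's JIT handles this well.
    return [math.sqrt(v) for v in values]

data = [float(i) for i in range(100_000)]
for fn in (sqrt_via_ctypes, sqrt_in_python):
    t0 = time.perf_counter()
    fn(data)
    print(fn.__name__, f"{time.perf_counter() - t0:.3f}s")
```

The general rule: cross the C boundary rarely with big batches of work, not often with tiny ones, and prefer CFFI over ctypes on PyPy.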
String operations often tend to be slower, especially string concatenation. One reason is that CPython uses efficient, well-optimized string algorithms written in C, at the expense of a large and complex code base. Reimplementing and maintaining this, with the added complexity of the JIT and the GC, is a significant amount of work for the small PyPy team, so some operations end up less well optimized. Regarding concatenation, the inefficiency comes from the JIT being unable to optimize out intermediate copies. That being said, string-appending loops should be avoided anyway.
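The two patterns side by side (a made-up micro-benchmark; function names are mine): the `+=` loop creates an intermediate string per iteration, while `str.join` allocates the result once, which is the recommended form on both interpreters:

```python
import time

def concat_naive(parts):
    # += in a loop: each iteration may copy everything built so far,
    # and PyPy's JIT cannot optimize the intermediate copies away.
    s = ""
    for p in parts:
        s += p
    return s

def concat_join(parts):
    # Preferred on CPython and PyPy alike: one allocation at the end.
    return "".join(parts)

parts = ["x"] * 200_000
for fn in (concat_naive, concat_join):
    t0 = time.perf_counter()
    out = fn(parts)
    print(fn.__name__, len(out), f"{time.perf_counter() - t0:.3f}s")
```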
Generators tend to be slower than plain loops. The simpler, the better. One should not expect the JIT to perform complex, expensive optimizations at runtime, since the overhead of the JIT must stay small compared to the rest of the code (and PyPy does not know ahead of time how long the code will run).
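For example (a made-up pair of equivalent functions), the plain loop below gives the tracing JIT the simplest possible shape, while the generator version adds frame-suspension machinery it has to see through:

```python
import time

def sum_squares_generator(n):
    # Generator expression: extra frame switching for the JIT to trace.
    return sum(i * i for i in range(n))

def sum_squares_loop(n):
    # Plain loop: the simplest shape for PyPy's tracing JIT.
    total = 0
    for i in range(n):
        total += i * i
    return total

n = 2_000_000
for fn in (sum_squares_generator, sum_squares_loop):
    t0 = time.perf_counter()
    result = fn(n)
    print(fn.__name__, result, f"{time.perf_counter() - t0:.3f}s")
```

The difference is usually modest, but it is the kind of gap that disappears when the hot code is written as straightforwardly as possible.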
Global variables are slow in CPython but not in PyPy. That being said, they should be avoided for software engineering reasons anyway.
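A minimal sketch of the difference (names are mine): on CPython every iteration of the first function pays a global dictionary lookup and store, which PyPy's JIT largely optimizes away:

```python
COUNTER = 0  # module-level global

def bump_global(n):
    # Each iteration reads/writes a global name: a dict operation on
    # CPython, mostly optimized away by PyPy's JIT.
    global COUNTER
    for _ in range(n):
        COUNTER += 1
    return COUNTER

def bump_local(n):
    # Local variable: fast on both interpreters.
    counter = 0
    for _ in range(n):
        counter += 1
    return counter

print(bump_local(1_000_000))
```

The usual CPython workaround, aliasing a global to a local before a hot loop, is simply unnecessary on PyPy.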
This is a pretty broad topic. There are many other interesting points to consider when evaluating the performance of a given piece of Python code. For more information, please read:
- https://www.pypy.org/performance.html (performance tips)
- https://speed.pypy.org (benchmark)
- https://www.pypy.org/blog (articles about the development of PyPy)
CodePudding user response:
There are no hard and fast rules. There are some hints in the PyPy FAQ: https://doc.pypy.org/en/latest/faq.html#how-fast-is-pypy. PyPy can JIT hot Python code, and it may be able to store lists, dictionaries, and tuples that hold a single type of object (int, float, string) more efficiently.