I am confused about how long it takes Python to initialize a new array.
Let's say I wanted to create an array of length n and eventually populate it with values.
If I already have an array of length n somewhere else in the program, would it be faster to just copy the array by doing: newArray = oldArray
or would it be more efficient to create a new array by doing: newArray = [0 for _ in range(len(oldArray))]
I have also tried newArray = [0] * len(oldArray)
but that seems to be the slowest option.
As a follow-up, if some of the values in newArray needed to be the same as in the old array (3 values? 100 values? more?), how does that change the answer? Would using a different data structure lower the creation time? How does appending to the array fit into all of this?
All help is appreciated!
CodePudding user response:
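One point worth checking before timing anything: newArray = oldArray does not copy the list. It only binds a second name to the same list object, so it is effectively free but any change made through one name shows up through the other. The sketch below illustrates this with a small made-up list (the names alias and copied are illustrative only and are not from the question):

```python
old_array = [1, 2, 3]

alias = old_array          # no copy: both names refer to the same list object
copied = old_array.copy()  # shallow copy: a new list holding the same elements

alias[0] = 99              # mutate the list through the alias

print(old_array)            # the change made through alias is visible here
print(copied)               # the shallow copy still holds the original values
print(alias is old_array)   # identity check: same object
print(copied is old_array)  # identity check: different objects
```

If the new list has to be modified independently of the old one, plain assignment is therefore not an option, and the comparison is really between the strategies that build a new list.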
For quick benchmarking, I profiled the proposed solutions with line_profiler (profileline is a decorator that enables line-by-line profiling of the wrapped function).
import numpy as np
from scripts.profilestats import profileline

@profileline()
def main(n):
    old_array = list(range(n))
    l1 = old_array.copy()
    l2 = [0 for _ in range(n)]
    l3 = [0] * n
    n1 = np.zeros(n)

if __name__ == '__main__':
    main(100000000)
This gives the following result:
Line #      Hits         Time    Per Hit   % Time  Line Contents
==============================================================
     4                                             @profileline()
     5                                             def main(n):
     6         1    1414275.0  1414275.0     13.2      old_array = list(range(n))
     7         1     374860.0   374860.0      3.5      l1 = old_array.copy()
     8         1    8747201.0  8747201.0     81.8      l2 = [0 for _ in range(n)]
     9         1     160812.0   160812.0      1.5      l3 = [0] * n
    10         1         30.0       30.0      0.0      n1 = np.zeros(n)
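Unsurprisingly, np.zeros is by far the quickest solution (30 time units versus 160812 for [0] * n), likely in part because the zeroed memory is allocated lazily. Note that it returns a NumPy array of floats rather than a Python list. Among the pure-list options, [0] * n is the fastest, followed by old_array.copy(); the list comprehension is by far the slowest and accounts for 81.8% of the total time.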
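To reproduce the comparison without the custom profileline decorator, the same list strategies can be timed with the standard-library timeit module. This is only a rough sketch: the function name time_strategies and its parameters n and repeat are hypothetical, the default n is much smaller than in the run above, and an append loop is included because the question asks about appending. Absolute timings will vary with the machine and Python version.

```python
import timeit


def time_strategies(n=100_000, repeat=5):
    """Return the best-of-`repeat` wall time in seconds for each strategy."""
    old_array = list(range(n))

    def append_loop():
        # Build the list one element at a time, as asked about in the question.
        out = []
        for _ in range(n):
            out.append(0)
        return out

    strategies = {
        "old_array.copy()": old_array.copy,
        "[0 for _ in range(n)]": lambda: [0 for _ in range(n)],
        "[0] * n": lambda: [0] * n,
        "append in a loop": append_loop,
    }
    # timeit.repeat accepts a callable; number=1 runs it once per measurement,
    # and taking the minimum reduces noise from other processes.
    return {
        name: min(timeit.repeat(func, number=1, repeat=repeat))
        for name, func in strategies.items()
    }


if __name__ == "__main__":
    for name, seconds in time_strategies().items():
        print(f"{name:<24} {seconds:.6f} s")
```

Taking the minimum over several repeats is a common way to reduce measurement noise; for very small n, raising number above 1 may give more stable figures.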