Run these lines:
import numpy as np
import time
start = time.time()
t = []
for i in range(int(1e5)):
    t.append(i)
t = np.array(t)
end = time.time()
print(end-start)
And compare them with these:
import numpy as np
import time
start = time.time()
t = np.array([])
for i in range(int(1e5)):
    t = np.append(t, [i])  # np.append returns a new array; it does not modify t in place
end = time.time()
print(end-start)
The first is faster than the second by roughly a factor of 100!
What is the reason?
Answer:
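Python lists hold references to objects. These references are stored contiguously in memory, but CPython over-allocates the reference array, leaving spare slots, so only an occasional append has to reallocate and copy it; on average an append costs O(1). NumPy arrays have a fixed size and no spare capacity, so np.append always allocates a new array and copies every existing element into it (for a multi-dimensional array, that means every row and column). Building an array of n elements this way therefore copies O(n²) elements in total, while the list version pays for a single conversion with np.array(t) at the end.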
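The spare capacity of a list can be observed in CPython with sys.getsizeof, which reports the list's allocated capacity rather than its length. This is a CPython implementation detail, so the exact number of reallocations varies between versions; a minimal sketch:

```python
import sys

# CPython implementation detail: a list over-allocates its internal
# array of object references, so most appends just fill a spare slot.
lst = []
resizes = 0
last_size = sys.getsizeof(lst)
for i in range(1000):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:  # capacity grew: the reference array was resized
        resizes += 1
        last_size = size

print(f"{len(lst)} appends, {resizes} reallocations")
```

Only a few dozen of the 1000 appends trigger a resize; the rest are cheap.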
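The cost on the NumPy side can be sketched by counting how many existing elements np.append has to copy while an array is built one element at a time. count_copies_with_append is a hypothetical helper written only for this illustration:

```python
import numpy as np


def count_copies_with_append(n):
    """Build [0, 1, ..., n-1] with np.append and count element copies."""
    t = np.array([], dtype=np.int64)
    copied = 0
    for i in range(n):
        copied += t.size       # np.append copies every existing element...
        t = np.append(t, [i])  # ...into a freshly allocated array of size + 1
    return t, copied


t, copied = count_copies_with_append(1000)
print(t[:5], t.size)  # [0 1 2 3 4] 1000
print(copied)         # 499500, i.e. n*(n-1)/2: quadratic in n
```

The copy count grows as n(n-1)/2, so doubling n roughly quadruples the work, which is why the gap with the list version widens as the loop gets longer.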