Saving data after loop with numpy

Time:06-14

My code looks like this at the moment:

new_table = np.zeros(shape=(4,1),dtype=object) 

for i in y:   
    # some calculation that produces result
    new_table = np.append(new_table, np.array([result]), axis=0) 

After printing new_table, the result looks like this:

array([[0],
       [0],
       [0],
       [0],
       [(1, 61.087293, 33.429379, 0.42581059018640416)],
       [(1, 61.087293, 33.429379, 0.3203261022508016)],
       [(1, 61.087293, 33.429379, 0.45689267865065536)]], dtype=object)

But the output should not contain those 4 zeros at the beginning of the array.

I am not sure what I am doing wrong. Also, is it possible to add column names to new_table, and if so, how?

Thanks.

Answer:

The problem is that you create a (4, 1) array of zeros and then append more rows to it, so the zero rows stay at the top of the table. Either start with an empty table that has zero rows and the right number of columns (for example np.empty((0, 4))) and append to that, or preallocate the full table and change its values in place.
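A minimal sketch of the cause and of both fixes, assuming each result is a 4-tuple like the rows printed in the question (the variable names are made up for illustration). It also shows that the empty starting table needs two dimensions: a plain np.array([]) is 1-D, so appending 2-D rows to it along axis=0 raises a ValueError.

```python
import numpy as np

# Hypothetical stand-in for one row computed inside the loop.
result = (1, 61.087293, 33.429379, 0.42581059018640416)

# np.append with axis=0 concatenates, so the four initial zero rows stay.
with_zeros = np.append(np.zeros((4, 4)), np.array([result]), axis=0)
print(with_zeros.shape)  # (5, 4)

# np.array([]) is 1-D, so it cannot be concatenated with 2-D rows on axis 0.
try:
    np.append(np.array([]), np.array([result]), axis=0)
    flat_start_failed = False
except ValueError:
    flat_start_failed = True
print(flat_start_failed)  # True

# Zero rows and four columns is a valid starting point for appending.
from_empty = np.empty((0, 4))
for _ in range(3):
    from_empty = np.append(from_empty, np.array([result]), axis=0)
print(from_empty.shape)  # (3, 4)

# Or preallocate the final size and overwrite rows in place.
in_place = np.zeros((3, 4))
for idx in range(3):
    in_place[idx] = result
print(np.array_equal(in_place, from_empty))  # True
```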

Answer:

Start with an empty array that has zero rows and the required number of columns. If each result is one row of four values:

new_table = np.empty((0, 4)) 
for i in y:   
    ...
    new_table = np.append(new_table, np.array([result]), axis=0) 

Keep in mind that this reallocates and copies the entire array on every iteration, which is very inefficient: the total amount of copying grows quadratically with the number of rows. You're much better off skipping the initial array, accumulating the rows in a list, and stacking them once at the end:

table_list = []
for i in y:
    # ... some calculation that produces result
    table_list.append(result)
new_table = np.stack(table_list, axis=0)
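If you also want column names (the second part of the question), one option is to turn the accumulated list into a NumPy structured array instead of a plain 2-D array. This is a sketch: the field names "id", "lat", "lon" and "score" are hypothetical guesses at what the four values mean, and the loop just replays the rows printed in the question.

```python
import numpy as np

# Hypothetical field names; replace them with whatever your four values mean.
row_dtype = np.dtype([
    ("id", np.int64),
    ("lat", np.float64),
    ("lon", np.float64),
    ("score", np.float64),
])

table_list = []
for score in (0.42581059018640416, 0.3203261022508016, 0.45689267865065536):
    # Each row must be a tuple so NumPy maps its items onto the fields.
    table_list.append((1, 61.087293, 33.429379, score))

new_table = np.array(table_list, dtype=row_dtype)

print(new_table.dtype.names)  # ('id', 'lat', 'lon', 'score')
print(new_table["score"])     # one column, selected by name
print(new_table[0])           # one row, as a record
```

A structured array is one-dimensional (one record per row), so columns are selected by name rather than by a second index. If pandas is available, pd.DataFrame(table_list, columns=[...]) is another common way to get labelled columns.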

Answer:

If you are working with large data sets, it may make more sense to preallocate the array and then fill in the values, rather than appending to a growing array or list. I compared @Mad Physicist's solution with this approach:

import timeit
import numpy as np

y = np.random.randint(0, 100, 10000)    # dummy data

starttime1 = timeit.default_timer()
new_table = np.zeros((len(y), 4))

for idx, i in enumerate(y):
    # ... some dummy operation
    new_table[idx] = (i, i**2, i**3, i**4)

print(f"Preallocating : {timeit.default_timer() - starttime1} s")

table_list = []
starttime2 = timeit.default_timer()

for i in y:
    table_list.append((i, i**2, i**3, i**4))
new_table = np.stack(table_list, axis=0)

print(f"np.stack : {timeit.default_timer() - starttime2} s")

Output from one run:

Preallocating : 0.01815319999999998 s
np.stack : 0.026264800000000033 s

In this run, preallocating was faster than np.stack. This was not a rigorous benchmark, but I expect the savings to be even more significant for larger data sets.
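Because the timings above come from a single run, a somewhat more careful sketch wraps each approach in a function and uses timeit.repeat, keeping the best of several measurements to reduce noise. The function names preallocate and list_then_stack are hypothetical, and no timings are shown because they depend on the machine.

```python
import timeit
import numpy as np

y = np.random.randint(0, 100, 10000)  # dummy data, as in the benchmark above


def preallocate(values):
    # Allocate the full table once, then overwrite one row per iteration.
    table = np.zeros((len(values), 4))
    for idx, i in enumerate(values):
        table[idx] = (i, i**2, i**3, i**4)
    return table


def list_then_stack(values):
    # Collect rows in a Python list and build the array once at the end.
    rows = []
    for i in values:
        rows.append((i, i**2, i**3, i**4))
    return np.stack(rows, axis=0)


# Both approaches give the same values (np.stack keeps the integer dtype).
assert np.array_equal(preallocate(y), list_then_stack(y))

for func in (preallocate, list_then_stack):
    # 3 measurements of 5 calls each; the minimum is the least noisy estimate.
    times = timeit.repeat(lambda: func(y), number=5, repeat=3)
    print(f"{func.__name__}: {min(times) / 5:.6f} s per call")
```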