Parsing a very large array with list comprehension is slow


I have a very large list of numerical values in numpy.float64 format, and I want to convert each value to 0.0 if it is inf, and cast the remaining elements to plain float.

This is my code, which works perfectly:

import numpy as np

# Values in numpy.float64 format.
original_values = [np.float64("Inf"), np.float64(0.02345), np.float64(0.2334)]

# Convert them: inf becomes 0.0, everything else becomes a plain float.
parsed_values = [0.0 if x == float("inf") else float(x) for x in original_values]

But this is slow. Is there any way to make this code faster, perhaps with some map or numpy magic (I have no experience with those libraries)?

CodePudding user response:

Hey~ you are probably asking how you could do this faster with numpy; the quick answer is to turn the list into a numpy array and do the replacement the numpy way:

import numpy as np

original_values = [np.float64("Inf"), ..., np.float64(0.2334)]
arr = np.array(original_values)
arr[arr == np.inf] = 0

where arr == np.inf returns a boolean array that looks like array([ True, ..., False]) and can be used as a mask to select the matching elements of arr, in the way I showed.
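To make the mask step concrete, here is a minimal sketch on a tiny array (the variable names are just for illustration):

import numpy as np

values = np.array([np.inf, 0.02345, 0.2334])

mask = values == np.inf   # boolean array: [ True, False, False]
values[mask] = 0.0        # in-place replacement where the mask is True

print(values)             # roughly [0. 0.02345 0.2334]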

Hope it helps.

I tested a bit, and it should be fast enough:

import timeit

import numpy as np

# Create a huge array with a million infs scattered through it
arr = np.random.random(1000000000)
idx = np.random.randint(0, high=1000000000, size=1000000)
arr[idx] = np.inf

# Time the replacement
def replace_inf_with_0(arr=arr):
    arr[arr == np.inf] = 0

timeit.Timer(replace_inf_with_0).timeit(number=1)

The output says it takes 1.5 seconds to turn all 1,000,000 infs into 0s in a 1,000,000,000-element array.


@Avión used arr.tolist() in the end to convert the array back to a list for MongoDB, which should be the common way. I tried it with the billion-element array: the conversion back to a list took about 30 seconds, while creating the billion-element array took less than 10 seconds. So feel free to recommend more efficient methods.
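Putting the pieces together, here is a minimal sketch of the full round trip (Python list in, Python list out), assuming, as in the question, that only positive inf has to be replaced:

import numpy as np

# Original values as a plain Python list of numpy.float64 scalars
original_values = [np.float64("Inf"), np.float64(0.02345), np.float64(0.2334)]

arr = np.array(original_values)  # list -> numpy array
arr[arr == np.inf] = 0           # replace positive inf with 0
parsed_values = arr.tolist()     # numpy array -> list of plain Python floats

print(parsed_values)             # [0.0, 0.02345, 0.2334]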
