I have a very large list of numerical values in numpy.float64 format, and I want to convert each value to 0.0 if it is inf, and cast the remaining elements to plain float.
This is my code, which works correctly:
import numpy as np

# Values in numpy.float64 format.
original_values = [np.float64("Inf"), np.float64(0.02345), np.float64(0.2334)]
# Convert them: inf becomes 0.0, everything else becomes a plain float.
parsed_values = [0.0 if x == float("inf") else float(x) for x in original_values]
But this is slow. Is there any way to make this code faster, perhaps with some map or numpy magic (I have no experience with these libraries)?
CodePudding user response:
You are probably asking how you could do this faster with numpy; the quick answer is to turn the list into a numpy array and do the replacement the numpy way:
import numpy as np
original_values = [np.float64("Inf"), ..., np.float64(0.2334)]
arr = np.array(original_values)
arr[arr == np.inf] = 0
where arr == np.inf returns a boolean array that looks like array([ True, ..., False]) and can be used to select indices in arr in the way shown above.
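For a concrete picture, here is a minimal sketch using the three example values from the question (the variable names mask and arr are just illustrative):

import numpy as np

arr = np.array([np.float64("Inf"), np.float64(0.02345), np.float64(0.2334)])
mask = arr == np.inf   # boolean mask: array([ True, False, False])
arr[mask] = 0          # in-place replacement of the masked positions
print(arr)             # roughly [0.  0.02345  0.2334]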
Hope it helps.
I tested a bit, and it should be fast enough:
import timeit
import numpy as np

# Create a huge array and scatter some inf values into it
arr = np.random.random(1000000000)
idx = np.random.randint(0, high=1000000000, size=1000000)
arr[idx] = np.inf

# Time the replacement
def replace_inf_with_0(arr=arr):
    arr[arr == np.inf] = 0

timeit.Timer(replace_inf_with_0).timeit(number=1)
The output says it takes about 1.5 seconds to turn all 1,000,000 infs into 0s in a 1,000,000,000-element array.
@Avión used arr.tolist() in the end to convert the array back to a list for MongoDB, which should be the common way. I tried it with the billion-element array: the conversion back to a list took about 30 seconds, while creating the billion-element array took less than 10 seconds. So feel free to recommend more efficient methods.
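Putting the pieces together for the original use case, here is a sketch of the full round trip on the small example list from the question; note that arr.tolist() already returns built-in Python floats, so no extra cast is needed:

import numpy as np

original_values = [np.float64("Inf"), np.float64(0.02345), np.float64(0.2334)]
arr = np.array(original_values)
arr[arr == np.inf] = 0          # zero out the (positive) infinities
parsed_values = arr.tolist()    # back to a plain list of Python floats
print(parsed_values)            # [0.0, 0.02345, 0.2334]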