Imagine that you have some old, uncurated tables of data with string values in arbitrary places.
To be able to perform data analysis with NumPy, you need to handle these string values somehow, e.g. by converting all of them to NaN.
How can this be done?
CodePudding user response:
import numpy as np

data = [
    [["text", 2], [1, 3]],
    [[1, 4], [1, 5]],
    [[8, 8], [8, 9]]
]
a = np.array(data)  # mixed input: NumPy stores everything as strings here

def str2nan(s):
    try:
        return float(s)  # if we can convert to float, do it
    except ValueError:
        return np.nan    # else return NaN

vectorized_str2nan = np.vectorize(str2nan)
a = vectorized_str2nan(a)
This way a
will be a float array:
[
[[nan, 2.], [1., 3.]],
[[1., 4.], [1., 5.]],
[[8., 8.], [8., 9.]]
]
and a
is now suitable for numerical processing. Note that a plain np.average(a)
returns nan because of the NaN entry; use the NaN-aware reductions such as np.nanmean(a)
instead.
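To illustrate the difference between the plain and NaN-aware reductions, here is a small sketch (the array below is just the example data after the string-to-NaN conversion):

```python
import numpy as np

# Example data after replacing the string with NaN
a = np.array([[[np.nan, 2.], [1., 3.]],
              [[1., 4.], [1., 5.]],
              [[8., 8.], [8., 9.]]])

print(np.average(a))  # nan: a single NaN poisons the plain mean
print(np.nanmean(a))  # mean over the 11 non-NaN values
```

np.nanmean, np.nansum, np.nanstd, etc. simply skip NaN entries, which is usually what you want after this kind of cleanup.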
The point is to use np.vectorize
to turn a scalar function into one that works elementwise on (multidimensional) NumPy arrays.
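As a minimal illustration of that idea (with a made-up scalar function, not part of the original answer), np.vectorize applies the function to every element and preserves the array's shape; passing otypes pins the output dtype, which also avoids surprises on empty input:

```python
import numpy as np

def clip_small(x):
    # hypothetical scalar function: zero out values below 3
    return 0.0 if x < 3 else float(x)

vclip = np.vectorize(clip_small, otypes=[float])  # otypes fixes the output dtype

m = np.array([[1, 5], [4, 2]])
print(vclip(m))  # applied elementwise, shape (2, 2) preserved
```

Note that np.vectorize is a convenience wrapper, essentially a loop in disguise, so for large arrays a true vectorized expression (e.g. np.where(m < 3, 0.0, m)) will be much faster.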