How to change all string values in a multidimensional numpy array to NaN?

Imagine that you have some old, uncurated tables of data with string values at arbitrary places.

To perform data analysis with NumPy, you need to handle these string-valued entries somehow, e.g. by changing all of them to NaN.

How to do this?
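To make the problem concrete, here is a minimal sketch with a hypothetical two-row table (raw and arr are illustrative names, not from any real dataset). NumPy stores a list that mixes numbers and strings as a string array, so even the numeric cells become text, and a whole-array cast with astype(float) stops at the first non-numeric entry.

```python
import numpy as np

# Hypothetical sample table: one cell holds a stray string.
raw = [[1, 2], ["n/a", 4]]

arr = np.array(raw)
# Mixing str and int makes NumPy store every element as a string.
print(arr.dtype.kind)  # U (unicode string dtype)

try:
    arr.astype(float)  # a whole-array cast fails on the first non-numeric cell
except ValueError as exc:
    print("cast failed:", exc)
```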

Answer:

import numpy as np

data = [
    [["text", 2], [1, 3]],
    [[1, 4], [1, 5]],
    [[8, 8], [8, 9]]
]

a = np.array(data)  # mixed str/int input: NumPy stores every element as a string

def str2nan(s):
    try:
        return float(s)  # if we can convert to float do it
    except ValueError:
        return np.nan    # else return NaN

vectorized_str2nan = np.vectorize(str2nan)

a = vectorized_str2nan(a)

Now a is a float64 array with these values:

[
    [[nan, 2.0], [1.0, 3.0]],
    [[1.0, 4.0], [1.0, 5.0]],
    [[8.0, 8.0], [8.0, 9.0]]
]

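Now a is a numeric array suitable for further processing. Note that NaN propagates through ordinary reductions, so np.average(a) returns nan here; use NaN-aware functions such as np.nanmean(a) or np.nansum(a) to skip the converted cells.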
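A small sketch of how the leftover NaN behaves in later steps. The array is typed in by hand with the same values as the converted a, so the snippet runs on its own; np.nanmean, np.nansum and np.isnan are standard NumPy functions.

```python
import numpy as np

# Stand-in for the converted array: float values with one NaN where "text" was.
a = np.array([
    [[np.nan, 2.0], [1.0, 3.0]],
    [[1.0, 4.0], [1.0, 5.0]],
    [[8.0, 8.0], [8.0, 9.0]],
])

print(np.average(a))      # nan: a single NaN poisons the plain average
print(np.nanmean(a))      # mean of the 11 numeric cells, NaN ignored
print(np.nansum(a))       # sum of the numeric cells, NaN treated as 0
print(np.isnan(a).sum())  # how many cells held non-numeric strings
```

np.isnan(a) also gives a boolean mask, which is handy for locating the cells that were replaced.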

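The key idea is np.vectorize, which turns a scalar function into one that applies element-wise to (multidimensional) NumPy arrays. It is a convenience wrapper, essentially a Python-level loop, so it makes the code simpler rather than faster.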
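A variant of the same np.vectorize approach, as a sketch. Passing otypes=[float] (a documented np.vectorize parameter) fixes the output dtype up front instead of letting vectorize infer it by calling the function on the first element, which also lets it accept empty arrays. Catching TypeError in addition to ValueError is an extra assumption, added so that None cells in an object-dtype array also become NaN; to_float, obj and empty are illustrative names.

```python
import numpy as np

def str2nan(s):
    """Convert one cell to float; anything non-numeric becomes NaN."""
    try:
        return float(s)
    except (TypeError, ValueError):  # TypeError covers e.g. None in object arrays (assumption)
        return np.nan

# otypes=[float] sets the output dtype explicitly, so np.vectorize does not
# have to call str2nan on the first element to guess it, and size-0 inputs work.
to_float = np.vectorize(str2nan, otypes=[float])

data = np.array([[["text", 2], [1, 3]], [[1, 4], [1, 5]]])
out = to_float(data)
print(out.dtype, out.shape)      # float64 (2, 2, 2)

empty = to_float(np.array([], dtype=str))
print(empty.dtype, empty.shape)  # float64 (0,)

# Object arrays keep the original Python types; None and "text" both become NaN.
obj = np.array([["text", None], [1, "2.5"]], dtype=object)
print(to_float(obj))
```

Without otypes, calling the vectorized function on an empty array raises a ValueError, which is the main practical reason to set it.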