What is the proper way to specify a NumPy masked array's masked value?

Time:07-24

I basically want to run something like the following

import numpy as np
import numpy.ma as ma

x = np.array([1, 2, 3, 4, 5])
a = ma.masked_array(x, mask=[0, 0, 0, 1, 0])

for i in range(5):
   if (a[i] == "--"):
      print("a[{0:d}] is masked value".format(i))

I am not sure how to express the -- value of the masked array in the if (a[i] == "--") test; the "--" is the part I could not figure out. I know there are a few other ways of doing this by converting the entire masked array into boolean values, but I don't want that.

Edit.

The array a is a masked array, and when I print it out I get

masked_array(data=[1, 2, 3, --, 5],
             mask=[False, False, False,  True, False],
       fill_value=999999)

What I want to do is to skip the -- values in that output using the if statement.

CodePudding user response:

A masked array has two key attributes, data and mask.

In [63]: a.mask
Out[63]: array([False, False, False,  True, False])
In [64]: a.data
Out[64]: array([1, 2, 3, 4, 5])

The getmask docs say it's equivalent to getting the attribute:

In [65]: np.ma.getmask(a)
Out[65]: array([False, False, False,  True, False])

That mask can then be used to select values from data:

In [66]: a.data[a.mask]
Out[66]: array([4])

More commonly we are interested in the unmasked values:

In [67]: a.compressed()
Out[67]: array([1, 2, 3, 5])

After all, when we use masking we aren't "supposed" to care about the masked values. Reductions such as sum use only the unmasked values:

In [68]: a.sum()
Out[68]: 11
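
As a quick check of that claim, here is a minimal sketch using the array from the question. The variable names total, total_compressed and n_valid are just illustrative; it compares the masked sum with the sum of the compressed copy and counts the unmasked elements.

```python
import numpy as np
import numpy.ma as ma

a = ma.masked_array(np.array([1, 2, 3, 4, 5]), mask=[0, 0, 0, 1, 0])

# Reductions on the masked array ignore the masked element (the 4).
total = a.sum()

# Summing the compressed (unmasked-only) copy gives the same result.
total_compressed = a.compressed().sum()

# count() reports how many elements are unmasked.
n_valid = a.count()

print(total, total_compressed, n_valid)
```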

Alternatively, the masked values can be filled with something innocuous:

In [69]: a.filled()
Out[69]: array([     1,      2,      3, 999999,      5])
In [70]: a.filled(0)
Out[70]: array([1, 2, 3, 0, 5])

CodePudding user response:

The proper way should be:

mask_a = np.ma.getmask(a)

which, for your example, returns the mask array:

array([False, False, False,  True, False])

If I understand correctly how NumPy works internally, this does not "process" the masked array to produce booleans. The mask is already there, and getmask simply hands it back as a proper array that you can use in your for loop, so performance should not be a concern.

for i in range(5):
    if mask_a[i]:
        print("a[{0:d}] is masked value".format(i))
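
Two variations on that loop may be worth knowing; neither comes from the answers here, so treat this as a sketch. First, indexing a masked position returns the ma.masked constant, which gives an element-by-element test close to what the question asked for. Second, getmask can return ma.nomask (which cannot be indexed) when an array has no mask at all, whereas getmaskarray always returns a full boolean array. The array b and the list names below are made up for illustration.

```python
import numpy as np
import numpy.ma as ma

a = ma.masked_array(np.array([1, 2, 3, 4, 5]), mask=[0, 0, 0, 1, 0])

# Indexing a masked position returns the ma.masked constant,
# so an identity test works one element at a time.
masked_by_identity = [i for i in range(len(a)) if a[i] is ma.masked]
print(masked_by_identity)

# b has no mask at all; getmaskarray still returns a full boolean
# array of the same shape, so indexing it in a loop is safe.
b = ma.masked_array(np.array([1, 2, 3]))
mask_b = ma.getmaskarray(b)
masked_by_mask = [i for i in range(len(b)) if mask_b[i]]
print(mask_b, masked_by_mask)
```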

However, if for whatever reason you don't want to use the getmask function, you can work from the string representation of a:

str_a = str(a)

which in your example is: '[1 2 3 -- 5]'

Then you can strip the square brackets and split the string on whitespace:

str_a = str(a)[1:-1].split()

which in your example is ['1', '2', '3', '--', '5'].

Then you have a list where you can filter out the "--" values with your for loop:

for i in range(5):
    if str_a[i] == "--":
        print("a[{0:d}] is masked value".format(i))

But honestly, the getmask function should be the way to go. I didn't profile it, but I expect it to be faster.
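
If the goal is to skip the masked entries rather than report them, the mask can also be turned into index arrays up front. This is only a sketch: the names masked_idx and valid_idx are illustrative, and np.flatnonzero is one of several ways to get the positions.

```python
import numpy as np
import numpy.ma as ma

a = ma.masked_array(np.array([1, 2, 3, 4, 5]), mask=[0, 0, 0, 1, 0])
mask_a = ma.getmaskarray(a)

masked_idx = np.flatnonzero(mask_a)   # positions that are masked
valid_idx = np.flatnonzero(~mask_a)   # positions that are not masked

# Loop over the unmasked positions only, skipping the "--" entry.
for i in valid_idx:
    print("a[{0:d}] = {1}".format(int(i), a[i]))
```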
