I basically want to run something like the following:
import numpy as np
import numpy.ma as ma
x = np.array([1, 2, 3, 4, 5])
a = ma.masked_array(x, mask=[0, 0, 0, 1, 0])
for i in range(5):
    if a[i] == "--":
        print("a[{0:d}] is masked value".format(i))
I am not sure how I should specify the -- value of the masked array in the if (a[i] == "--") part; the "--" is what I could not figure out. I know there are a few other ways of doing it by converting the entire masked array into boolean values, but I don't want that.
Edit.
The array a is a masked array, and when I print it out I get:
masked_array(data=[1, 2, 3, --, 5],
mask=[False, False, False, True, False],
fill_value=999999)
What I want to do is skip the -- values in that output using the if statement.
CodePudding user response:
A masked array has two key attributes, data and mask.
In [63]: a.mask
Out[63]: array([False, False, False, True, False])
In [64]: a.data
Out[64]: array([1, 2, 3, 4, 5])
The getmask docs say it's equivalent to getting the attribute:
In [65]: np.ma.getmask(a)
Out[65]: array([False, False, False, True, False])
That mask can then be used to select values from data:
In [66]: a.data[a.mask]
Out[66]: array([4])
More commonly we are interested in the unmasked values:
In [67]: a.compressed()
Out[67]: array([1, 2, 3, 5])
After all, when using masking we aren't "supposed" to care about the masked values. Methods such as sum skip them, so the result is the sum of the compressed values:
In [68]: a.sum()
Out[68]: 11
Alternatively, the masked values can be filled with something innocuous:
In [69]: a.filled()
Out[69]: array([ 1, 2, 3, 999999, 5])
In [70]: a.filled(0)
Out[70]: array([1, 2, 3, 0, 5])
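For the element-by-element test the question asks about, a minimal sketch follows. It relies on the fact that indexing a masked entry of a masked array without named fields returns the special constant ma.masked, so an identity check can stand in for the "--" comparison. The list name masked_indices is only an illustration, not something from the original post.

```python
import numpy as np
import numpy.ma as ma

x = np.array([1, 2, 3, 4, 5])
a = ma.masked_array(x, mask=[0, 0, 0, 1, 0])

masked_indices = []  # hypothetical helper list, used only for illustration
for i in range(a.size):
    # Indexing a masked entry returns the ma.masked constant,
    # so an identity test is enough; no string comparison is needed.
    if a[i] is ma.masked:
        masked_indices.append(i)
        print("a[{0:d}] is masked value".format(i))
```

The "--" seen when printing is only the display form of that constant, which is why comparing against the string "--" does not work as a test.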
CodePudding user response:
The proper way should be:
mask_a = np.ma.getmask(a)
which, following your example, returns the mask array:
array([False, False, False, True, False])
If I understand correctly how numpy works internally, this does not "process" the masked array to get booleans out of it. The mask is already there; you are just getting it as a proper array that can be used in your for loop, so if you are worried about performance... don't worry.
for i in range(5):
    if mask_a[i]:
        print("a[{0:d}] is masked value".format(i))
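One caveat with the loop above: when an array has no mask at all, getmask returns the scalar constant nomask rather than a full boolean array, and indexing it element by element would fail. A hedged sketch of the safer variant, using np.ma.getmaskarray, is shown here; the array b and the other variable names are made up for the illustration.

```python
import numpy as np

# Hypothetical array created without any mask.
b = np.ma.masked_array([1, 2, 3])

mask_short = np.ma.getmask(b)      # the nomask constant (a scalar False)
mask_full = np.ma.getmaskarray(b)  # always a full boolean array

is_nomask = mask_short is np.ma.nomask

# Indexing mask_full per element is safe even when nothing is masked.
masked_positions = [i for i in range(b.size) if mask_full[i]]
print(is_nomask, mask_full, masked_positions)
```

If the array is known to carry an explicit mask, as in the question, getmask and getmaskarray give the same result.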
However, if for whatever reason you don't want to use the getmask function, you can get the string representation of a.
str_a = str(a)
which in your example is '[1 2 3 -- 5]'.
Then you can strip the square brackets and split the string on whitespace:
str_a = str(a)[1:-1].split()
which in your example is ['1', '2', '3', '--', '5'].
Then you have a list where you can filter out the "--" values with your for loop:
for i in range(5):
    if str_a[i] == "--":
        print("a[{0:d}] is masked value".format(i))
But honestly, using the getmask function should be the way to go: I didn't profile it, but I expect it to be faster.