Home > Software design >  Unclear array of Array notation in numpy
Unclear array of Array notation in numpy

Time:11-01

I am on my way to understand a vectorized approach to calculating (and plotting) Julia sets. On the web, I found the following code (annotations are mainly mine, based on my growing understanding of the ideas behind the code):

import numpy as np
import matplotlib.pyplot as plt

c = -0.74543 0.11301j                                           # Example value for this picture (Julia set)
n = 512                                                         # Maximum number of iterations

x = np.linspace(-1.5, 1.5, 2000).reshape((1, 2000))             # 1 row, 2000 columns
y = np.linspace(-1.2, 1.2, 1600).reshape((1600, 1))             # 1600 rows, 1 column
z = x   1j*y                                                    # z is an array with 1600 * 2000 complex entries

c = np.full(z.shape, c)             # c is a complex number matrix to be added for the iteration

diverge = np.zeros(z.shape)         # 1600 * 2000 zeroes (0s), contains divergent iteration counts
m = np.full(z.shape, True)          # 1600 * 2000 True, used as a kind of mask (convergent values)

for i in range(0,n):          # Do at most n iterations
    z[m] = z[m]**2   c[m]     # Matrix op: Complex iteration for fixed c (Julia set perspective)
    m[np.abs(z) > 2] = False  # threshold for convergence of absolute(z) is 2
    diverge[m] = i

plt.imshow(diverge, cmap='magma')   # Color map "magma" applied to the iterations for each point
plt.show()                          # Display image plotted

I don't understand the mechanics of the line diverge[m] = i I gather that m is a 1600*2000 element array of Booleans. It seems that m is used as a kind of mask to let stand only those values in diverge[] for which the corresponding element in m is True. Yet I would like to understand this concept in greater detail. The syntax diverge[m] = i seems to imply that an array is used as some sort of generalized "index" to another array (diverge), and I could use some help understanding this concept. (The code runs as expected, I just have problems understanding the working of it.)

Thank you.

CodePudding user response:

Yes, you can use an array to index another. In many many ways. That a complex matter. And even if I flatter myself to understand numpy quite a bit now, I still sometimes encouter array indexation that make me scratch my head a little bit before I understand.

But this case is not a very complex one

M=np.array([[1,2,3], 
            [4,5,6],
            [7,8,9]])
msk=np.array([[True,  False,  True],
              [True,  True,   True],
              [False, True,   False]])
M[msk]

Returns array([1, 3, 4, 5, 6, 8]). You can, I am sure, easily understand the logic.

But more importantly, indexation is a l-value. So that means that M[msk] can be to the left side of the =. And then the values of M are impacted

So, that means that

M[msk]=0
M

shows

array([[0, 2, 0],
       [0, 0, 0],
       [7, 0, 9]])

Likewise

M=np.array([[1,2,3], 
            [4,5,6],
            [7,8,9]])
A=np.array([[2,2,4],
            [4,6,6],
            [8,8,8]])
msk=np.array([[True,  False,  True],
              [True,  True,   True],
              [False, True,   False]])
M[msk] = M[msk] A[msk]
M

Result is

array([[ 3,  2,  7],
       [ 8, 11, 12],
       [ 7, 16,  9]])

So back to your case,

    z[m] = z[m]**2   c[m]     # Matrix op: Complex iteration for fixed c (Julia set perspective)

Is somehow just an optimisation. You could have also just z=z**2 c. But why would be the point to compute that even where overflow has already occured. So, it computes z=z**2 c only where there was no overflow yet

    m[np.abs(z) > 2] = False  # threshold for convergence of absolute(z) is 2

np.abs(s)>2 is a 2d array of True/False values. m is set to False at for every "pixels" for which |z|>2. Other values of m remain unchanged. So they stay False if they were already False. Note that this one is slightly over complicated. Since, because of the previous line, z doesn't change once it became >2, in reality, there is no pixels where np.abs(z)<=2 and yet m is already False. So

   m=np.abs(z)<=2

would have worked as well. And it would not have been slower, since the original version computes that anyway. In fact, it would be faster, since we spare the indexation/affecation operation. On my computer my version runs 1.3 seconds faster than the original (on a 12 second computation time. So 10% approx.) But the original version has the merit to makes next line easier to understand, becaus it makes one point clear: m starts with all True values, and then some values turn False as long as algorithm runs, but none never become True again.

    diverge[m] = i

m being the mask of pixels that has not yet diverged (it starts with all True, and as long as we iterate, more and more values of m are False). So doing so update diverge to i everywhere no divergence occured yet (the name of the variable is not the most pertinent).

So pixels whose z values become>2 at iteration 50, so whose m value became False at iteration 50, would have been updated to 1, then 2, then 3, then 4, ..., then 48, then 49 by this line. But not to 50, 51, ...

So at the end, what stays in "diverge" is the last i for which m was still True. That is the last i for which algorithm was still converging. Or, at 1 unit shift, the first one for which algorithm diverges.

  • Related