Lack of knowledge: what do dimensions really represent?

Time:12-24

A 2d array consists of 2 axes: axis=0 represents the rows and axis=1 represents the columns.

aa = np.random.randn(10, 2) # a 2d array: 10 rows along the first axis, 2 columns along the second

array([[ 0.6999521 , -0.17597954],
       [ 1.70622947, -0.85919459],
       [-0.90019284,  0.80774052],
       [-1.42953238,  0.19727917],
       [-0.03416532,  0.49584749],
       [-0.28981586, -0.77484498],
       [-1.31129122,  0.423833  ],
       [-0.43920016, -1.93541758],
       [-0.06667634,  2.09925218],
       [ 1.24633485, -0.04153847]])

Why, when I want to scatter the points, do I only use the first column and the second column along axis=1? Do dimensions mean columns when plotting and axes at other times? Can you please explain the reasons for doing it this way? I would also appreciate any good references on dimensions in this context.

plt.scatter(x[:,0], x[:,1])  # this also means dimensions or columns?

Why x[:,0], x[:,1] and not x[0,:], x[:,1]?

CodePudding user response:

It can be difficult to visualize this, especially in multiple dimensions.

The parameters to the [] operator represent the dimensions. Your first dimension is the rows. The first row is array[0]. Your second dimension is the columns. The entire second column is called array[:,1] -- the ":" is a numpy notation that means "take all of this dimension". array[2,1] refers to the second column in the third row.
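
To make the indexing concrete, here is a minimal sketch on a small array built with np.arange. The name arr and its values are made up for illustration and are not taken from the question:

```python
import numpy as np

# Hypothetical 4x2 array so the values are easy to track by eye
arr = np.arange(8).reshape(4, 2)
# [[0 1]
#  [2 3]
#  [4 5]
#  [6 7]]

first_row = arr[0]        # index on the first dimension only: shape (2,)
second_col = arr[:, 1]    # ":" takes all rows, 1 picks the second column: shape (4,)
one_value = arr[2, 1]     # third row, second column: a single element

print(first_row, second_col, one_value)
```

Each index position in the brackets lines up with one dimension of the array, in order.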

plt.scatter expects the x coordinate values as its first parameter, and the y coordinate values as its second parameter. plt.scatter(x[:,0], x[:,1]) means "take all of column 0" and "take all of column 1", which is the way scatter wants them.

CodePudding user response:

With this randn call you make a 2d array with the specified shape. The dimensions, 10 and 2, don't represent anything - that's an abstract (10,2) array. Meaning comes from how you use it.

In [50]: aa = np.random.randn(10, 2)
In [51]: aa
Out[51]: 
array([[-0.26769106,  0.09882999],
       [-1.5605514 , -1.38614473],
       [ 1.23312852,  0.86838848],
       [ 1.2603898 ,  2.19895989],
       [-1.66937976,  0.79666952],
       [-0.15596669,  1.47848784],
       [ 1.74964902,  0.39280584],
       [-1.0982447 ,  0.46888408],
       [ 0.84396231, -0.34809148],
       [-0.83489678, -1.8093045 ]])

That's a display - with rows and columns.

Rather than pass the slices directly to scatter, let's assign them to variables:

In [52]: x = aa[:,0]; y = aa[:,1]; x,y
Out[52]: 
(array([-0.26769106, -1.5605514 ,  1.23312852,  1.2603898 , -1.66937976,
        -0.15596669,  1.74964902, -1.0982447 ,  0.84396231, -0.83489678]),
 array([ 0.09882999, -1.38614473,  0.86838848,  2.19895989,  0.79666952,
         1.47848784,  0.39280584,  0.46888408, -0.34809148, -1.8093045 ]))

We now have two 1d arrays with shape (10,) (that's a 1 element tuple). We can then plot them with:

In [53]: plt.scatter(x,y)

I could just as well have used

x = np.arange(10); y = np.random.randn(10)

to make two 1d arrays.

The dimensions of the aa array have nothing to do with the axes of a scatter plot.

I could select a 'row' of aa, but will only get a (2,) shape array. That can't be plotted against a (10,) array:

In [53]: aa[0,:]
Out[53]: array([-0.26769106,  0.09882999])
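
A quick way to see why the column slices work and a row slice does not is to compare shapes. This is only a sketch on a fresh random (10, 2) array, so the values will differ from the session above; the variable name row is my own:

```python
import numpy as np

aa = np.random.randn(10, 2)   # same shape as in the question; values are random

x = aa[:, 0]     # column 0: one value per point, shape (10,)
y = aa[:, 1]     # column 1: one value per point, shape (10,)
row = aa[0, :]   # a single row: shape (2,)

print(x.shape, y.shape, row.shape)

# scatter pairs x[i] with y[i], so both arguments need the same length
print(x.shape == y.shape)     # True: these can be plotted against each other
print(x.shape == row.shape)   # False: a (10,) array cannot be paired with a (2,) one
```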

As for the meaning of dimensions in sum/mean, why not experiment?

Sum all values:

In [54]: aa.sum()
Out[54]: 2.2598841819604134

Sum down the columns, resulting in one value per column:

In [55]: aa.sum(axis=0)
Out[55]: array([-0.49960074,  2.75948492])

It can help to use keepdims, producing a (1,2) array:

In [56]: aa.sum(axis=0, keepdims=True)
Out[56]: array([[-0.49960074,  2.75948492]])

or a (10,1) array:

In [57]: aa.sum(axis=1, keepdims=True)
Out[57]: 
array([[-0.16886107],
       [-2.94669614],
       [ 2.101517  ],
       [ 3.45934969],
       [-0.87271024],
       [ 1.32252115],
       [ 2.14245486],
       [-0.62936062],
       [ 0.49587083],
       [-2.64420128]])

There's some ambiguity when talking about summing along rows or columns when dealing with 2d arrays. It becomes clearer when we apply sum to 1d arrays (where there is only one axis to sum over) or to 3d arrays.

For example, note which dimension is missing when I do:

In [58]: np.arange(24).reshape(2,3,4).sum(axis=1).shape
Out[58]: (2, 4)

or

In [59]: np.arange(24).reshape(2,3,4).sum(axis=2)
Out[59]: 
array([[ 6, 22, 38],
       [54, 70, 86]])
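
The same experiment can be run over every axis in a loop. This is just a sketch; the names a, reduced and kept are mine:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

shapes = {}
for axis in range(a.ndim):
    reduced = a.sum(axis=axis)               # the summed axis disappears
    kept = a.sum(axis=axis, keepdims=True)   # the summed axis stays, with size 1
    shapes[axis] = (reduced.shape, kept.shape)
    print(axis, reduced.shape, kept.shape)
```

The axis you name is the one that gets collapsed; the other dimensions keep their sizes.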

Again, dimensions of numpy arrays are abstract things. An array can have 0, 1, 2 or more dimensions (up to 32; NumPy 2.0 raised the limit to 64). Most of linear algebra deals with 2d arrays, matrices and "vectors". You can do linear algebra (LA) with numpy, but numpy is used for much more.

edit

You could think of your aa as 10 2-element points. Then aa[:,0] are all the x coordinates. A mean with axis=0 would be the "center of mass" of those points.

In [60]: np.mean(aa, axis=0)
Out[60]: array([-0.04996007,  0.27594849])
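
With points that are easy to check by hand, the "center of mass" reading is easier to see. The four corner points below are invented for illustration:

```python
import numpy as np

# Hypothetical points: the corners of a 2-by-4 rectangle, one (x, y) pair per row
points = np.array([[0.0, 0.0],
                   [2.0, 0.0],
                   [2.0, 4.0],
                   [0.0, 4.0]])

center = points.mean(axis=0)   # average over the points, one value per coordinate

# Same thing written out per column
center_by_hand = np.array([points[:, 0].mean(), points[:, 1].mean()])

print(center)                               # [1. 2.]
print(np.allclose(center, center_by_hand))  # True
```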

Mean on axis=1 may not make sense, though you could calculate the norm of the points (sqrt(x^2 + y^2)), that is, the length of the vectors represented by the points.

In [61]: np.linalg.norm(aa, axis=1)
Out[61]: 
array([0.28535218, 2.08727523, 1.50821235, 2.53456249, 1.84973271,
       1.48669159, 1.79320052, 1.19414978, 0.91292938, 1.99264533])
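
As a sanity check, the norm over axis=1 should match the Euclidean length computed from the two columns. A small sketch with hand-picked points (not the random data above):

```python
import numpy as np

# Hypothetical points chosen so the lengths come out as whole numbers
points = np.array([[3.0, 4.0],
                   [5.0, 12.0],
                   [0.0, 1.0]])

norms = np.linalg.norm(points, axis=1)                   # one length per row
manual = np.sqrt(points[:, 0]**2 + points[:, 1]**2)      # sqrt(x^2 + y^2) per point

print(norms)                        # [ 5. 13.  1.]
print(np.allclose(norms, manual))   # True
```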