Why numpy slicing will produce one-dimension array?-CodePudding

I'm learning numpy from a YouTube tutorial. In a video he demonstrated that

wine_data_arr[:, 0].shape

where wine_data_arr are a two dimensional numpy array imported from sklearn. And the result is (178,), and he said "it is a one dimensional array". But in math for example this

[1,2,3]

can represent a 1 by 3 matrix, which has dimension 2. So my question is why wine_data_arr[:, 0] is a one dimension array? I guess this definition must be useful in some situation. So what's that situation?

Try to be more specific: when writing wine_data_arr[:, 0] I provide two arguments, i.e. : and 0 and the result is one dimension. When I write wine_data_arr[:, (0,4)], I still provide two arguments : and (0,4), a tuple, and the result is two dimension. Why not both produce two dimension matrix?

CodePudding user response：

Even if they "look" the same, a vector is not the same as a matrix. Consider:

>>> np.array([1,2,3,4])
array([1, 2, 3,4])
>>> np.matrix([1,2,3,4])
matrix([[1, 2, 3,4]])
>>> np.matrix([[1,2],[3,4]])
matrix([[1, 2],
        [3, 4]])

When slicing a two-dimensional array like

>>> wine_data_arr = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> wine_data_arr
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

you can request a lower-dimensional component (a single row or column) using an integer index

>>> wine_data_arr[:,0]
array([1, 4, 7])
>>> wine_data_arr[0,:]
array([1, 2, 3])

or a same-dimensional "piece" using a slice index:

>>> wine_data_arr[:, 0:1]
array([[1],       
       [4],
       [7]])

If you use two integer indices, you get a single zero-dimensional element of the array:

>>> wine_data_arr[0,0]
1

CodePudding user response：

In numpy arrays can have 0, 1, 2 or more dimensions. In contrast to MATLAB there isn't a 2d lower boundary. Also numpy is generally consistent with Python lists, in display and indexing. MATLAB generally follows linear algebra conventions, but I'm sure there are other math definitions for arrays and vectors. In physics vector represents a point in space, or a direction, not a 2d 'column vector' or 'row vector'.

A list:

In [159]: alist = [1, 2, 3]
In [160]: len(alist)
Out[160]: 3

An array made from this list:

In [161]: arr = np.array(alist)
In [162]: arr.shape
Out[162]: (3,)

Indexing a list removes a level of nesting. Scalar indexing an array removes a dimension. See

https://numpy.org/doc/stable/user/basics.indexing.html

In [163]: alist[0]
Out[163]: 1
In [164]: arr[0]
Out[164]: 1

A 2d array:

In [166]: marr = np.arange(4).reshape(2, 2)
In [167]: marr
Out[167]: 
array([[0, 1],
       [2, 3]])

Again, scalar indexing removes a dimension:

In [169]: marr[0,:]
Out[169]: array([0, 1])
In [170]: marr[:, 0]
Out[170]: array([0, 2])
In [172]: marr[1, 1]
Out[172]: 3

Indexing with a list or slice preserves the dimension:

In [173]: marr[0, [1]]
Out[173]: array([1])
In [174]: marr[0, 0:1]
Out[174]: array([0])

Count the [] to determine the dimensionality.

CodePudding user response：

The short answer: this is a convention.

Before I go into further details, let me clarify that the "dimensionality" in NumPy, for example, is not the same as that in math. In math, [1, 2, 3] is a three-dimensional vector, or a one by three dimensional matrix if you want. However, here, the dimensionality really means the "physical" dimension of the array, i.e., how many axes are present in your array (or matrix, or tensor, etc.).

Now let me get back to your question of why "this" particular definition of array dimension is helpful. What I'm going to say next is somewhat philosophical and is my own take of it. Essentially, it all boils down to communication between programmers. For example, when you are reading the documentation of some Python code and wondering the dimensionality of the output array, sure the documentation can write "N x M x ..." and then carefully define what N, M, etc. are. But in many cases, just the number of axes (or the "dimensionality" referred to in NumPy) may be sufficient to inform you. In this case, the documentation becomes much cleaner and easier to read while providing enough information about the expected outcome.