What does the [1] do when using np.where()?

I'm practicing on a Kaggle Data Cleaning exercise.

In the Parsing Dates example, I can't figure out what the [1] does at the end of the line that assigns indices.

Thanks.

    # Finding indices corresponding to rows in different date format   

    indices = np.where([date_lengths == 24])[1]
    print('Indices with corrupted data:', indices)
    earthquakes.loc[indices]

CodePudding user response：

As described in the documentation, numpy.where called with a single argument is equivalent to calling np.asarray([date_lengths == 24]).nonzero().

numpy.nonzero returns a tuple with one array per dimension of the input; each array holds the indices of the non-zero values along that dimension.

>>> np.nonzero([1,0,2,0])
(array([0, 2]),)
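
To make the one-array-per-dimension behaviour visible, here is a small sketch with a made-up 2-D array (the name grid and its values are only for illustration, they are not from the Kaggle exercise):

```python
import numpy as np

# Hypothetical 2-D input: 2 rows, 3 columns
grid = np.array([[1, 0, 2],
                 [0, 3, 0]])

# One index array per dimension: row positions and column positions
rows, cols = np.nonzero(grid)
print(rows)  # [0 0 1]
print(cols)  # [0 2 1]
```

Reading the two arrays pairwise gives the coordinates of each non-zero value: (0, 0), (0, 2) and (1, 1).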

Indexing with [1] selects the second element of that tuple (i.e. the indices along the second dimension). Because the condition was wrapped in […], the input is a 2-D array with a single row, so this is equivalent to doing:

np.where(date_lengths == 24)[0]

>>> np.nonzero([1,0,2,0])[0]
array([0, 2])
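
The equivalence can be checked with a short sketch. The values in date_lengths below are a made-up stand-in for the column from the question, assumed here to be a plain NumPy array:

```python
import numpy as np

# Hypothetical stand-in for date_lengths from the question
date_lengths = np.array([10, 24, 10, 10, 24])

# Wrapping the condition in [] adds a leading dimension of size 1
wrapped = np.asarray([date_lengths == 24])
print(wrapped.shape)  # (1, 5)

# Two dimensions, so np.where returns two index arrays
row_idx, col_idx = np.where([date_lengths == 24])
print(row_idx)  # [0 0]
print(col_idx)  # [1 4]

# Without the extra [], the same indices are the first (and only) element
plain = np.where(date_lengths == 24)[0]
print(np.array_equal(col_idx, plain))  # True
```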

CodePudding user response：

It is an artefact of the extra [] around the condition. For example:

a = np.arange(10)

Finding, for example, the indices where a > 3 can be done like this:

np.where(a > 3)

This returns a tuple containing a single array:

(array([4, 5, 6, 7, 8, 9]),)

So the indices can be obtained as

indices = np.where(a > 3)[0]

In your case, the condition is wrapped in [], which is unnecessary but still works.

np.where([a > 3])

This returns a tuple of two arrays: the first is all zeros (the row index of each match, since the wrapped condition has only one row), and the second holds the indices you want:

(array([0, 0, 0, 0, 0, 0]), array([4, 5, 6, 7, 8, 9]))

so the indices are obtained as

indices = np.where([a > 3])[1]
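
As a quick check, the sketch below compares both spellings on the same array. It also shows np.flatnonzero, which is not used in the exercise but is one possible way to get the positions without indexing into a tuple:

```python
import numpy as np

a = np.arange(10)

unwrapped = np.where(a > 3)[0]   # condition passed directly
wrapped = np.where([a > 3])[1]   # condition wrapped in an extra []
flat = np.flatnonzero(a > 3)     # alternative that returns a flat index array

print(unwrapped)  # [4 5 6 7 8 9]
print(np.array_equal(unwrapped, wrapped))  # True
print(np.array_equal(unwrapped, flat))     # True
```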