Apologies if this sounds quite basic but I'm trying to understand the deeper mechanics of subsetting syntax:
I understand that with non-.loc subsetting, you can select columns, rows by index number, and cross-select columns and rows-by-index-number.
But by what mechanism do you subset a dataframe with a series of booleans, using non-.loc syntax? e.g.,
Working with this practice df:
you could write
test['age']==42
and get a series of booleans indicating where 42 appeared in the age column.
But when you write that same boolean filter as a subset of the same df
test[test['age']==42]
you get all the columns of the df, and full rows for any row that had 42 in the age column.
I'm wondering, more granularly, by what mechanism you subset a series of booleans from a df in this non-.loc context. Put differently, is this considered a row or column selection, or is it an entirely different mechanism that simply allows inputting a list/series/df of booleans?
It seems like you're selecting whether to show each row depending on its True/False value. And indeed, you could write
test[[True, False, True, False, False]]
to get the same result. But you'd get an error making the same direct row selection as a list, as via
test[[0,1,2,3,4]]
At bottom I'm trying to get a better understanding of the mechanism for such boolean-filtering, and how it might relate to non-.loc row/column selection.
CodePudding user response:
Your question asks:
by what mechanism do you subset a series of booleans from a dataframe, using non-.loc syntax?
The pandas docs on Boolean indexing state:
You may select rows from a DataFrame using a boolean vector the same length as the DataFrame’s index (for example, something derived from one of the columns of the DataFrame)
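To make that concrete, here is a minimal sketch. The practice df from the question isn't shown, so the `test` frame below is a hypothetical stand-in (the `name` column and all values are my assumptions, chosen so that rows 0 and 2 have age 42). It shows that a boolean Series, a plain list of booleans, and `.loc` with the same mask all pick the same rows, which suggests `[]` treats any boolean vector of matching length as a row mask rather than as a set of labels:

```python
import pandas as pd

# Hypothetical stand-in for the practice DataFrame (the original is not shown).
test = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee", "Eve"],
    "age": [42, 31, 42, 25, 38],
})

# Comparing a column to a scalar yields a boolean Series, one value per row.
mask = test["age"] == 42

by_series = test[mask]          # boolean Series as the key
by_list = test[mask.tolist()]   # plain Python list of booleans as the key
by_loc = test.loc[mask]         # explicit row selection with the same mask

print(by_series)
```

All three results hold the full rows at index 0 and 2, so in this non-.loc context a boolean key acts as a row filter.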
You also write:
But you'd get an error making the same direct row selection as a list, as via
test[[0,1,2,3,4]]
This error occurs because [] access with a list of anything other than booleans expects column labels, not row index labels. This is made explicit in the Basics subsection of the Indexing and selecting data section of the pandas docs:
You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised.
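A short sketch of the difference, again using a hypothetical `test` frame since the original isn't shown (column names and values are assumptions). A list of column labels selects columns, a list of integers is looked up among the column labels and fails here, and `.iloc` is one way to select rows by position instead:

```python
import pandas as pd

# Hypothetical stand-in for the practice DataFrame (the original is not shown).
test = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee", "Eve"],
    "age": [42, 31, 42, 25, 38],
})

# A list of labels selects columns, in the order given.
cols = test[["age", "name"]]

# A list of integers is also treated as column labels. None of them
# exist as columns in this frame, so pandas raises a KeyError.
try:
    test[[0, 1, 2, 3, 4]]
    int_list_error = None
except KeyError as exc:
    int_list_error = exc

# Selecting rows by integer position goes through .iloc instead.
rows = test.iloc[[0, 1, 2, 3, 4]]
```

Note that if a frame's column labels happened to be the integers 0 through 4, `test[[0,1,2,3,4]]` would succeed and return those columns, which is consistent with the list being interpreted as column labels.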