To select certain rows of a dataframe with a boolean mask, I would use:
df.loc[mask]
When I build the mask as a condition on the dataframe itself without assigning it to a variable first (for example, selecting only the rows that have 'y' in column_x), I would usually do something like:
df[df['column_x'] == 'y']
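A minimal, self-contained version of the two patterns above, run on a small made-up dataframe (the "value" column and the data are hypothetical; only column_x comes from the question), shows that they return the same rows:

```python
import pandas as pd

# Hypothetical toy dataframe; "column_x" mirrors the question, "value" is made up.
df = pd.DataFrame({
    "column_x": ["y", "n", "y", "n"],
    "value": [10, 20, 30, 40],
})

# Pattern 1: build the mask first, then pass it to .loc.
mask = df["column_x"] == "y"
via_loc = df.loc[mask]

# Pattern 2: pass the condition straight to df[...].
via_getitem = df[df["column_x"] == "y"]

# Both keep rows 0 and 2 with all columns.
print(via_loc.equals(via_getitem))  # True
```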
But it made me wonder what the use of df.loc actually is in these cases. Am I getting it wrong, or is the use of loc in such instances redundant?
Answer:
Pandas is being smart about what you mean by df[df['column_x'] == 'y']: it recognizes that df['column_x'] == 'y' is a boolean Series whose length matches df and whose index aligns with the index of df, so it filters rows. It's syntactic sugar. The shorthand can be more ambiguous with other keys, because df[...] selects columns when given a label or a list of labels, but selects rows when given a slice or a boolean mask.
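A small sketch of how df[...] changes meaning with the type of key, on a hypothetical toy dataframe. It also contrasts a positional slice through df[...] with the label-based, end-inclusive slice through .loc:

```python
import pandas as pd

# Hypothetical toy dataframe with the default RangeIndex (0, 1, 2).
df = pd.DataFrame({"column_x": ["y", "n", "y"], "value": [1, 2, 3]})

# A single label: df[...] returns a column (a Series).
col = df["column_x"]

# A list of labels: df[...] returns columns (a DataFrame).
cols = df[["value"]]

# An integer slice: df[...] returns rows, by position, end excluded.
first_two_rows = df[0:2]

# A boolean Series: df[...] returns the rows where it is True.
filtered = df[df["column_x"] == "y"]

# .loc always treats the first key as row labels; its slices include the end label.
same_rows_via_loc = df.loc[0:1]
```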
The .loc accessor is the official and least ambiguous way to access a subset of a dataframe by rows, columns, or both.
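As a sketch of the "rows, columns, or both" point, assuming the same kind of hypothetical toy dataframe, .loc takes a row key first and an optional column key second:

```python
import pandas as pd

# Hypothetical toy dataframe; column names are illustrative only.
df = pd.DataFrame({"column_x": ["y", "n", "y"], "value": [1, 2, 3]})
mask = df["column_x"] == "y"

rows = df.loc[mask]              # rows only, all columns
column = df.loc[:, "value"]      # one column, all rows (a Series)
subset = df.loc[mask, "value"]   # filter rows and pick a column in one step
```

Because the first key is always interpreted as rows and the second as columns, the same call reads the same way whether you pass a mask, a label, or a slice.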