Pandas: What's the difference between df[df[condition]] and df.loc[df[condition]]

Using a condition to mask certain rows from a dataframe, I would use:

df.loc[mask]

When I build the mask inline on the dataframe itself instead of assigning it to a variable first (for example, selecting only the rows that have 'y' in column_x), I usually write something like:

df[df['column_x'] == 'y']

But it made me wonder what the use of df.loc actually is in these cases. Am I getting it wrong or is the use of loc in such instances redundant?

Answer:

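Pandas is being smart about what you mean by df[df['column_x'] == 'y']. The expression df['column_x'] == 'y' is a boolean Series whose index aligns with the index of df, and when [] receives a boolean mask like that, it filters rows instead of looking up columns. It's syntactic sugar. The same [] operator selects a column when given a single label (df['column_x']) and several columns when given a list of labels, so what it does depends on the kind of key you pass in, which makes it less explicit than .loc.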
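A minimal sketch of that dispatch, using a small made-up frame (the data and the "value" column are assumptions for illustration only): with a boolean Series, [] and .loc return the same rows, while a string key makes [] return a column.

```python
import pandas as pd

# Hypothetical example data; only column_x comes from the question.
df = pd.DataFrame({"column_x": ["y", "n", "y"], "value": [1, 2, 3]})

# Boolean Series whose index matches df's index.
mask = df["column_x"] == "y"

# With a boolean mask, [] filters rows, just like .loc does.
via_getitem = df[mask]
via_loc = df.loc[mask]
print(via_getitem.equals(via_loc))  # True

# With a single string label, the same [] operator selects a column instead.
print(type(df["value"]).__name__)  # Series
```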

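The .loc accessor is the explicit, label-based way to access a subset of a dataframe by rows, columns, or both. For a plain row filter like yours, df.loc[mask] and df[mask] give the same result, but .loc never has to guess what the key refers to.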
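Where .loc stops being redundant is when you need rows and columns at the same time, or when you assign through the selection. A short sketch, again on hypothetical data (the "value" column is made up): df.loc[mask, "value"] filters rows and picks a column in one call, and using it on the left-hand side updates df itself. Chained indexing such as df[mask]["value"] = 0 writes to an intermediate filtered object instead and does not reliably update df (pandas warns about this).

```python
import pandas as pd

# Hypothetical example data for illustration.
df = pd.DataFrame({"column_x": ["y", "n", "y"], "value": [1, 2, 3]})
mask = df["column_x"] == "y"

# Rows and columns in a single .loc call: filter rows by mask, keep one column.
subset = df.loc[mask, "value"]
print(subset.tolist())  # [1, 3]

# .loc on the left-hand side assigns into df itself, only on the masked rows.
df.loc[mask, "value"] = 0
print(df["value"].tolist())  # [0, 2, 0]
```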