How does this pandas snippet work behind the scenes?

data = credit_data[credit_data['CREDIT_LIMIT'].isna()]

This is a snippet from code I was writing. I wanted to print all the rows that have a NaN value in the CREDIT_LIMIT column. The code does that, but I want to understand how it actually works.

credit_data['CREDIT_LIMIT'].isna() returns a Series of boolean values, so how does simply passing that Series to our DataFrame (credit_data) give us all the rows that contain NaN values?

So far I have searched some blogs, the pandas documentation for DataFrame.isna(), and some answers on this site, but I haven't found anything satisfactory. It would be great if you could point me in the right direction, for example to a blog post or an existing answer that covers this. Thanks!

Answer:


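By passing a boolean Series you are using a feature called boolean masking (the pandas documentation calls it boolean indexing). You provide a sequence of bool values, which may be a Series but can also be, for example, a plain list or a NumPy array, with one value per row of the DataFrame. pandas keeps the rows where the value is True and drops the rows where it is False. Consider the following example: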

import pandas as pd
df = pd.DataFrame({'letter': ['A', 'B', 'C', 'D', 'E']})
# one boolean per row: True keeps the row, False drops it
mask = [True, False, True, False, True]
print(df[mask])

Output:

  letter
0      A
2      C
4      E
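Applied to the question's snippet, the mask simply comes from isna() instead of a hand-written list. A minimal sketch, assuming a small made-up credit_data frame (the CUST_ID column and all values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's credit_data; all values are made up.
credit_data = pd.DataFrame({
    "CUST_ID": ["C1", "C2", "C3", "C4"],
    "CREDIT_LIMIT": [1000.0, np.nan, 2500.0, np.nan],
})

# isna() returns a boolean Series that shares the frame's index:
# True where CREDIT_LIMIT is missing (labels 1 and 3), False elsewhere.
mask = credit_data["CREDIT_LIMIT"].isna()
print(mask)

# Indexing the frame with that Series keeps only the rows where it is True.
data = credit_data[mask]
print(data)

# Negating the mask with ~ selects the complementary rows (limit present).
has_limit = credit_data[~mask]
print(has_limit)
```

So credit_data[credit_data['CREDIT_LIMIT'].isna()] is just the two steps above written on one line: build the boolean mask, then index the DataFrame with it.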

Note that NumPy arrays support the same feature, for example:

import numpy as np
arr = np.arange(25).reshape((5, 5))
# one boolean per row (first axis) of the 5x5 array
mask = [True, False, True, False, True]
print(arr[mask])

Output:

[[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]]
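To see what the mask does behind the scenes, it helps to compare it with positional indexing. The following is a minimal sketch that redefines the toy df, arr and mask from the examples above so it runs on its own. Boolean indexing behaves like first collecting the positions where the mask is True (here via np.flatnonzero) and then taking those rows. This illustrates the semantics only; it is not a transcription of pandas' internal code. The last lines show a pandas-specific detail: a boolean Series is aligned to the DataFrame by index label rather than by position, which is why the Series returned by isna() lines up with the frame it came from.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"letter": ["A", "B", "C", "D", "E"]})
mask = [True, False, True, False, True]

# Integer positions where the mask is True: 0, 2 and 4.
positions = np.flatnonzero(mask)

# Boolean indexing picks the same rows as positional indexing with those
# positions, and the same rows as .loc with the mask.
same_as_iloc = df[mask].equals(df.iloc[positions])
same_as_loc = df[mask].equals(df.loc[mask])
print(same_as_iloc, same_as_loc)

# NumPy: a boolean mask on the first axis equals integer indexing with positions.
arr = np.arange(25).reshape((5, 5))
numpy_same = np.array_equal(arr[np.array(mask)], arr[positions])
print(numpy_same)

# A boolean Series is matched to the DataFrame by index label, not position.
# Its labels are listed in reverse order, but True still sits on labels
# 0, 2 and 4, so rows A, C and E are selected. pandas may emit a UserWarning
# here saying the Boolean Series key will be reindexed.
series_mask = pd.Series([True, False, True, False, True], index=[4, 3, 2, 1, 0])
print(df[series_mask])
```

All three comparisons print True. The label alignment in the last step is also why a mask built from one column of a DataFrame can be used directly to filter that same DataFrame.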