data = credit_data[credit_data['CREDIT_LIMIT'].isna()]
This is a snippet from some code I was writing. I wanted to print all the rows that contain NaN values in a column. The code does that, but what I want to know is how it actually works.
credit_data['CREDIT_LIMIT'].isna() returns a Series of bool values, so how does passing that Series to our DataFrame (credit_data) give us all the rows that contain NaN values?
So far I have searched some blogs, the pandas documentation for DataFrame.isna(), and some answers on this site, but haven't found anything satisfactory. It would be great if you could point me in the right direction, such as a blog post or an existing answer that covers this. Thanks.
CodePudding user response:
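By passing a boolean Series you have used a feature called boolean masking (also known as boolean indexing). You supply an iterable of bool values (it may be a Series, but does not have to be) whose length equals the number of rows in the DataFrame, and only the rows whose mask value is True are kept. When the mask is a Series, as with the result of isna(), pandas matches it to the rows by index label. Consider the following example: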
import pandas as pd
df = pd.DataFrame({'letter':['A','B','C','D','E']})
mask = [True,False,True,False,True]
print(df[mask])
Output:
  letter
0      A
2      C
4      E
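To connect this back to the question, here is a minimal sketch of the same two steps applied to a NaN check. The small credit_data frame, its CUST_ID column, and all values are made up for illustration; only the CREDIT_LIMIT column name comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for credit_data; the values are invented.
credit_data = pd.DataFrame({
    "CUST_ID": ["C1", "C2", "C3", "C4"],
    "CREDIT_LIMIT": [1000.0, np.nan, 2500.0, np.nan],
})

# Step 1: isna() returns a boolean Series with one value per row,
# True where CREDIT_LIMIT is missing.
mask = credit_data["CREDIT_LIMIT"].isna()

# Step 2: indexing the DataFrame with that Series keeps only the
# rows where the mask is True.
data = credit_data[mask]

# The explicit .loc form selects the same rows.
same = credit_data.loc[mask]

print(mask)
print(data)
```

The one-liner in the question simply does both steps at once: the inner expression builds the mask and the outer square brackets apply it.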
Note that this feature is also present in NumPy, for example:
import numpy as np
arr = np.arange(25).reshape((5,5))
mask = [True,False,True,False,True]
print(arr[mask])
Output:
[[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]]
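In practice the mask is usually computed from the data rather than typed by hand. A small sketch of the NumPy counterpart of the isna() case, using np.isnan on an invented array of limits:

```python
import numpy as np

# Made-up values, mirroring a column with two missing entries.
limits = np.array([1000.0, np.nan, 2500.0, np.nan])

# np.isnan returns a boolean array of the same shape as its input.
mask = np.isnan(limits)

# Positions where the mask is True, i.e. where a value is missing.
missing_positions = np.flatnonzero(mask)

# Inverting the mask with ~ keeps the non-missing values instead.
present = limits[~mask]

print(mask)
print(missing_positions)
print(present)
```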