Filtering function for pandas - Viewing NaN values within a column

Time:12-21

Function I have created:

#Create a function that identifies blank values
def GPID_blank(df, variable):
    df = df.loc[df['GPID'] == variable]
    return df

Test:

variable = ''
test = GPID_blank(df, variable)
test

Goal: Create a function that can filter any dataframe column 'GPID' to see all of the rows where GPID has missing data.

I have tried running variable = 'NaN' and still no luck. However, I know the function works: if I pass a real value such as "OH82CD85", it filters my dataset accordingly.

So why doesn't it return the blank cells when variable = 'NaN'? I know that my dataset has 5 rows where GPID is missing.

Example df:

df = pd.DataFrame({'Client': ['A','B','C'], 'GPID':['BRUNS2','OH82CD85','']})

    Client  GPID
0   A   BRUNS2
1   B   OH82CD85
2   C   

Sample of GPID column:

0     OH82CD85
1     BW07TI20
2     OW36HW81
3     PE56TA73
4     CT46SX81
5     OD79AU80
6     GF46DB60
7     OL07ST01
8     VP38SM57
9     AH90AE61
10    PG86KO78
11         NaN
12         NaN
13    SO21GR72
14    DY85IN90
15    KW80CV02
16    CM15QP83
17    VC38FP82
18    DA36RX05
19    DD74HD38

CodePudding user response:

You can't use == with NaN. NaN != NaN.

Instead, you can modify your function a little to check whether the parameter is NaN using pd.isna(). (np.isnan() is not a drop-in alternative here: it raises a TypeError when given a string such as "OH82CD85".)

def GPID_blank(df, variable):
    if pd.isna(variable):
        return df.loc[df['GPID'].isna()]
    else:
        return df.loc[df['GPID'] == variable]
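
To see why the equality test selects nothing, here is a minimal sketch. The three-row frame is made up for illustration and is not the asker's dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one GPID is a genuine missing value (NaN).
df = pd.DataFrame({'Client': ['A', 'B', 'C'],
                   'GPID': ['BRUNS2', np.nan, 'OH82CD85']})

# NaN never compares equal to anything, including itself.
print(np.nan == np.nan)

# So an equality mask against NaN matches no rows at all.
eq_mask = df['GPID'] == np.nan

# isna() is the element-wise test for missing values.
na_mask = df['GPID'].isna()

print(eq_mask.sum())  # rows matched by ==
print(na_mask.sum())  # rows matched by isna()
```

The string 'NaN' fails for a different reason: it is an ordinary string, so == only matches cells that literally contain those three characters.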

CodePudding user response:

It's not working because with variable = 'NaN' you're looking for a string whose content is 'NaN', not for missing values.

You can try:

import pandas as pd

def GPID_blank(df):
  # filtered dataframe with NaN values in GPID column
  blanks = df[df['GPID'].isnull()].copy()
  return blanks

filtered_df = GPID_blank(df)
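
The .copy() in that function matters if you plan to edit the filtered rows afterwards. A small sketch, assuming an invented three-row frame and a placeholder value 'UNKNOWN' that is not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two missing GPIDs.
df = pd.DataFrame({'Client': ['A', 'B', 'C'],
                   'GPID': ['BRUNS2', np.nan, np.nan]})

# isnull() is an alias of isna(); .copy() makes the result independent of df.
blanks = df[df['GPID'].isnull()].copy()

# Editing the copy does not touch the original frame.
blanks['GPID'] = 'UNKNOWN'

print(len(blanks))              # rows that had a missing GPID
print(df['GPID'].isna().sum())  # the original still holds its NaN values
```

If you only want to look at the rows and not modify them, the .copy() is optional.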

CodePudding user response:

You can't search for NaN with an equality comparison the way you would for an ordinary value. Also, in your example dataframe, '' is not NaN; it is an empty string (str), so == does match it.

Instead, check whether the value you are filtering for is NaN, and use a different filter in that case:

def GPID_blank(df, variable):
    if pd.isna(variable):
        df = df.loc[df['GPID'].isna()]
    else:
        df = df.loc[df['GPID'] == variable]
    return df
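
Since the example df in the question uses '' while the real column shows NaN, it may be convenient to treat both as "missing" in one pass. The helper below is a hedged sketch, not part of any answer above; the name gpid_missing and the decision to count whitespace-only strings as blank are assumptions:

```python
import numpy as np
import pandas as pd

def gpid_missing(df, column='GPID'):
    """Return rows where `column` is NaN/None or an empty/whitespace-only string.

    Hypothetical helper: the name and the whitespace rule are illustrative.
    """
    col = df[column]
    is_nan = col.isna()                              # real missing values
    is_blank = col.astype(str).str.strip() == ''     # '' or only spaces
    return df.loc[is_nan | is_blank]

# Made-up data mixing a real value, an empty string, NaN and spaces.
df = pd.DataFrame({'Client': ['A', 'B', 'C', 'D'],
                   'GPID': ['BRUNS2', '', np.nan, '  ']})

missing = gpid_missing(df)
print(missing)
```

Splitting the test into two masks keeps each condition readable, and either one can be dropped if your data only ever contains one kind of blank.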