Home > Software engineering >  Select data and count when specific columns have null value in pandas
Select data and count when specific columns have null value in pandas

Time:11-23

I have a dataframe where there are 2 columns I want to filter and count total number of "null" values for each column.

Tried this code;

chck_nulls = df['account_id'].isnull().sum() | df['customer_id'].isnull().sum()
print (df[chck_nulls])

Getting this error;

    chck_nulls = df['account_id'].isnull().sum() | df['customer_id'].isnull().sum()
    print (df[chck_nulls])
    1
    chck_nulls = df['account_id'].isnull().sum() | df['customer_id'].isnull().sum()
    2
    print (df[chck_nulls])
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
       3079             try:
    -> 3080                 return self._engine.get_loc(casted_key)
       3081             except KeyError as err:
KeyError: 28671

Sample Data

Customer Name   account_id  customer_id
Adam            null        null
Michael         null        null
Jenkins         null        null

Expected results;

customer_id        3
account_id         3

Any help would be highly appreciated!

Thanks

CodePudding user response:

chck_nulls_account_id, chck_nulls_customer_id = (df['account_id'].isnull()).sum(), (df['customer_id'].isnull()).sum()

# print(chck_nulls_account_id, chck_nulls_customer_id)
print(f'customer_id\t{chck_nulls_customer_id}')
print(f'account_id\t{chck_nulls_account_id}')

# The following two print statements will give you the df with entries having only null values in each column
print(df[df['account_id'].isnull()])
print(df[df['customer_id'].isnull()])
# The following print statement will give you the df with entries having only null values in both the columns
print(df[(df['account_id'].isnull()) | df['customer_id'].isnull()])


print(chck_nulls) will give you the correct sum of total null entries in account_id and customer_id columns.

  • Related