I have the DataFrame below and am trying to get the names of the columns that contain empty strings, as efficiently as possible. The DataFrame looks like this after calling df.head():
     id type check company                                      test
    123    A  Soft          [[1649106820, 100029907158392,,,, 123]]
    456    B  Hard     GMC  [[1649106812, 100029907158312,,,, 456]]
I am trying to do this without loops, or otherwise in an efficient way. Any help is appreciated.
Expected output: {'company', 'test'}
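For reproducibility, the frame above can be built like this (a sketch that treats the blank company cell and the consecutive commas inside the nested test lists as empty strings):

    import pandas as pd

    # Sketch of the sample data: the blank company cell and the
    # consecutive commas inside the nested lists are assumed to be
    # empty strings.
    df = pd.DataFrame({
        "id": [123, 456],
        "type": ["A", "B"],
        "check": ["Soft", "Hard"],
        "company": ["", "GMC"],
        "test": [
            [[1649106820, 100029907158392, "", "", "", 123]],
            [[1649106812, 100029907158312, "", "", "", 456]],
        ],
    })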
CodePudding user response:
Use apply() to run the test on every value in a column, then use .any() to check whether the test is true for any row.
    def empty_val(val):
        # Recurse into lists so nested values (e.g. the list-of-lists
        # in the test column) are checked as well.
        if isinstance(val, list):
            return any(empty_val(item) for item in val)
        return str(val).strip() == ""

    empty_cols = [col for col in df if df[col].apply(empty_val).any()]
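Against the sample frame sketched in the question, this returns a list rather than a set, but it contains the expected columns:

    >>> empty_cols
    ['company', 'test']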
CodePudding user response:
Here's a way to find columns containing empty strings without testing individual value types:
    empty_cols = {col for col in df.columns
                  if df[col].explode().explode().astype(str).str.strip().eq('').any()}
Output:

    {'company', 'test'}
Explanation:
- Use explode() twice to ensure we've unpacked values that are lists of lists, such as those in column test in the question.
- Use astype(str) to convert numerical types to string, such as those in column id.
- Use str.strip().eq('') to get a boolean Series for a given column indicating whether a value (possibly unpacked) is an empty string (after stripping spaces).
- Use Series.any() to reduce the boolean Series to a single boolean indicating whether the column has any empty strings.
- Use a set comprehension to get a set containing the labels of all columns that contain empty strings (including after unpacking list-of-list values).
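To make the chain concrete, here's what the intermediate steps look like on the test column, assuming the sample frame sketched in the question:

    # assumes the sample df sketched in the question
    s = df['test']     # each value is a list of lists
    s = s.explode()    # first explode: one inner list per row
    s = s.explode()    # second explode: one scalar (or '') per row
    print(s.astype(str).str.strip().eq('').any())  # True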