get column names from a python dataframe having values as white spaces without using loops


I have the dataframe below, and I am trying to get the names of the columns that contain empty strings in the most efficient way. This is what it looks like after calling df.head():

id   type  check  company  test
123  A     Soft            [[1649106820, 100029907158392,,,, 123]]
456  B     Hard   GMC      [[1649106812, 100029907158312,,,, 456]]

I am trying to do this without using loops, or at least in an efficient way. Any help is appreciated.

Expected output: {company, test}
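
For anyone wanting to reproduce this, the frame above might be built roughly as follows (an assumption on my part: the blank company cell and the blank slots inside the nested lists are empty strings):

import pandas as pd

# Hypothetical reconstruction of the sample data shown above; the blank company
# cell and the blank elements inside the nested lists are assumed to be "".
df = pd.DataFrame({
    "id": [123, 456],
    "type": ["A", "B"],
    "check": ["Soft", "Hard"],
    "company": ["", "GMC"],
    "test": [[[1649106820, 100029907158392, "", "", "", 123]],
             [[1649106812, 100029907158312, "", "", "", 456]]],
})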

CodePudding user response:

Use apply() to run the test on every value in a column, then use .any() to check whether it is true for any row.

def empty_val(val):
    # Recurse into list values so nested lists (like those in the test column) are handled
    if isinstance(val, list):
        return any(empty_val(item) for item in val)
    # Otherwise a value counts as empty if it is an empty string after stripping whitespace
    return str(val).strip() == ""

empty_cols = [col for col in df if df[col].apply(empty_val).any()]
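
On the hypothetical sample frame sketched under the question, this produces a list of the offending column names:

print(empty_cols)
# ['company', 'test']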

CodePudding user response:

Here's a way to find columns containing empty strings without testing individual value types:

empty_cols = {col for col in df.columns if df[col].explode().explode().astype(str).str.strip().eq('').any()}

Output:

{'company', 'test'}

Explanation:

  • use explode() twice to ensure we've unpacked values that are of type list of lists, such as those in column test in the question
  • use astype(str) to convert numerical types to string, such as those in column id
  • use str.strip().eq('') to get a boolean Series for a given column indicating whether a value (possibly unpacked) is an empty string (after stripping spaces)
  • use Series.any() to reduce the boolean Series to a boolean value indicating whether the column has any empty strings
  • use a set comprehension to get a set containing the labels of all columns that contain empty strings (including after unpacking list-of-list values).
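
To illustrate the chain on a single column, here is a minimal sketch run against the hypothetical sample frame reconstructed under the question, using the test column:

exploded = df['test'].explode().explode()    # one row per innermost element of the list-of-lists
stripped = exploded.astype(str).str.strip()  # coerce everything to a whitespace-stripped string
print(stripped.eq('').any())                 # True -> the test column contains empty strings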