Identify the columns which contain zero and output their locations

Suppose I have a dataframe where some columns contain a zero value as one of their elements (or potentially more than one zero). I don't specifically want to retrieve these columns or discard them (I know how to do that) - I just want to locate them. For instance: if there are zeros somewhere in the 4th, 6th and 23rd columns, I want a list with the output [4,6,23].

CodePudding user response：

You could iterate over the columns, checking whether 0 occurs in each column's values (note that the resulting positions are 0-based):

[i for i, c in enumerate(df.columns) if 0 in df[c].values]
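
To make the above concrete, here is a minimal sketch on a small, made-up dataframe (the data and the column names a, b, c are assumptions for illustration only). It also shows how to shift the 0-based positions to the 1-based numbering used in the question.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({"a": [1, 0, 3], "b": [4, 5, 6], "c": [0, 0, 9]})

# 0-based positions of the columns that contain at least one zero.
zero_based = [i for i, c in enumerate(df.columns) if 0 in df[c].values]

# 1-based positions, matching the numbering used in the question.
one_based = [i + 1 for i in zero_based]

print(zero_based)
print(one_based)
```

Here zero_based is [0, 2] and one_based is [1, 3].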

CodePudding user response：

Use any() for a fast, vectorized approach.

For instance,

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': [0, 100, 200],
                   'col3': ['a', 'b', 'c']})

Then,

>>> s = df.eq(0).any()
>>> s
col1    False
col2     True
col3    False
dtype: bool

From here, it's easy to get the indexes. For example,

>>> s[s].index.tolist()
['col2']

There are many ways to retrieve the index labels from a boolean pd.Series.
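
As a rough sketch of two such ways (not the only ones), the boolean mask can be turned into either column names or 0-based positions. The variable names mask, labels and positions are introduced here for illustration; np.flatnonzero is a NumPy function that returns the positions of the True entries.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': [0, 100, 200],
                   'col3': ['a', 'b', 'c']})

s = df.eq(0).any()       # boolean Series indexed by column name
mask = s.to_numpy()      # plain boolean array, one entry per column

# Column names where the mask is True.
labels = s.index[mask].tolist()

# 0-based column positions where the mask is True.
positions = np.flatnonzero(mask).tolist()

print(labels)
print(positions)
```

For this dataframe, labels is ['col2'] and positions is [1]; add 1 to each position if 1-based numbering is wanted.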

CodePudding user response：

Here is an approach that leverages a couple of lambda functions:

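import numpy as np
import pandas as pd

# Columns 'a' and 'c' draw from 0-9, so they will almost certainly contain a zero;
# columns 'b' and 'd' draw from 1-9 and never do.
d = {'a': np.random.randint(10, size=100),
     'b': np.random.randint(1,10, size=100),
     'c': np.random.randint(10, size=100),
     'd': np.random.randint(1,10, size=100)
    }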

df = pd.DataFrame(d)

df.apply(lambda x: (x==0).any()).reset_index(drop=True)[lambda x: x].index.to_list()

[0, 2]

Another idea based on @rafaelc's slick answer (but returning the relative locations of the columns instead of the column names):

df.eq(0).any().reset_index()[lambda x: x[0]].index.to_list()

[0, 2]

Or with the column names instead of locations:

df.apply(lambda x: (x==0).any())[lambda x: x].index.to_list()

['a', 'c']
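
If both the names and the locations are needed, one possible sketch is to compute the names once and map them back to positions with Index.get_indexer. The fixed data below is an assumption chosen so the result does not depend on random draws, and the approach assumes the column names are unique.

```python
import pandas as pd

# Fixed, made-up data so the result is reproducible.
df = pd.DataFrame({'a': [0, 1, 2],
                   'b': [1, 2, 3],
                   'c': [4, 0, 0],
                   'd': [5, 6, 7]})

mask = df.eq(0).any()                 # True for columns holding a zero
names = mask[mask].index.to_list()    # column names

# Map the names back to 0-based positions (requires unique column names).
locations = df.columns.get_indexer(names).tolist()

print(names)
print(locations)
```

With this data, names is ['a', 'c'] and locations is [0, 2].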