Suppose I have a dataframe where some columns contain a zero value as one of their elements (or potentially more than one zero). I don't specifically want to retrieve these columns or discard them (I know how to do that) - I just want to locate these. For instance: if there is are zeros somewhere in the 4th, 6th and the 23rd columns, I want a list with the output [4,6,23].
CodePudding user response:
You could iterate over the columns, checking whether 0
occurs in each columns values:
[i for i, c in enumerate(df.columns) if 0 in df[c].values]
CodePudding user response:
Use any()
for the fastest, vectorized approach.
For instance,
df = pd.DataFrame({'col1': [1, 2, 3],
'col2': [0, 100, 200],
'col3': ['a', 'b', 'c']})
Then,
>>> s = df.eq(0).any()
col1 False
col2 True
col3 False
dtype: bool
From here, it's easy to get the indexes. For example,
>>> s[s].tolist()
['col2']
Many ways to retrieve the indexes from a pd.Series
of booleans.
CodePudding user response:
Here is an approach that leverages a couple of lambda functions:
d = {'a': np.random.randint(10, size=100),
'b': np.random.randint(1,10, size=100),
'c': np.random.randint(10, size=100),
'd': np.random.randint(1,10, size=100)
}
df = pd.DataFrame(d)
df.apply(lambda x: (x==0).any())[lambda x: x].reset_index().index.to_list()
[0, 2]
Another idea based on @rafaelc slick answer (but returning relative locations of the columns instead of column names):
df.eq(0).any().reset_index()[lambda x: x[0]].index.to_list()
[0, 2]
Or with the column names instead of locations:
df.apply(lambda x: (x==0).any())[lambda x: x].index.to_list()
['a', 'c']