Get column indices for non-zero values in each row in pandas data frame

Time:09-28

For each row, I would like to get the indices of the columns whose value is > 0, ideally with a vectorized approach. An example data frame:

c1  c2  c3  c4 c5 c6 c7 c8  c9
 1   1   0   0  0  0  0  0   0
 1   0   0   0  0  0  0  0   0
 0   1   0   0  0  0  0  0   0
 1   5   5   0  0  1  0  4   6

The output is expected to be

[0, 1]
[0]
[1]
[0, 1, 2, 5, 7, 8]

CodePudding user response:

One quick option is to apply numpy.flatnonzero to each row:

import numpy as np

df.apply(np.flatnonzero, axis=1)

0                [0, 1]
1                   [0]
2                   [1]
3    [0, 1, 2, 5, 7, 8]
dtype: object
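Note that numpy.flatnonzero reports non-zero entries, while the question asks for values > 0. The two agree on the sample data but differ once negative values appear. A minimal sketch of the difference, assuming a small made-up frame (the -3 entry and the variable names are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a few rows in the spirit of the question, plus one negative entry.
df = pd.DataFrame(
    [[1, 1, 0], [1, 0, -3], [0, 1, 0]],
    columns=["c1", "c2", "c3"],
)

# Non-zero positions: the -3 in row 1 counts as a hit.
nonzero_idx = df.apply(np.flatnonzero, axis=1)

# Strictly positive positions: compare first, then locate the True cells.
positive_idx = df.gt(0).apply(np.flatnonzero, axis=1)

print(nonzero_idx)
print(positive_idx)
```

Comparing with df.gt(0) first turns each row into booleans, so flatnonzero only picks up the cells that passed the > 0 test.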

If performance matters, here is a pure NumPy option. One caveat: a row with no non-zero values is dropped from the result instead of appearing as an empty array, so choose whichever method fits your needs:

idx, idy = np.where(df != 0)
np.split(idy, np.flatnonzero(np.diff(idx) != 0) + 1)

[array([0, 1], dtype=int32), 
 array([0], dtype=int32), 
 array([1], dtype=int32), 
 array([0, 1, 2, 5, 7, 8], dtype=int32)]
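If rows without any hits need to stay in the result, one possible variation is to split on per-row counts instead of on changes in the row index. This is a sketch under that assumption, not part of the answer above; the helper name nonzero_columns_per_row is made up, and it tests for > 0 as in the question:

```python
import numpy as np
import pandas as pd

def nonzero_columns_per_row(df):
    """Return one array of column positions per row, keeping empty rows."""
    if len(df) == 0:
        return []
    mask = df.to_numpy() > 0
    # Column positions of all hits, in row-major order.
    _, cols = np.nonzero(mask)
    # Number of hits per row; a row with zero hits yields an empty chunk.
    counts = mask.sum(axis=1)
    return np.split(cols, np.cumsum(counts)[:-1])

# The question's data with an all-zero row swapped in as row 1 (illustrative).
df = pd.DataFrame(
    [
        [1, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0],
        [1, 5, 5, 0, 0, 1, 0, 4, 6],
    ],
    columns=[f"c{i}" for i in range(1, 10)],
)

result = nonzero_columns_per_row(df)
print([r.tolist() for r in result])
```

Because the split points come from the cumulative counts, the output always has exactly one entry per input row, so it lines up with df.index.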

CodePudding user response:

Not as sexy as @Psidom's answer, but here is a solution using numpy.argwhere:

import numpy as np
import pandas as pd

pd.DataFrame(np.argwhere(df.gt(0).values)).groupby(0)[1].apply(list)

output:

0                [0, 1]
1                   [0]
2                   [1]
3    [0, 1, 2, 5, 7, 8]

Just for fun, here is a pure pandas version:

s = df.set_axis(range(len(df.columns)), axis=1).stack()
s[s.gt(0)].reset_index(level=1)['level_1'].groupby(level=0).apply(list)