For each row, I would like to get the column indices where the column value is > 0, using a vectorized approach if possible. An example data frame:
c1 c2 c3 c4 c5 c6 c7 c8 c9
1 1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
1 5 5 0 0 1 0 4 6
The expected output is:
[0, 1]
[0]
[1]
[0, 1, 2, 5, 7, 8]
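For anyone who wants to reproduce this, the example frame can probably be rebuilt as follows. This is only a sketch: the variable name df is an assumption (the answers use it), and the values are copied from the table above.

```python
import pandas as pd

# Rebuild the example frame; the values are copied from the question's table.
df = pd.DataFrame(
    [
        [1, 1, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0],
        [1, 5, 5, 0, 0, 1, 0, 4, 6],
    ],
    columns=[f"c{i}" for i in range(1, 10)],  # c1 .. c9
)
```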
CodePudding user response:
One quick option is to apply numpy.flatnonzero to each row:
import numpy as np
df.apply(np.flatnonzero, axis=1)
0 [0, 1]
1 [0]
2 [1]
3 [0, 1, 2, 5, 7, 8]
dtype: object
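Note that numpy.flatnonzero tests for values != 0, while the question asks for values > 0. The two agree on the example data, but they would differ if the frame held negative numbers. A minimal sketch of the difference, using a made-up frame named demo that is not part of the question:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a negative entry (not taken from the question).
demo = pd.DataFrame([[1, -2, 3], [0, 0, 3]], columns=["c1", "c2", "c3"])

# Plain flatnonzero keeps every nonzero value, including the negative one.
nonzero = demo.apply(np.flatnonzero, axis=1)

# Comparing first yields a boolean frame, so only strictly positive values count.
positive = demo.gt(0).apply(np.flatnonzero, axis=1)
```

Here the first row gives positions 0, 1, 2 for nonzero but only 0 and 2 for positive.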
If you care about performance, here is a pure NumPy option. One caveat: a row with no nonzero values is dropped from the result, so choose the method that fits your needs:
idx, idy = np.where(df != 0)
np.split(idy, np.flatnonzero(np.diff(idx) != 0) + 1)
[array([0, 1], dtype=int32),
array([0], dtype=int32),
array([1], dtype=int32),
array([0, 1, 2, 5, 7, 8], dtype=int32)]
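If all-zero rows need to be kept as empty arrays, one possible variant is to split on per-row hit counts instead of on changes in the row index. This is a sketch rather than part of the original answer; the frame below is a made-up three-row example with an all-zero middle row, and the names mask, col_idx and counts are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the middle row has no positive values.
df = pd.DataFrame(
    [
        [1, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 5, 5, 0, 0, 1, 0, 4, 6],
    ],
    columns=[f"c{i}" for i in range(1, 10)],
)

mask = df.to_numpy() > 0            # boolean array, True where value > 0
_, col_idx = np.nonzero(mask)       # column positions, listed row by row
counts = mask.sum(axis=1)           # number of hits in each row (may be 0)

# Split at the running totals; a row with zero hits yields an empty array.
result = np.split(col_idx, np.cumsum(counts)[:-1])
```

Because the split points come from cumulative counts, result always has one entry per row of df, so positions in the list line up with row positions.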
CodePudding user response:
Not as elegant as @Psidom's answer, but here is a solution using numpy.argwhere:
import numpy as np
import pandas as pd
pd.DataFrame(np.argwhere(df.gt(0).values)).groupby(0)[1].apply(list)
Output:
0 [0, 1]
1 [0]
2 [1]
3 [0, 1, 2, 5, 7, 8]
Just for fun, here is a pandas version:
s = df.set_axis(range(len(df.columns)), axis=1).stack()
s[s.gt(0)].reset_index(level=1)['level_1'].groupby(level=0).apply(list)