grab values and column names based on row values (multiple values in cell)-CodePudding

I have this df

df = pd.DataFrame( {'R': {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6', 6: '7'},\
                    'a': {0: 1.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 3.0, 5: 2.0, 6: 3.0},\
                    'b': {0: 1.0, 1: 1.0, 2: 1.0, 3: 2.0, 4: 2.0, 5: 0.0, 6: 3.0},\
                    'c': {0: 1.0, 1: 2.0, 2: 2.0, 3: 2.0, 4: 2.0, 5: -2.0, 6: -2.0}, \
                    'd': {0: 1.0, 1: 2.0, 2: 1.0, 3: 0.0, 4: 1.0, 5: 2.0, 6: -1.0},\
                    'e': {0: 1.0, 1: 2.0, 2: 2.0, 3: 1.0, 4: 1.0, 5: 2.0, 6: -2.0}, \
                    'f': {0: -1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: -2.0, 5: -1.0, 6: 2.0},\
                    'g': {0: 1.0, 1: 1.0, 2: 2.0, 3: 1.5, 4: 2.0, 5: 0.0, 6: 2.0}, \
                    'h': {0: 0.0, 1: 0.0, 2: 1.0, 3: 2.0, 4: 2.0, 5: 1.0, 6: 3.0}, \
                    'i': {0: 0.0, 1: -1.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: -3.0, 6: 3.0}, \
                    'j': {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 2.0, 5: -1.0, 6: -1.0}, \
                    'k': {0: 62, 1: 166, 2: 139, 3: 60, 4: 93, 5: 17, 6: 5}} )

which gives us

    R    a    b    c     d    e    f     g    h     i    j    k
0   1   1.0  1.0  1.0   1.0  1.0  -1.0  1.0  0.0   0.0  1.0  62
1   2   1.0  1.0  2.0   2.0  2.0   0.0  1.0  0.0  -1.0  1.0  166
2   3   2.0  1.0  2.0   1.0  2.0   0.0  2.0  1.0   0.0  1.0  139
3   4   3.0  2.0  2.0   0.0  1.0   0.0  1.5  2.0   0.0  1.0  60
4   5   3.0  2.0  2.0   1.0  1.0  -2.0  2.0  2.0   0.0  2.0  93
5   6   2.0  0.0 -2.0   2.0  2.0  -1.0  0.0  1.0  -3.0  -1.0 17
6   7   3.0  3.0 -2.0  -1.0  -2.0  2.0  2.0  3.0   3.0  -1.0  5

I need 2 new columns

df['an']= displays column name of each column where the current raw had negative value

df['nv']= displays negative values of each column where the current raw had negative value

Desired output

    R    a    b    c     d    e    f     g    h     i    j    k    an        nv   
0   1   1.0  1.0  1.0   1.0  1.0  -1.0  1.0  0.0   0.0  1.0  62   'f'        -1 
1   2   1.0  1.0  2.0   2.0  2.0   0.0  1.0  0.0  -1.0  1.0  166  'i'        -1
2   3   2.0  1.0  2.0   1.0  2.0   0.0  2.0  1.0   0.0  1.0  139  '-'        -
3   4   3.0  2.0  2.0   0.0  1.0   0.0  1.5  2.0   0.0  1.0  60   '-'        - 
4   5   3.0  2.0  2.0   1.0  1.0  -2.0  2.0  2.0   0.0  2.0  93   'f'        -2
5   6   2.0  0.0 -2.0   2.0  2.0  -1.0  0.0  1.0  -3.0  -1.0 17   'c,f,i,j' [-2,-1,-3,-1]
6   7   3.0  3.0 -2.0  -1.0  -2.0  2.0  2.0  3.0   3.0  -1.0  5   'c,d,e,j' [-2,-1,-2,-1]

I tried multiple code options, such as np.where or np.select, but I could not mmake it work.

Any help will be highly appreciated.

CodePudding user response：

You can use comparison and boolean indexing per row, save the intermediate variable using assignment expression, and create a Series:

df.join(df.drop(columns='R')
          .apply(lambda s: pd.Series({'an': ','.join((S:=s[s.lt(0)]).index),
                                      'nv': list(S)}), axis=1)
       )

Or using a custom function:

def f(s):
    S = s[s.lt(0)]
    return pd.Series({'an': ','.join(S.index),
                      'nv': list(S)})

df.join(df.drop(columns='R').apply(f, axis=1))

output:

   R    a    b    c    d    e    f    g    h    i    j    k       an                        nv
0  1  1.0  1.0  1.0  1.0  1.0 -1.0  1.0  0.0  0.0  1.0   62        f                    [-1.0]
1  2  1.0  1.0  2.0  2.0  2.0  0.0  1.0  0.0 -1.0  1.0  166        i                    [-1.0]
2  3  2.0  1.0  2.0  1.0  2.0  0.0  2.0  1.0  0.0  1.0  139                                 []
3  4  3.0  2.0  2.0  0.0  1.0  0.0  1.5  2.0  0.0  1.0   60                                 []
4  5  3.0  2.0  2.0  1.0  1.0 -2.0  2.0  2.0  0.0  2.0   93        f                    [-2.0]
5  6  2.0  0.0 -2.0  2.0  2.0 -1.0  0.0  1.0 -3.0 -1.0   17  c,f,i,j  [-2.0, -1.0, -3.0, -1.0]
6  7  3.0  3.0 -2.0 -1.0 -2.0  2.0  2.0  3.0  3.0 -1.0    5  c,d,e,j  [-2.0, -1.0, -2.0, -1.0]