Home > Net >  Pandas: for each row, get the matching column
Pandas: for each row, get the matching column

Time:05-24

I have a dataframe like this,

df = pd.DataFrame({'A':[1,-2,3,4,5],'B':[1,2,3,4,-5],'C':[1,2,3,-4,5],'value':[123,1567,1456,764,2456]})
print(df)

is,

   A  B  C  value
0  1  1  1    123
1 -2  2  2   1567
2  3 3  3   1456
3  4  4 -4    764
4  5  -5  5   2456

I know that there is only one negative value per row. I want to know in which column such a value is. I know I can do it like this,

desiredOutput = []
for i,row in df.iterrows():
    if any(row < 0):
        print(row[row < 0].index.to_numpy()[0],row[row[row < 0].index.to_numpy()[0]],row.value)
        desiredOutput.append(row[row < 0].index.to_numpy()[0])
    else:
        desiredOutput.append(None)

print(desiredOutput)

gives,

A -2 1567
C -4 764
B -5 2456
[None, 'A', None, 'C', 'B']

But I imagine there must be a pythonnic way to do it (probably using .apply()?)

CodePudding user response:

You can use idxmax on a boolean mask and where to mask the cases where no value matches:

cols = ['A', 'B', 'C']  # optional: if needed to select columns
m = df[cols].lt(0)      # else:  df.lt(0)
df['result'] = m.idxmax(axis=1).where(m.any(1))

output:

   A  B  C  value result
0  1  1  1    123    NaN
1 -2  2  2   1567      A
2  3  3  3   1456    NaN
3  4  4 -4    764      C
4  5 -5  5   2456      B

Alternative as a one-liner for all numerical columns (requires python ≥3.8):

df['result'] = ((m:=df.select_dtypes('number').lt(0))
                .idxmax(axis=1).where(m.any(1))
                )

CodePudding user response:

Use DataFrame.dot for matrix multiplication:

desiredOutput = df.lt(0).dot(df.columns).mask(lambda x: x.eq(''), None)
print (desiredOutput)
0    None
1       A
2    None
3       C
4       B
dtype: object

desiredOutput = df.lt(0).dot(df.columns).replace('',np.nan)
print (desiredOutput)
0    NaN
1      A
2    NaN
3      C
4      B
dtype: object

With apply it is possible, but slowier if filter columns by mask per rows:

desiredOutput = df.lt(0).apply(lambda x: next(iter(x.index[x]), None), axis=1)
print (desiredOutput)
0    None
1       A
2    None
3       C
4       B
dtype: object
  • Related