I have a dataframe like this,
df = pd.DataFrame({'A':[1,-2,3,4,5],'B':[1,2,3,4,-5],'C':[1,2,3,-4,5],'value':[123,1567,1456,764,2456]})
print(df)
is,
A B C value
0 1 1 1 123
1 -2 2 2 1567
2 3 3 3 1456
3 4 4 -4 764
4 5 -5 5 2456
I know that there is only one negative value per row. I want to know in which column such a value is. I know I can do it like this,
desiredOutput = []
for i,row in df.iterrows():
if any(row < 0):
print(row[row < 0].index.to_numpy()[0],row[row[row < 0].index.to_numpy()[0]],row.value)
desiredOutput.append(row[row < 0].index.to_numpy()[0])
else:
desiredOutput.append(None)
print(desiredOutput)
gives,
A -2 1567
C -4 764
B -5 2456
[None, 'A', None, 'C', 'B']
But I imagine there must be a pythonnic way to do it (probably using .apply()?)
CodePudding user response:
You can use idxmax
on a boolean mask and where
to mask the cases where no value matches:
cols = ['A', 'B', 'C'] # optional: if needed to select columns
m = df[cols].lt(0) # else: df.lt(0)
df['result'] = m.idxmax(axis=1).where(m.any(1))
output:
A B C value result
0 1 1 1 123 NaN
1 -2 2 2 1567 A
2 3 3 3 1456 NaN
3 4 4 -4 764 C
4 5 -5 5 2456 B
Alternative as a one-liner for all numerical columns (requires python ≥3.8):
df['result'] = ((m:=df.select_dtypes('number').lt(0))
.idxmax(axis=1).where(m.any(1))
)
CodePudding user response:
Use DataFrame.dot
for matrix multiplication:
desiredOutput = df.lt(0).dot(df.columns).mask(lambda x: x.eq(''), None)
print (desiredOutput)
0 None
1 A
2 None
3 C
4 B
dtype: object
desiredOutput = df.lt(0).dot(df.columns).replace('',np.nan)
print (desiredOutput)
0 NaN
1 A
2 NaN
3 C
4 B
dtype: object
With apply it is possible, but slowier if filter columns by mask per rows:
desiredOutput = df.lt(0).apply(lambda x: next(iter(x.index[x]), None), axis=1)
print (desiredOutput)
0 None
1 A
2 None
3 C
4 B
dtype: object