I have a dataframe similar to the following:
data = {'ref': ['0001', '0002', '0003', '0004', '0005', '0006', '0007'],
'topic': [-1, 0, 1, 2, 0, -1, 2],
'-1': [1, 0, 0, 0.1, 0, 0.99, 0],
'0': [0, 1, 0, 0, 1, 0, 0.1],
'1': [0, 0, 0.99, 0, 0, 0, 0],
'2': [0.4, 0, 0, 0.7, 0, 0, 1],
}
df = pd.DataFrame(data)
If the column topic
value is -1
, then I want to look in the same row of columns 0 to 2
, and change the value in topic
, to the header of the max value.
As an example, in the first row in the table above, the column topic
has a value of -1
, so I want to change that value to whatever is the header name of the max value in the row. So that would be column 2, which has a value of 0.4.
So now the table looks like this, where the topic
value of ref 0001
has changed from -1
to 2
The other point is, that if like ref 0006
, there is no value > 0
in the other columns despite having a topic value of -1, then it should be left alone.
I hope this makes sense. Struggling hard to get this done.
Thank you!
CodePudding user response:
def correct_topic(row):
max_value = row[['0','1','2']].max()
if max_value > 0:
return row[row == max_value].index[0]
else:
return row.topic
df['topic'] = df.apply(correct_topic, axis=1)
I think this is what you need :)
CodePudding user response:
Setting up example:
import pandas as pd
data = {'ref': ['0001', '0002', '0003', '0004', '0005', '0006', '0007'],
'topic': [-1, 0, 1, 2, 0, -1, 2],
'-1': [1, 0, 0, 0.1, 0, 0.99, 0],
'0': [0, 1, 0, 0, 1, 0, 0.1],
'1': [0, 0, 0.99, 0, 0, 0, 0],
'2': [0.4, 0, 0, 0.7, 0, 0, 1],
}
df = pd.DataFrame(data)
Making a boolean mask to set the rows:
cols = [str(x) for x in range(0, 3)]
condition = (df.topic == -1) & ((df[cols] > 0).any(axis=1))
0 True
1 False
2 False
3 False
4 False
5 False
6 False
dtype: bool
using idxmax gets the names of the columns at the maximum values
df[cols].idxmax(axis=1)
0 2
1 0
2 1
3 2
4 0
5 0
6 2
dtype: object
Now you can use df.where and the boolean condition to set the topic values.
df["topic"] = df["topic"].where(~condition, df[cols].idxmax(axis=1)[condition])
Output:
ref topic -1 0 1 2
0 0001 2 1.00 0.0 0.00 0.4
1 0002 0 0.00 1.0 0.00 0.0
2 0003 1 0.00 0.0 0.99 0.0
3 0004 2 0.10 0.0 0.00 0.7
4 0005 0 0.00 1.0 0.00 0.0
5 0006 -1 0.99 0.0 0.00 0.0
6 0007 2 0.00 0.1 0.00 1.0