I have the following DataFrame:
Index | Time Lost | Cause 1 | Cause 2 | Cause 3 |
---|---|---|---|---|
0 | 40 | x | Nan | Nan |
1 | 15 | Nan | x | Nan |
2 | 65 | x | Nan | Nan |
3 | 10 | Nan | Nan | x |
There is only one "X" per row which identifies the cause of the time lost column. I am trying to iterate through each row (and column) to determine which column holds the "X". I would then like to add a "Type" column with the name of the column header that was True for each row. This is what I would like as a result:
Index | Time Lost | Cause 1 | Cause 2 | Cause 3 | Type |
---|---|---|---|---|---|
0 | 40 | x | Nan | Nan | Cause 1 |
1 | 15 | Nan | x | Nan | Cause 2 |
2 | 65 | x | Nan | Nan | Cause 1 |
3 | 10 | Nan | Nan | x | Cause 3 |
Currently my code looks like this, I am trying to iterate through the DataFrame. However, I'm not sure if there is a function or non-iterative approach to assign the proper value to the "Type" column:
cols = ['Cause1', 'Cause 2', 'Cause 3']
for index, row in df.iterrows():
for col in cols:
if df.loc[index,col] =='X':
df.loc[index,'Type'] = col
continue
else:
df.loc[index,'Type'] = 'Other'
continue
The issue I get with this code is that it iterates but only identifies rows with the last item in the cols list and the remainder go to "Other".
Any help is appreciated!
CodePudding user response:
You could use idxmax
on the boolean array of your data:
df['Type'] = df.drop('Time Lost', axis=1).eq('x').idxmax(axis=1)
Note that this only report the first cause if several
output:
Time Lost Cause 1 Cause 2 Cause 3 Type
0 40 x Nan Nan Cause 1
1 15 Nan x Nan Cause 2
2 65 x Nan Nan Cause 1
3 10 Nan Nan x Cause 3