Iterate through rows and identify which columns is true, assign new column the name of the column.he-CodePudding

I have the following DataFrame:

Index	Time Lost	Cause 1	Cause 2	Cause 3
0	40	x	Nan	Nan
1	15	Nan	x	Nan
2	65	x	Nan	Nan
3	10	Nan	Nan	x

There is only one "X" per row which identifies the cause of the time lost column. I am trying to iterate through each row (and column) to determine which column holds the "X". I would then like to add a "Type" column with the name of the column header that was True for each row. This is what I would like as a result:

Index	Time Lost	Cause 1	Cause 2	Cause 3	Type
0	40	x	Nan	Nan	Cause 1
1	15	Nan	x	Nan	Cause 2
2	65	x	Nan	Nan	Cause 1
3	10	Nan	Nan	x	Cause 3

Currently my code looks like this, I am trying to iterate through the DataFrame. However, I'm not sure if there is a function or non-iterative approach to assign the proper value to the "Type" column:

cols = ['Cause1', 'Cause 2', 'Cause 3']

for index, row in df.iterrows():
            for col in cols:
                if df.loc[index,col] =='X':
                    df.loc[index,'Type'] = col
                    continue
                else:
                    df.loc[index,'Type'] = 'Other'

                continue

The issue I get with this code is that it iterates but only identifies rows with the last item in the cols list and the remainder go to "Other".

Any help is appreciated!

CodePudding user response：

You could use idxmax on the boolean array of your data:

df['Type'] = df.drop('Time Lost', axis=1).eq('x').idxmax(axis=1)

Note that this only report the first cause if several

output:

   Time Lost Cause 1 Cause 2 Cause 3     Type
0         40       x     Nan     Nan  Cause 1
1         15     Nan       x     Nan  Cause 2
2         65       x     Nan     Nan  Cause 1
3         10     Nan     Nan       x  Cause 3