Home > Blockchain >  Find first occurrence in a row in data frame
Find first occurrence in a row in data frame

Time:06-07

I have small dataset defined as follows;

df = pd.DataFrame({
'id':['id1', 'id2', 'id3'],
 'cmp1':[1,0,0],
 'cmp2':[0,1,0],
  'cmp3':[1,0,0]})

The goal is to add a new column called col which'll tell the first occurrence of 1 for every row among the columns, cmp1,cmp2,cmp3. Hence the final output shd look like;

df2 = pd.DataFrame({
'id':['id1', 'id2', 'id3'],
 'cmp1':[1,0,0],
 'cmp2':[0,1,0],
  'cmp3':[1,0,0],
  'col':['first', 'second','none']})

As you can see for id=id1, 1 occurred for the very first time in cmp1, hence we have first in col.It is irrespective of the fact that we have cmp2=0, and again cmp3=1. Can I get some help on how to define this new column, col?

CodePudding user response:

You can use .max(axis=1) to find those rows with at least one 1, then np.where with idxmax (for the column name):

# the actual data columns
data = df.iloc[:, 1:]

df['col'] = np.where(data.max(axis=1), data.idxmax(axis=1), 'none')

Output:

    id  cmp1  cmp2  cmp3   col
0  id1     1     0     1  cmp1
1  id2     0     1     0  cmp2
2  id3     0     0     0  none

If you insist on first, second,... you would need to do manually

labels = np.array(['first', 'second', 'third'])
df['col'] = np.where(data.max(axis=1), labels[data.to_numpy().argmax(axis=1)], 'none')

and output:

    id  cmp1  cmp2  cmp3     col
0  id1     1     0     1   first
1  id2     0     1     0  second
2  id3     0     0     0    none

CodePudding user response:

Use idxmax() with axis=1

cmp = df.filter(like='cmp')
df.loc[cmp.eq(1).any(axis=1), 'col'] = cmp.eq(1).idxmax(axis=1)

Output:

>>> df
    id  cmp1  cmp2  cmp3   col
0  id1     1     0     1  cmp1
1  id2     0     1     0  cmp2
2  id3     0     0     0   NaN

CodePudding user response:

You can use a combination of idxmax() and a mask to find the answer

df['first'] = df.drop(['id'], axis = 1).mask(df.eq(0)).idxmax(axis = 1).fillna('None')
  • Related