How to check occurrences of a string across two or more columns for each row and assign the final column

Time:02-22

id.   datcol1   datacol2   datacol-n   final col (to be created in output)

1     false     true       true        0
2     false     false      false       2
3     true      true       true        0
4     true      false      false       1

There are multiple columns, say 13. For each row id, I need to check across all of the data columns: if at least two of them contain the string "true", assign 0; if exactly one contains "true", assign 1; and if none contain "true", assign 2.

CodePudding user response:

Considering df to be:

In [1542]: df
Out[1542]: 
   id.  datcol1  datacol2  datacol-n
0    1    False      True       True
1    2    False     False      False
2    3     True      True       True
3    4     True     False      False
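For anyone who wants to reproduce the frame above, here is a minimal sketch of how it could be built. It assumes the data columns hold real booleans rather than the strings "true"/"false"; the values are copied from the table in the question.

```python
import pandas as pd

# Sample frame matching the question's table.
# Assumption: the data columns are boolean dtype, not strings.
df = pd.DataFrame({
    "id.": [1, 2, 3, 4],
    "datcol1": [False, False, True, True],
    "datacol2": [True, False, True, False],
    "datacol-n": [True, False, True, False],
})

print(df)
```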

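Use df.filter to pick the data columns, df.sum to count the True values in each row, Series.ge and Series.eq to build the conditions, and numpy.select to map them to 0, 1 or 2: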

In [1546]: import numpy as np

In [1547]: x = df.filter(like='dat').sum(1)

In [1548]: conds = [x.ge(2), x.eq(1), x.eq(0)]

In [1549]: choices = [0, 1, 2]

In [1553]: df['flag'] = np.select(conds, choices)

In [1554]: df
Out[1554]: 
   id.  datcol1  datacol2  datacol-n  flag
0    1    False      True       True     0
1    2    False     False      False     2
2    3     True      True       True     0
3    4     True     False      False     1
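The session above assumes the columns are booleans. The question speaks of "true" strings, so if the columns actually hold text, summing them directly will not give a count. The following is a sketch of one way to handle that case, assuming the cells contain the strings "true"/"false" (possibly with stray whitespace or different casing); the sample frame and the names is_true and true_count are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame whose cells are strings, as described in the question.
df = pd.DataFrame({
    "id.": [1, 2, 3, 4],
    "datcol1": ["false", "false", "true", "true"],
    "datacol2": ["true", "false", "true", "false"],
    "datacol-n": ["true", "false", "true", "false"],
})

# Pick the data columns by name, using the same filter as the answer.
data = df.filter(like="dat")

# Normalise whitespace and case, then compare each cell to "true".
is_true = data.apply(lambda col: col.astype(str).str.strip().str.lower().eq("true"))

# Count the matches per row.
true_count = is_true.sum(axis=1)

# Two or more -> 0, exactly one -> 1, none -> 2.
conds = [true_count.ge(2), true_count.eq(1), true_count.eq(0)]
choices = [0, 1, 2]
df["flag"] = np.select(conds, choices)

print(df)
```

Note that filter(like='dat') matches any column whose name contains "dat", so if other columns in the real data also match, it may be safer to pass an explicit list of column names instead.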