import pandas as pd
df = pd.DataFrame({'col_a':[0,0,0,1,1,1], 'col_b':[1,0,0,1,0,1],'col_c':[1,0,0,1,0,1]})
df
   col_a  col_b  col_c
0      0      1      1
1      0      0      0
2      0      0      0
3      1      1      1
4      1      0      0
5      1      1      1
I want to add a new feature to this df based on a majority vote across each row (pseudocode: check whether the 1s in a row are the majority in that row), just like a voter. I have tried a for loop on every column, but the original data has 10,000 rows and it takes several minutes (I think the pandas API would be faster). I have also tried apply and assign, but they failed because I am unfamiliar with the pandas package.
I would like to learn how to do this with the pandas API. Thank you all.
CodePudding user response:
You can use mode:
# mode(axis=1) returns a DataFrame; column 0 holds the first mode of each row
df['col_d'] = df.mode(axis=1)[0]
print(df)
# Output
   col_a  col_b  col_c  col_d
0      0      1      1      1
1      0      0      0      0
2      0      0      0      0
3      1      1      1      1
4      1      0      0      0
5      1      1      1      1
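With an odd number of 0/1 columns, as here, each row has exactly one mode. With an even number of columns a row can tie, and mode(axis=1) then returns more than one column. A minimal sketch of that case, assuming a hypothetical four-column frame named votes (not part of the question) and resolving ties by taking column 0, which holds the smallest tied value:

```python
import pandas as pd

# Hypothetical four-column frame; row 0 is a 2-2 tie between 0 and 1.
votes = pd.DataFrame({'a': [0, 1, 0],
                      'b': [0, 1, 0],
                      'c': [1, 1, 0],
                      'd': [1, 0, 0]})

# One column per tied mode; rows with fewer modes are padded with NaN.
modes = votes.mode(axis=1)

# Column 0 is always filled; modes are sorted, so a 0/1 tie resolves to 0 here.
majority = modes[0].astype(int)
print(majority.tolist())
```

Whether a tie should count as 0 or 1 is a modelling choice; this sketch simply takes the first mode.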
CodePudding user response:
You can sum across each row. With three 0/1 columns, a row sum greater than 1 (that is, at least 2) means 1 is the majority:
import numpy as np
df['feature'] = np.where(df.sum(axis=1).ge(2), '1 majority', '0 majority')
print(df)
   col_a  col_b  col_c     feature
0      0      1      1  1 majority
1      0      0      0  0 majority
2      0      0      0  0 majority
3      1      1      1  1 majority
4      1      0      0  0 majority
5      1      1      1  1 majority
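The threshold of 2 is specific to three columns. One way to avoid hard-coding it, sketched here under the assumption that every voting column holds only 0 or 1, is to compare the row mean with 0.5. The column name majority and the list cols are illustrative choices, not taken from the question:

```python
import pandas as pd

df = pd.DataFrame({'col_a': [0, 0, 0, 1, 1, 1],
                   'col_b': [1, 0, 0, 1, 0, 1],
                   'col_c': [1, 0, 0, 1, 0, 1]})

# Columns that take part in the vote (assumed to contain only 0 and 1).
cols = ['col_a', 'col_b', 'col_c']

# Row mean = share of 1s; strictly more than half means 1 is the majority.
df['majority'] = df[cols].mean(axis=1).gt(0.5).astype(int)
print(df)
```

This stays vectorized, so it should avoid the row-by-row loop described in the question. With an even number of columns, a tie gives a mean of exactly 0.5 and is counted as 0 by gt(0.5).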