I have a dataframe such as
Groups subgroups Species Value
G1 subg1 Sp1 NO
G1 subg1 Sp1 YES
G1 subg1 Sp2 NO
G2 subg5 Sp7 YES
G2 subg6 Sp7 NO
G2 subg7 Sp7 NO
G2 subg7 Sp7 YES
G3 subg1 Sp1 NO
G3 subg1 Sp1 NO
and I would like to fill a new Value2
column where within each Groups
, subgroups
and Species
with "YES" if there is at least one YES within this group, and NO is there is none.
I should then get :
Groups subgroups Species Value Value2
G1 subg1 Sp1 NO YES
G1 subg1 Sp1 YES YES
G1 subg1 Sp2 NO NO
G2 subg5 Sp7 YES YES
G2 subg6 Sp7 NO NO
G2 subg7 Sp7 NO YES
G2 subg7 Sp7 YES YES
G3 subg1 Sp1 NO NO
G3 subg1 Sp1 NO NO
CodePudding user response:
Use groupby
, transform
:
df['Value2'] = (df['Value']=="YES").groupby([df['Groups'],
df['subgroups'],
df['Species']])\
.transform('any')\
.map({True:"YES", False:"NO"})
print(df)
Output:
Groups subgroups Species Value Value2
0 G1 subg1 Sp1 NO YES
1 G1 subg1 Sp1 YES YES
2 G1 subg1 Sp2 NO NO
3 G2 subg5 Sp7 YES YES
4 G2 subg6 Sp7 NO NO
5 G2 subg7 Sp7 NO YES
6 G2 subg7 Sp7 YES YES
7 G3 subg1 Sp1 NO NO
8 G3 subg1 Sp1 NO NO
CodePudding user response:
use np.where, groupby, transform
df['Value2']=np.where(df.groupby(['Groups','subgroups','Species'])['Value'].transform(lambda x: (x=='YES').any()), 'YES','NO')
Groups subgroups Species Value
0 G1 subg1 Sp1 NO
1 G1 subg1 Sp1 YES
2 G1 subg1 Sp2 NO
3 G2 subg5 Sp7 YES
4 G2 subg6 Sp7 NO
5 G2 subg7 Sp7 NO
6 G2 subg7 Sp7 YES
7 G3 subg1 Sp1 NO
8 G3 subg1 Sp1 NO