I have data that looks like this:
is_severe snp_id
1 1
0 1
1 2
1 2
What I want to do is create a column in the data that says YES or NO according to the following rule: for each snp_id group, if the is_severe column contains at least one "1" and at least one "0", return YES; otherwise return NO. (Both is_severe and snp_id hold string data.) Example:
is_severe snp_id yes/no
1 1 yes
0 1 yes
1 2 no
1 2 no
What I tried is this:
df['Value2']=np.where(df.groupby(df["snp_id"])['is_severe'].transform(lambda x: ((x=='1' ) & (x=="0")).any()), 'YES','NO')
But all the values were "NO". Is there any way to fix this? Thank you.
CodePudding user response:
You can compare sets:
# if the values are integers
m = df.groupby("snp_id")['is_severe'].transform(lambda x: set(x) >= {0, 1})
# if the values are strings
m = df.groupby("snp_id")['is_severe'].transform(lambda x: set(x) >= {'0', '1'})
df['Value2'] = np.where(m, 'YES', 'NO')
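As a rough, self-contained sketch of the set comparison, the snippet below rebuilds the four sample rows from the question. It assumes the columns are named is_severe and snp_id and that both hold strings, as the asker describes; adjust the names if your frame differs.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, stored as strings (assumed column names)
df = pd.DataFrame({
    "is_severe": ["1", "0", "1", "1"],
    "snp_id": ["1", "1", "2", "2"],
})

# For each snp_id group, check whether its values include both '0' and '1';
# transform broadcasts that single True/False back onto every row of the group
m = df.groupby("snp_id")["is_severe"].transform(lambda x: set(x) >= {"0", "1"})
df["Value2"] = np.where(m, "YES", "NO")
print(df)
```

Group 1 holds both '1' and '0', so its rows get YES; group 2 holds only '1', so its rows get NO.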
If is_severe contains only the values 0 and 1, you can compare the number of unique values per group with DataFrameGroupBy.nunique:
m = df.groupby("snp_id")['is_severe'].transform('nunique').eq(2)
df['Value2'] = np.where(m, 'YES', 'NO')
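The "only 0 and 1" condition matters because nunique counts any two distinct values. The sketch below uses made-up data (group "3" with the values 1 and 2 is hypothetical, not from the question) to show where the two approaches would disagree.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "3" contains a value other than 0 or 1
df = pd.DataFrame({
    "is_severe": [1, 0, 1, 1, 1, 2],
    "snp_id": ["1", "1", "2", "2", "3", "3"],
})

grouped = df.groupby("snp_id")["is_severe"]

# Flags every group with exactly two distinct values, whatever they are
by_nunique = np.where(grouped.transform("nunique").eq(2), "YES", "NO")

# Flags only groups that contain both 0 and 1
by_set = np.where(grouped.transform(lambda x: set(x) >= {0, 1}), "YES", "NO")

print(list(by_nunique))
print(list(by_set))
```

Here the nunique version marks group "3" as YES while the set version marks it NO, so the shortcut is only safe when the column is known to contain nothing but 0 and 1.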
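Your solution should be changed. The expression (x=='1') & (x=='0') is evaluated element by element, and no single value can equal both '1' and '0', so it is always False. Instead, test each condition with any() separately and combine the two results with and: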
# if the values are integers
m = df.groupby("snp_id")['is_severe'].transform(lambda x: (x == 1).any() and (x == 0).any())
# if the values are strings
m = df.groupby("snp_id")['is_severe'].transform(lambda x: (x == '1').any() and (x == '0').any())
df['Value2'] = np.where(m, 'YES', 'NO')
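To see why the original attempt produced NO everywhere, here is a small sketch on a single group that does contain both values. The Series x stands in for what the lambda receives for one snp_id; the variable names are illustrative only.

```python
import pandas as pd

# One group that contains both '1' and '0'
x = pd.Series(["1", "0"])

# Element-wise AND: asks whether any single element equals '1' and '0' at once
both_at_once = ((x == "1") & (x == "0")).any()

# Two separate checks: is there a '1' somewhere, and is there a '0' somewhere
each_somewhere = (x == "1").any() and (x == "0").any()

print(both_at_once, each_somewhere)
```

The first expression is False even for this mixed group, while the second is True, which is the behaviour the rule in the question asks for.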