Find value in group in Python

I have data that looks like this:

is_severe       snp_id
1               1
0               1
1               2
1               2

What I want to do is create a column that says YES or NO according to the following rule: for each snp_id group, if the is_severe column contains at least one "1" and at least one "0", the result is YES; otherwise it is NO. Both is_severe and snp_id hold string data. Example:

is_severe       snp_id    YES/NO
1               1         YES
0               1         YES
1               2         NO
1               2         NO

What I tried is this:

df['Value2']=np.where(df.groupby(df["snp_id"])['is_severe'].transform(lambda x: ((x=='1' ) & (x=="0")).any()), 'YES','NO')

But all the values came out "NO". Is there any way to fix this? Thank you.
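A minimal reproduction of the problem, assuming the sample rows above with both columns stored as strings. The mask (x == "1") & (x == "0") is evaluated element by element, and no single element can equal both strings, so .any() is False for every group:

```python
import numpy as np
import pandas as pd

# Sample rows from the question; both columns are stored as strings
df = pd.DataFrame({
    "is_severe": ["1", "0", "1", "1"],
    "snp_id": ["1", "1", "2", "2"],
})

# The original attempt: each element is tested against "1" AND "0" at once,
# which can never be true, so every group collapses to False
buggy = df.groupby(df["snp_id"])["is_severe"].transform(
    lambda x: ((x == "1") & (x == "0")).any()
)
df["Value2"] = np.where(buggy, "YES", "NO")
print(df["Value2"].tolist())  # ['NO', 'NO', 'NO', 'NO']
```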

Answer:

You can compare sets, marking a group YES when the set of its is_severe values contains both 0 and 1:

import numpy as np

# True for every row whose snp_id group contains both 0 and 1
m = df.groupby("snp_id")["is_severe"].transform(lambda x: set(x) >= {0, 1})
# if the values are strings
m = df.groupby("snp_id")["is_severe"].transform(lambda x: set(x) >= {"0", "1"})
df["Value2"] = np.where(m, "YES", "NO")
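A self-contained sketch of the set comparison on the sample rows from the question, assuming both columns are stored as strings. Python's >= operator on sets is the superset test, so a group is marked YES when its values include both "0" and "1":

```python
import numpy as np
import pandas as pd

# Sample rows from the question, stored as strings
df = pd.DataFrame({
    "is_severe": ["1", "0", "1", "1"],
    "snp_id": ["1", "1", "2", "2"],
})

# set(x) >= {"0", "1"} is True when the group's values include both strings;
# transform broadcasts that single bool back to every row of the group
m = df.groupby("snp_id")["is_severe"].transform(lambda x: set(x) >= {"0", "1"})
df["Value2"] = np.where(m, "YES", "NO")
print(df["Value2"].tolist())  # ['YES', 'YES', 'NO', 'NO']
```

Because this is a superset test rather than an equality test, a group that also holds other values (for example "2") alongside "0" and "1" is still marked YES.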

If is_severe contains only the values 0 and 1, you can instead check that each group has exactly 2 unique values, using DataFrameGroupBy.nunique:

m = df.groupby("snp_id")["is_severe"].transform("nunique").eq(2)
df["Value2"] = np.where(m, "YES", "NO")
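A runnable sketch of the nunique variant, assuming the sample rows are stored as integers this time. With only 0 and 1 possible, "two distinct values" is the same as "contains both 0 and 1":

```python
import numpy as np
import pandas as pd

# Sample rows from the question, stored as integers for this sketch
df = pd.DataFrame({
    "is_severe": [1, 0, 1, 1],
    "snp_id": [1, 1, 2, 2],
})

# transform("nunique") gives each row its group's distinct-value count: [2, 2, 1, 1]
m = df.groupby("snp_id")["is_severe"].transform("nunique").eq(2)
df["Value2"] = np.where(m, "YES", "NO")
print(df["Value2"].tolist())  # ['YES', 'YES', 'NO', 'NO']
```

The restriction to 0 and 1 matters: if other values could appear, a group holding 0 and 2 would also have two unique values and be marked YES, so the set comparison is the safer choice in that case.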

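Your solution fails because (x=='1') & (x=='0') is evaluated element by element, and no single value can equal both '1' and '0', so the mask is always False. Test each condition separately with .any() and combine the two results: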

m = df.groupby("snp_id")["is_severe"].transform(lambda x: (x == 1).any() and (x == 0).any())
# if the values are strings
m = df.groupby("snp_id")["is_severe"].transform(lambda x: (x == "1").any() and (x == "0").any())
df["Value2"] = np.where(m, "YES", "NO")
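The "if values are strings" variant is not optional when the data is text. A sketch, assuming string data as in the question and a hypothetical helper has_both (introduced here for illustration, not part of the answer) that applies the corrected test with either kind of literal: comparing the strings with the integers 0 and 1 never matches, so the integer version marks every group NO on this data.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, stored as strings
df = pd.DataFrame({
    "is_severe": ["1", "0", "1", "1"],
    "snp_id": ["1", "1", "2", "2"],
})

def has_both(x, zero, one):
    # Hypothetical helper: True when the group has at least one `one`
    # and at least one `zero` (each side reduced with .any() first)
    return (x == one).any() and (x == zero).any()

g = df.groupby("snp_id")["is_severe"]
as_int = g.transform(lambda x: has_both(x, 0, 1))      # all False: "1" != 1
as_str = g.transform(lambda x: has_both(x, "0", "1"))  # matches the string data
df["Value2"] = np.where(as_str, "YES", "NO")
print(df["Value2"].tolist())  # ['YES', 'YES', 'NO', 'NO']
```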