I have a simple df.
  Genotype freq
0      HET  0/1
1      REF  0/1
2      HOM  0/1
3      HOM  1/1
I would like to change 'HOM' to 'REF' if 'freq' is '0/1' or '1/0'. I do not want to alter any 'HET' rows. I have tried to do this based on other Stack Overflow answers but have had little success. My attempts are pasted below.
import pandas as pd

df = {'Genotype': ['HET', 'REF', 'HOM', 'HOM'],
      'freq': ['0/1', '0/1', '0/1', '1/1']
      }
df = pd.DataFrame(df)
catch=['0/1', '1/0']
#attempt 1 - error: For argument "inplace" expected type bool, received type int.
df.where(df['Genotype'] != 'HET', df.loc[df.freq.isin(catch), 'Genotype'] == 'REF', 0)
#attempt 2 - Ignores HET but adds TRUE/FALSE to other rows - looks messy.
df['Genotype']=df['Genotype'].apply(lambda x: 'HET' if x =='HET' else df.loc[df.freq.isin(catch), 'Genotype'] == 'REF')
#attempt 3 - Converts all '0/1' to REF
for index, row in df.iterrows():
    if row['Genotype'] == 'HOM':
        df.loc[df.freq.isin(catch), 'Genotype'] = 'REF'
If possible, is there a simple way to perform this in python/pandas without creating a new object - the indexes are important inside the larger function I have. Cheers.
Answer:
You need to chain both conditions with & (bitwise AND):
catch=['0/1', '1/0']
df.loc[df.freq.isin(catch) & df['Genotype'].ne('HET'), 'Genotype'] = 'REF'
print (df)
  Genotype freq
0      HET  0/1
1      REF  0/1
2      REF  0/1
3      HOM  1/1
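
Note that ne('HET') rewrites every non-HET row whose freq is in catch. That gives the desired result on the sample data because only HOM and REF remain once HET is excluded. If the larger data could contain other genotype labels, a stricter variant is to test for 'HOM' directly. The following is a minimal sketch under that assumption; the variable name mask is only illustrative. Using .eq() avoids the extra parentheses that == would need, since & binds more tightly than comparison operators.

```python
import pandas as pd

df = pd.DataFrame({
    'Genotype': ['HET', 'REF', 'HOM', 'HOM'],
    'freq': ['0/1', '0/1', '0/1', '1/1'],
})
catch = ['0/1', '1/0']

# One boolean mask: freq is in catch AND Genotype is exactly 'HOM'.
mask = df['freq'].isin(catch) & df['Genotype'].eq('HOM')

# Assign through .loc so the existing DataFrame is modified in place;
# the index and all rows outside the mask are left untouched.
df.loc[mask, 'Genotype'] = 'REF'
print(df)
```

Because the assignment goes through .loc on the original df, no new object is created and the index stays as it was, which is what the question asks for.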