I have a dataframe (df) similar to the one mentioned below:
PATID | Gene1 | Gene2 | Gene3 |
---|---|---|---|
1001 | 5.899 | 67.87 | 9.87 |
1002 | 6.899 | 60.87 | 10.87 |
1003 | 7.899 | 63.87 | 8.87 |
I need a dataframe as mentioned below by looping through the code
Gene1_med = df['Gene1'].map(lambda exp: 'low' if exp <= df['Gene1'].median() else 'high')
PATID | Gene1 | Gene1_med | Gene2 | Gene2_med | Gene3 | Gene3_med |
---|---|---|---|---|---|---|
1001 | 5.899 | low | 67.87 | high | 9.87 | high |
1002 | 6.899 | low | 60.87 | low | 10.87 | high |
1003 | 7.899 | high | 63.87 | low | 8.87 | low |
I am new to python and any help is highly appreciated. Thanks in advance!
CodePudding user response:
Use numpy.where
with selecting columns of Gene
s, compare by DataFrame.le
and assign changed columns names, last sorting columns:
cols = ['Gene1','Gene2','Gene3']
#columns PATID to index
df = df.set_index('PATID')
df[pd.Index(cols) '_med'] = np.where(df[cols].le(df[cols].median()), 'low','high')
df1 = df.sort_index(axis=1).reset_index()
Or select all columns with substring Gene
by DataFrame.filter
, last sorting all columns without first and append to first column:
df1 = df.filter(like='Gene')
df[df1.columns '_med'] = np.where(df1.le(df1.median()), 'low','high')
df1 = df.iloc[:, :1].join(df.iloc[:, 1:].sort_index(axis=1))
Or use list for columns names:
cols = ['Gene1','Gene2','Gene3']
df[pd.Index(cols) '_med'] = np.where(df[cols].le(df[cols].median()), 'low','high')
df1 = df.iloc[:, :1].join(df.iloc[:, 1:].sort_index(axis=1))
print (df1)
PATID Gene1 Gene1_med Gene2 Gene2_med Gene3 Gene3_med
0 1001 5.899 low 67.87 high 9.87 low
1 1002 6.899 low 60.87 low 10.87 high
2 1003 7.899 high 63.87 low 8.87 low