Create 2534 new columns in a dataframe based on the median values in the existing 2534 columns-CodePudding

I have a dataframe (df) similar to the one mentioned below:

PATID	Gene1	Gene2	Gene3
1001	5.899	67.87	9.87
1002	6.899	60.87	10.87
1003	7.899	63.87	8.87

I need a dataframe as mentioned below by looping through the code

Gene1_med = df['Gene1'].map(lambda exp: 'low' if exp <= df['Gene1'].median() else 'high')

PATID	Gene1	Gene1_med	Gene2	Gene2_med	Gene3	Gene3_med
1001	5.899	low	67.87	high	9.87	high
1002	6.899	low	60.87	low	10.87	high
1003	7.899	high	63.87	low	8.87	low

I am new to python and any help is highly appreciated. Thanks in advance!

CodePudding user response：

Use numpy.where with selecting columns of Genes, compare by DataFrame.le and assign changed columns names, last sorting columns:

cols = ['Gene1','Gene2','Gene3']

#columns PATID to index
df = df.set_index('PATID')

df[pd.Index(cols)   '_med'] = np.where(df[cols].le(df[cols].median()), 'low','high')

df1 = df.sort_index(axis=1).reset_index()

Or select all columns with substring Gene by DataFrame.filter, last sorting all columns without first and append to first column:

df1 = df.filter(like='Gene')

df[df1.columns   '_med'] = np.where(df1.le(df1.median()), 'low','high')

df1 = df.iloc[:, :1].join(df.iloc[:, 1:].sort_index(axis=1))

Or use list for columns names:

cols = ['Gene1','Gene2','Gene3']

df[pd.Index(cols)   '_med'] = np.where(df[cols].le(df[cols].median()), 'low','high')

df1 = df.iloc[:, :1].join(df.iloc[:, 1:].sort_index(axis=1))

print (df1)

   PATID  Gene1 Gene1_med  Gene2 Gene2_med  Gene3 Gene3_med
0   1001  5.899       low  67.87      high   9.87       low
1   1002  6.899       low  60.87       low  10.87      high
2   1003  7.899      high  63.87       low   8.87       low