Home > Back-end >  Create 2534 new columns in a dataframe based on the median values in the existing 2534 columns
Create 2534 new columns in a dataframe based on the median values in the existing 2534 columns

Time:10-13

I have a dataframe (df) similar to the one mentioned below:

PATID Gene1 Gene2 Gene3
1001 5.899 67.87 9.87
1002 6.899 60.87 10.87
1003 7.899 63.87 8.87

I need a dataframe as mentioned below by looping through the code

Gene1_med = df['Gene1'].map(lambda exp: 'low' if exp <= df['Gene1'].median() else 'high')

PATID Gene1 Gene1_med Gene2 Gene2_med Gene3 Gene3_med
1001 5.899 low 67.87 high 9.87 high
1002 6.899 low 60.87 low 10.87 high
1003 7.899 high 63.87 low 8.87 low

I am new to python and any help is highly appreciated. Thanks in advance!

CodePudding user response:

Use numpy.where with selecting columns of Genes, compare by DataFrame.le and assign changed columns names, last sorting columns:

cols = ['Gene1','Gene2','Gene3']

#columns PATID to index
df = df.set_index('PATID')

df[pd.Index(cols)   '_med'] = np.where(df[cols].le(df[cols].median()), 'low','high')

df1 = df.sort_index(axis=1).reset_index()

Or select all columns with substring Gene by DataFrame.filter, last sorting all columns without first and append to first column:

df1 = df.filter(like='Gene')

df[df1.columns   '_med'] = np.where(df1.le(df1.median()), 'low','high')

df1 = df.iloc[:, :1].join(df.iloc[:, 1:].sort_index(axis=1))

Or use list for columns names:

cols = ['Gene1','Gene2','Gene3']

df[pd.Index(cols)   '_med'] = np.where(df[cols].le(df[cols].median()), 'low','high')

df1 = df.iloc[:, :1].join(df.iloc[:, 1:].sort_index(axis=1))

print (df1)

   PATID  Gene1 Gene1_med  Gene2 Gene2_med  Gene3 Gene3_med
0   1001  5.899       low  67.87      high   9.87       low
1   1002  6.899       low  60.87       low  10.87      high
2   1003  7.899      high  63.87       low   8.87       low
  • Related