How do I subset the columns of a dataframe based on the index of another dataframe?

Time:05-13

The index of clin (81 rows) is a subset of the columns of common_mrna (151 columns). I want to keep only those columns of common_mrna whose names match the index values of the clin dataframe.

My code failed to reduce the number of columns in common_mrna to 81.

import pandas as pd

common_mrna = common_mrna.set_index("Hugo_Symbol")
mrna_val = {}
for colnames, val in common_mrna.iteritems():
  for i, rows in clin.iterrows():
    if [[common_mrna.columns == i] == "TRUE"]:
      mrna_val = np.append(mrna_val, val)

mrna = np.concatenate(mrna_val, axis=0)

common_mrna

Hugo_Symbol A B C D
First 1 2 3 4
Second 5 row 6 7

clin

Another header
A 20
D 30

desired output

Hugo_Symbol A D
First 1 4
Second 5 7

CodePudding user response:

Try this using reindex:

common_mrna.reindex(clin.index, axis=1)

Output:

        A  D
First   1  4
Second  5  7

Update, IIUC:

common_mrna.set_index('Hugo_Symbol').reindex(clin.index, axis=1).reset_index()
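For reference, here is a minimal self-contained sketch of the reindex approach, using small frames modeled on the example in the question. The sample values are assumptions (in particular, the B value for the second row is a placeholder, since it is unclear in the question), and the names indexed and result are only illustrative.

```python
import pandas as pd

# Sample data modeled on the question; values are illustrative.
common_mrna = pd.DataFrame({
    "Hugo_Symbol": ["First", "Second"],
    "A": [1, 5],
    "B": [2, 9],  # placeholder value, not taken from the question
    "C": [3, 6],
    "D": [4, 7],
})
clin = pd.DataFrame({"Another header": [20, 30]}, index=["A", "D"])

# Move the gene names into the index so that only sample columns remain.
indexed = common_mrna.set_index("Hugo_Symbol")

# Keep the columns whose labels appear in clin.index, in clin's order.
result = indexed.reindex(clin.index, axis=1).reset_index()
print(result)
```

Note that reindex creates an all-NaN column for any label in clin.index that is missing from common_mrna. That should not happen here, because the question states that clin.index is a subset of the columns, but it is worth checking on real data.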

CodePudding user response:

IIUC, you can select the index labels of clin that are found among the common_mrna columns, then prepend the first column of common_mrna (Hugo_Symbol):

cols = clin.loc[clin.index.isin(common_mrna.columns)].index.tolist()
# or with set
cols = list(sorted(set(clin.index.tolist()) & set(common_mrna.columns), key=common_mrna.columns.tolist().index))

out = common_mrna[['Hugo_Symbol'] + cols]
print(out)

  Hugo_Symbol  A  D
0       First  1  4
1      Second  5  7
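The same selection can be tried end to end on small frames modeled on the question. This is only a sketch: the sample values are assumptions (the B value for the second row is a placeholder), and it assumes Hugo_Symbol is still a regular column rather than the index.

```python
import pandas as pd

# Sample data modeled on the question; values are illustrative.
common_mrna = pd.DataFrame({
    "Hugo_Symbol": ["First", "Second"],
    "A": [1, 5],
    "B": [2, 9],  # placeholder value, not taken from the question
    "C": [3, 6],
    "D": [4, 7],
})
clin = pd.DataFrame({"Another header": [20, 30]}, index=["A", "D"])

# Labels of clin.index that also exist as columns of common_mrna.
cols = clin.index[clin.index.isin(common_mrna.columns)].tolist()

# Prepend the identifier column, then select by label.
out = common_mrna[["Hugo_Symbol"] + cols]
print(out)
```

Unlike reindex, this keeps only labels present in both frames, so a label missing from common_mrna is silently dropped instead of producing a NaN column.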