How do I subset the columns of a dataframe based on the index of another dataframe?

The row labels in clin.index (81 rows) are a subset of the column names of common_mrna (151 columns). I want to keep only the columns of common_mrna whose names match the index values of the clin dataframe.

My code failed to reduce the number of columns in common_mrna to 81.

import numpy as np
import pandas as pd

common_mrna = common_mrna.set_index("Hugo_Symbol")
mrna_val = {}
for colnames, val in common_mrna.iteritems():
  for i, rows in clin.iterrows():
    if [[common_mrna.columns == i] == "TRUE"]:
      mrna_val = np.append(mrna_val, val)

mrna = np.concatenate(mrna_val, axis=0)

common_mrna

Hugo_Symbol	A	B	C	D
First	1	2	3	4
Second	5	row	6	7

clin

	Another header
A	20
D	30

desired output

Hugo_Symbol	A	D
First	1	4
Second	5	7

CodePudding user response：

Try this using reindex:

common_mrna.reindex(clin.index, axis=1)

Output:

        A  D
First   1  4
Second  5  7

Update, IIUC:

common_mrna.set_index('Hugo_Symbol').reindex(clin.index, axis=1).reset_index()
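
To try the reindex approach on the sample data, the two frames can be rebuilt by hand. This is only a sketch: the values are copied from the question's tables, except the (Second, B) cell, which is unreadable there and is filled with a placeholder 0 as an assumption. It also shows one behavior worth knowing: reindex creates an all-NaN column for any requested label that is not among the columns, whereas a boolean mask built with isin keeps only the labels that actually match.

```python
import pandas as pd

# Sample frames modeled on the question; the 0 in (Second, B) is a placeholder assumption.
common_mrna = pd.DataFrame(
    {
        "Hugo_Symbol": ["First", "Second"],
        "A": [1, 5],
        "B": [2, 0],
        "C": [3, 6],
        "D": [4, 7],
    }
).set_index("Hugo_Symbol")
clin = pd.DataFrame({"Another header": [20, 30]}, index=["A", "D"])

# reindex selects the columns named in clin.index, in the order given by clin.index.
by_reindex = common_mrna.reindex(clin.index, axis=1)

# A boolean mask keeps only matching columns, in the order of common_mrna.columns.
mask = common_mrna.columns.isin(clin.index)
by_mask = common_mrna.loc[:, mask]

# A label that is absent from the columns ("Z", hypothetical) becomes an all-NaN column.
with_missing = common_mrna.reindex(["A", "Z"], axis=1)

print(by_reindex)
print(by_mask)
print(with_missing)
```

If clin.index is guaranteed to be a subset of the columns, as stated in the question, both selections return the same two columns.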

CodePudding user response：

IIUC, you can select the index labels of clin that are found in the columns of common_mrna, then prepend the first column of common_mrna (Hugo_Symbol):

cols = clin.loc[clin.index.isin(common_mrna.columns)].index.tolist()
# or with set
cols = list(sorted(set(clin.index.tolist()) & set(common_mrna.columns), key=common_mrna.columns.tolist().index))

out = common_mrna[['Hugo_Symbol'] + cols]

print(out)

  Hugo_Symbol  A  D
0       First  1  4
1      Second  5  7
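
The same selection can be checked end to end on hand-built data where Hugo_Symbol is still a regular column. This is a sketch under assumptions: the (Second, B) value is a placeholder because it is unreadable in the question, and the extra "Z" row in clin is hypothetical, added only to show that labels missing from common_mrna are dropped instead of raising a KeyError.

```python
import pandas as pd

# Hugo_Symbol stays a regular column here; the 0 in (Second, B) is a placeholder assumption.
common_mrna = pd.DataFrame(
    {
        "Hugo_Symbol": ["First", "Second"],
        "A": [1, 5],
        "B": [2, 0],
        "C": [3, 6],
        "D": [4, 7],
    }
)
# "Z" is a hypothetical label with no matching column in common_mrna.
clin = pd.DataFrame({"Another header": [20, 30, 40]}, index=["A", "D", "Z"])

# Keep the columns of common_mrna that appear in clin.index, preserving column order.
cols = [c for c in common_mrna.columns if c in clin.index]

# Prepend the identifier column to the matched columns.
out = common_mrna[["Hugo_Symbol"] + cols]

print(out)
```

The list comprehension is an alternative to the isin and set-based variants above; it walks common_mrna.columns once, so the original column order is kept without an explicit sort.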