The values in clin.index (81 rows) are a subset of the columns of common_mrna (151 columns). I want to keep only the columns of common_mrna whose names match the index values of the clin dataframe.
My code failed to reduce the number of columns in common_mrna to 81.
import pandas as pd
common_mrna = common_mrna.set_index("Hugo_Symbol")
mrna_val = {}
for colnames, val in common_mrna.iteritems():
    for i, rows in clin.iterrows():
        if [[common_mrna.columns == i] == "TRUE"]:
            mrna_val = np.append(mrna_val, val)
mrna = np.concatenate(mrna_val, axis=0)
common_mrna
Hugo_Symbol | A | B | C | D |
---|---|---|---|---|
First | 1 | 2 | 3 | 4 |
Second | 5 | row | 6 | 7 |
clin
Another header | |
---|---|
A | 20 |
D | 30 |
desired output
Hugo_Symbol | A | D |
---|---|---|
First | 1 | 4 |
Second | 5 | 7 |
CodePudding user response:
Try this using reindex:
common_mrna.reindex(clin.index, axis=1)
Output:
A D
First 1 4
Second 5 7
Update, IIUC:
common_mrna.set_index('Hugo_Symbol').reindex(clin.index, axis=1).reset_index()
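For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the two sample tables from the question and applies the same reindex call. The column name "value" in clin and the 0 used for the non-numeric B cell are assumptions made only for illustration; they are not in the original data.

```python
import pandas as pd

# Sample data rebuilt from the question's tables.
# Assumption: the B value in the second row is replaced by a placeholder 0.
common_mrna = pd.DataFrame(
    {
        "Hugo_Symbol": ["First", "Second"],
        "A": [1, 5],
        "B": [2, 0],
        "C": [3, 6],
        "D": [4, 7],
    }
)

# Assumption: the unnamed clin column is called "value" here.
clin = pd.DataFrame(
    {"value": [20, 30]},
    index=pd.Index(["A", "D"], name="Another header"),
)

# Move Hugo_Symbol into the index, keep only the columns listed in
# clin.index, then turn Hugo_Symbol back into a regular column.
result = (
    common_mrna.set_index("Hugo_Symbol")
    .reindex(clin.index, axis=1)
    .reset_index()
)

print(result)
```

Note that reindex returns a NaN-filled column for any label in clin.index that is missing from common_mrna, so this works cleanly only because clin.index is a subset of the columns, as stated in the question.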
CodePudding user response:
IIUC, you can select the index values of clin that are found in the columns of common_mrna, then add the first column of common_mrna:
cols = clin.loc[clin.index.isin(common_mrna.columns)].index.tolist()
# or with set
cols = list(sorted(set(clin.index.tolist()) & set(common_mrna.columns), key=common_mrna.columns.tolist().index))
out = common_mrna[['Hugo_Symbol'] + cols]
print(out)
Hugo_Symbol A D
0 First 1 4
1 Second 5 7
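A possible variant of the same idea, sketched here with a boolean column mask instead of building a list of names. The sample frames are rebuilt from the question; the clin column name "value" and the placeholder 0 in column B are assumptions for illustration only.

```python
import pandas as pd

# Sample data rebuilt from the question's tables.
# Assumption: the B value in the second row is replaced by a placeholder 0.
common_mrna = pd.DataFrame(
    {
        "Hugo_Symbol": ["First", "Second"],
        "A": [1, 5],
        "B": [2, 0],
        "C": [3, 6],
        "D": [4, 7],
    }
)

# Assumption: the unnamed clin column is called "value" here.
clin = pd.DataFrame({"value": [20, 30]}, index=["A", "D"])

# True for every column whose name appears in clin.index,
# plus the Hugo_Symbol identifier column.
keep = common_mrna.columns.isin(clin.index) | (common_mrna.columns == "Hugo_Symbol")

# Select the matching columns; the original column order is preserved.
out = common_mrna.loc[:, keep]

print(out)
```

Because the mask is computed from the existing columns, labels in clin.index that do not occur in common_mrna are simply ignored rather than added as empty columns.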