How to iterate through column names and do the same procedure on each column in Pandas

I currently have two large dataframes, which I will condense for the purpose of this question. Dataframe 1 has a list of probesets and transcripts. I must match its transcripts to the corresponding transcripts in dataframe 2 and get the data for each subject, as you can see below:

       Probeset       Transcript
0    1554784_at  ENST00000547702
1           NaN  ENST00000547849
2     212983_at  ENST00000311189
3           NaN  ENST00000397596
4  1566643_a_at  ENST00000587894

and then the following dataframe where I need to match the transcripts:

     transcript_id  phchp230v2  phchp273v3  phchp367v3  phchp201v2
0  ENST00000547702    0.000000    0.000000    0.000000    0.000000
1  ENST00000547849    0.000000    0.000000    0.000000    0.000000
2  ENST00000311189    0.336418    0.044721    0.155847    1.676620
3  ENST00000397596    0.027106    0.016806    0.014509    0.022015
4  ENST00000587894    0.048200    0.089618    0.046528    0.000000

What I need to do is match the transcripts in dataframe 1 with the transcripts in dataframe 2 and pull the values for each transcript under the subject columns listed at the top of dataframe 2. However, both dataframes hold a lot of data, so I have to look up each transcript and its corresponding values; the rows are not necessarily in the same order as in this condensed example. The expected output is as follows:

       Probeset       Transcript  phchp230v2  phchp273v3  phchp367v3  phchp201v2
0    1554784_at  ENST00000547702    0.000000    0.000000    0.000000    0.000000
1           NaN  ENST00000547849    0.000000    0.000000    0.000000    0.000000
2     212983_at  ENST00000311189    0.336418    0.044721    0.155847    1.676620
3           NaN  ENST00000397596    0.027106    0.016806    0.014509    0.022015
4  1566643_a_at  ENST00000587894    0.048200    0.089618    0.046528    0.000000

I'm not sure how to find the transcripts and then place the matching data under the correct subject headers. Thank you all in advance!

CodePudding user response:

You can merge them with .merge(), as follows:

(Assuming the first/second dataframes are called df1/df2 respectively)

df_out = df1.merge(df2.rename({'transcript_id': 'Transcript'}, axis=1), on='Transcript', how='left')

Result:

print(df_out)


       Probeset       Transcript  phchp230v2  phchp273v3  phchp367v3  phchp201v2
0    1554784_at  ENST00000547702    0.000000    0.000000    0.000000    0.000000
1           NaN  ENST00000547849    0.000000    0.000000    0.000000    0.000000
2     212983_at  ENST00000311189    0.336418    0.044721    0.155847    1.676620
3           NaN  ENST00000397596    0.027106    0.016806    0.014509    0.022015
4  1566643_a_at  ENST00000587894    0.048200    0.089618    0.046528    0.000000
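
For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the two condensed dataframes from the question and runs the same merge. The shuffling step and the name df2_shuffled are assumptions added only to mimic real data where the rows of the two dataframes are not aligned; validate="many_to_one" is an optional check, also an addition, that raises an error if a transcript appears more than once in the second dataframe.

```python
import numpy as np
import pandas as pd

# Condensed dataframe 1 from the question: probesets and transcripts.
df1 = pd.DataFrame({
    "Probeset": ["1554784_at", np.nan, "212983_at", np.nan, "1566643_a_at"],
    "Transcript": [
        "ENST00000547702", "ENST00000547849", "ENST00000311189",
        "ENST00000397596", "ENST00000587894",
    ],
})

# Condensed dataframe 2 from the question: one column per subject.
df2 = pd.DataFrame({
    "transcript_id": [
        "ENST00000547702", "ENST00000547849", "ENST00000311189",
        "ENST00000397596", "ENST00000587894",
    ],
    "phchp230v2": [0.000000, 0.000000, 0.336418, 0.027106, 0.048200],
    "phchp273v3": [0.000000, 0.000000, 0.044721, 0.016806, 0.089618],
    "phchp367v3": [0.000000, 0.000000, 0.155847, 0.014509, 0.046528],
    "phchp201v2": [0.000000, 0.000000, 1.676620, 0.022015, 0.000000],
})

# Assumption for illustration: shuffle df2 so its row order no longer
# matches df1, as may happen in the full dataset.
df2_shuffled = df2.sample(frac=1, random_state=0).reset_index(drop=True)

# Left merge on the transcript key: every row of df1 is kept, in df1's
# order, and all subject columns are attached in one step.
df_out = df1.merge(
    df2_shuffled.rename(columns={"transcript_id": "Transcript"}),
    on="Transcript",
    how="left",
    validate="many_to_one",  # optional: fail if df2 has duplicate transcripts
)

print(df_out)
```

Because the merge matches on the key rather than on row position, no loop over the subject columns is needed. With how='left', a transcript in df1 that has no match in df2 is still kept, with NaN in the subject columns.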