I want to merge the raw_clinical_patient and raw_clinical_sample dataframes. However, the SAMPLE_ID column in raw_clinical_sample should be relabeled as PATIENT_ID before the merge (because it was wrongly labelled). I used pandas' rename function, but it did not change SAMPLE_ID to PATIENT_ID. I want to merge the two dataframes on the new PATIENT_ID column.
import pandas as pd
# Clinical patient info
raw_clinical_patient = pd.read_csv("./gbm_tcga/data_clinical_patient.txt", sep="\t", header=4).drop(labels="OTHER_PATIENT_ID", axis=1).set_index("PATIENT_ID")
raw_clinical_patient = raw_clinical_patient.sort_index()
# Clinical sample info
raw_clinical_sample = pd.read_csv("./gbm_tcga/data_clinical_sample.txt", sep="\t", header=4).set_index("SAMPLE_ID").drop(labels=["PATIENT_ID", "OTHER_SAMPLE_ID"], axis=1)
raw_clinical_sample = raw_clinical_sample.sort_index()
raw_clinical_sample.rename(columns={'SAMPLE_ID':'PATIENT_ID'}, inplace=True)
# Merge both dataframes
raw_clin = raw_clinical_patient.join(raw_clinical_sample, on="PATIENT_ID", lsuffix="_left")
raw_clin
CodePudding user response:
You set SAMPLE_ID as the index, so there is no column with that name to rename. If you want to change the index name, you can use raw_clinical_sample.rename_axis(index='PATIENT_ID', inplace=True)
By the way, you don't need to rename it at all, because you are joining on the index. By default, join joins index-on-index, so just drop the on argument.
Change
raw_clin = raw_clinical_patient.join(raw_clinical_sample, on="PATIENT_ID", lsuffix="_left")
to
raw_clin = raw_clinical_patient.join(raw_clinical_sample, lsuffix="_left")
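To see the difference in isolation, here is a minimal sketch on made-up data. The frames patients and samples, their values, and the AGE and SAMPLE_TYPE columns are hypothetical stand-ins for the two clinical files, and the sketch assumes the sample index values really are patient IDs, as the question states.

```python
import pandas as pd

# Hypothetical toy data standing in for the two clinical files
patients = pd.DataFrame(
    {"PATIENT_ID": ["P1", "P2", "P3"], "AGE": [61, 47, 55]}
).set_index("PATIENT_ID")

samples = pd.DataFrame(
    {"SAMPLE_ID": ["P1", "P2", "P3"],
     "SAMPLE_TYPE": ["Primary", "Primary", "Recurrent"]}
).set_index("SAMPLE_ID")

# rename(columns=...) only looks at column labels, so the index name is untouched
unchanged = samples.rename(columns={"SAMPLE_ID": "PATIENT_ID"})
print(unchanged.index.name)  # SAMPLE_ID

# rename_axis changes the name of the index itself
renamed = samples.rename_axis(index="PATIENT_ID")
print(renamed.index.name)  # PATIENT_ID

# join aligns on the index by default, so no `on` argument is needed
merged = patients.join(renamed)
print(merged)
```

Because the join aligns on index values rather than index names, patients.join(samples) would match the same rows even without the rename_axis step; renaming only makes the labels consistent.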