How do I rename pandas dataframe column?

Time:05-05

I want to merge the raw_clinical_patient and raw_clinical_sample dataframes.

However, the SAMPLE_ID column in raw_clinical_sample needs to be relabeled as PATIENT_ID before the merge, because it was mislabeled. I used pandas' rename function, but it did not change SAMPLE_ID to PATIENT_ID.

I want to merge the two dataframes on the new PATIENT_ID column.

    import pandas as pd

    # Clinical patient info
    raw_clinical_patient = pd.read_csv("./gbm_tcga/data_clinical_patient.txt", sep="\t", header=4).drop(labels="OTHER_PATIENT_ID", axis=1).set_index("PATIENT_ID")
    raw_clinical_patient = raw_clinical_patient.sort_index()
    
    # Clinical sample info
    raw_clinical_sample = pd.read_csv("./gbm_tcga/data_clinical_sample.txt", sep="\t", header=4).set_index("SAMPLE_ID").drop(labels=["PATIENT_ID", "OTHER_SAMPLE_ID"], axis=1)
    raw_clinical_sample = raw_clinical_sample.sort_index()
    raw_clinical_sample.rename(columns={'SAMPLE_ID':'PATIENT_ID'}, inplace=True)
    
    # Merge both dataframes
    raw_clin = raw_clinical_patient.join(raw_clinical_sample, on="PATIENT_ID", lsuffix="_left")
    raw_clin 

Answer:

You set SAMPLE_ID as the index, so there is no column with that name for rename(columns=...) to change. If you want to rename the index itself, you can use rename_axis:

    raw_clinical_sample.rename_axis(index='PATIENT_ID', inplace=True)
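A minimal sketch of the point above, using a small made-up table in place of raw_clinical_sample (the SAMPLE_TYPE column and the ID values are hypothetical). It shows that after set_index the name SAMPLE_ID lives on the index rather than in the columns, that rename(columns=...) skips labels it cannot find by default, and that passing errors="raise" surfaces the problem as a KeyError:

```python
import pandas as pd

# Hypothetical stand-in for raw_clinical_sample after set_index("SAMPLE_ID")
sample = pd.DataFrame(
    {"SAMPLE_ID": ["P1", "P2"], "SAMPLE_TYPE": ["Primary", "Recurrence"]}
).set_index("SAMPLE_ID")

# SAMPLE_ID is now the index name, not a column label
print(list(sample.columns))  # ['SAMPLE_TYPE']
print(sample.index.name)     # SAMPLE_ID

# rename(columns=...) ignores labels it cannot find (errors="ignore" is the
# default), so this returns an unchanged copy
unchanged = sample.rename(columns={"SAMPLE_ID": "PATIENT_ID"})
print(unchanged.index.name)  # SAMPLE_ID

# errors="raise" turns the silent no-op into a KeyError
raised = False
try:
    sample.rename(columns={"SAMPLE_ID": "PATIENT_ID"}, errors="raise")
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# rename_axis(index=...) with a scalar sets the index name
renamed = sample.rename_axis(index="PATIENT_ID")
print(renamed.index.name)    # PATIENT_ID
```

Assigning sample.index.name = "PATIENT_ID" directly has the same effect as the rename_axis call.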

By the way, you don't need to rename it at all, because you are joining on the index. By default, join matches index-on-index, so just leave out the on argument.
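To illustrate the default index-on-index behavior, here is a sketch with two tiny made-up tables (the AGE and SAMPLE_TYPE columns and the P1/P2/P3 IDs are hypothetical). It assumes, as the question states, that the sample table's SAMPLE_ID values are really patient IDs. The two index names differ, yet join still lines the rows up by index value:

```python
import pandas as pd

# Hypothetical patient table, indexed by PATIENT_ID
patients = pd.DataFrame(
    {"PATIENT_ID": ["P1", "P2", "P3"], "AGE": [61, 45, 70]}
).set_index("PATIENT_ID")

# Hypothetical sample table, still indexed by SAMPLE_ID (index never renamed)
samples = pd.DataFrame(
    {"SAMPLE_ID": ["P2", "P1"], "SAMPLE_TYPE": ["Primary", "Recurrence"]}
).set_index("SAMPLE_ID")

# No on= argument: rows are matched by index value, not by index name
joined = patients.join(samples)
print(joined)
```

Because join defaults to how="left", P3, which has no matching sample row, is kept with NaN in SAMPLE_TYPE.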

Change

    raw_clin = raw_clinical_patient.join(raw_clinical_sample, on="PATIENT_ID", lsuffix="_left")

to

    raw_clin = raw_clinical_patient.join(raw_clinical_sample, lsuffix="_left")
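Putting the answer together, here is a sketch of the question's script with the rename line removed and on= dropped. Since the real gbm_tcga files aren't available here, io.StringIO stands in for them with made-up contents: four placeholder metadata lines so header=4 still lands on the column-name row, while the AGE and SAMPLE_TYPE columns and all ID values are hypothetical:

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two tab-separated files: four metadata
# lines, then the header row that header=4 picks up
patient_txt = (
    "#\n#\n#\n#\n"
    "PATIENT_ID\tOTHER_PATIENT_ID\tAGE\n"
    "P2\tq2\t45\n"
    "P1\tq1\t61\n"
)
sample_txt = (
    "#\n#\n#\n#\n"
    "PATIENT_ID\tSAMPLE_ID\tOTHER_SAMPLE_ID\tSAMPLE_TYPE\n"
    "x1\tP1\ts1\tPrimary\n"
    "x2\tP2\ts2\tRecurrence\n"
)

# Patient table, indexed by PATIENT_ID (same steps as the question)
raw_clinical_patient = (
    pd.read_csv(io.StringIO(patient_txt), sep="\t", header=4)
    .drop(labels="OTHER_PATIENT_ID", axis=1)
    .set_index("PATIENT_ID")
    .sort_index()
)

# Sample table, indexed by SAMPLE_ID; no rename step is needed
raw_clinical_sample = (
    pd.read_csv(io.StringIO(sample_txt), sep="\t", header=4)
    .set_index("SAMPLE_ID")
    .drop(labels=["PATIENT_ID", "OTHER_SAMPLE_ID"], axis=1)
    .sort_index()
)

# Index-on-index join: no on= argument
raw_clin = raw_clinical_patient.join(raw_clinical_sample, lsuffix="_left")
print(raw_clin)
```

lsuffix only takes effect when both tables share a column name; with the toy columns above there is no overlap, so no suffix is applied.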