Pandas: Create dataframe based on specific columns from another dataframe

Time:04-06

I want to create the clinical dataframe with a sex column based on the Sex column in the raw_clinical_patient dataframe.

import pandas as pd

raw_clinical_patient = pd.read_csv("./gbm_tcga/data_clinical_patient.txt", sep="\t", header=4) # Use row 4 (0-based) as the header, skipping the first 4 rows

clinical = pd.DataFrame()
clinical["sex"] = raw_clinical_patient.loc[:,"Sex"]
clinical["last_fu"] = raw_clinical_patient.loc[:,"Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value"]

Traceback:

KeyError: 'Sex'

CodePudding user response:

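Column labels are case sensitive, so I think your raw_clinical_patient data frame probably has a sex column rather than a Sex column. Printing raw_clinical_patient.columns will show the exact labels that were read from the file.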
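To check which labels actually exist, something like the following may help. It is only a sketch: the small data frame stands in for raw_clinical_patient, the "SEX" and "AGE" labels are invented for illustration, and find_column is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Hypothetical stand-in for raw_clinical_patient; the real labels may differ.
raw_clinical_patient = pd.DataFrame({"SEX": ["Male", "Female"], "AGE": [60, 55]})

# Show the exact labels pandas parsed (watch for case and stray whitespace).
print(raw_clinical_patient.columns.tolist())


def find_column(df, name):
    """Return the one label that equals `name`, ignoring case and surrounding spaces."""
    wanted = name.strip().lower()
    matches = [c for c in df.columns if str(c).strip().lower() == wanted]
    if len(matches) != 1:
        raise KeyError(f"expected exactly one column matching {name!r}, found {matches}")
    return matches[0]


sex_label = find_column(raw_clinical_patient, "Sex")

# Build the new frame from the label that really exists.
clinical = pd.DataFrame({"sex": raw_clinical_patient[sex_label]})
```

If the printed list contains nothing resembling Sex at all, the header row passed to read_csv is probably not the one you expect.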

CodePudding user response:

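You can select both columns in one step and then rename them. Note that the labels must match the ones in raw_clinical_patient exactly, otherwise you get the same KeyError.

clinical = raw_clinical_patient[["Sex", "Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value"]].copy()  # copy so later edits don't touch the original
clinical.columns = ["sex", "last_fu"]  # new names, in the same order as the selection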
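A variation is to keep the old and new names together in a dictionary and use rename, so the pairing does not depend on column order. This is a minimal sketch under assumptions: the raw frame and its "Other" column are made-up sample data, and column_map is just an illustrative name.

```python
import pandas as pd

# Made-up sample data standing in for raw_clinical_patient.
raw = pd.DataFrame({
    "Sex": ["Male", "Female"],
    "Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value": [120, 340],
    "Other": [1, 2],
})

# Old label -> new label; only these columns are kept.
column_map = {
    "Sex": "sex",
    "Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value": "last_fu",
}

# Fail early with a readable message if a label is missing.
missing = [c for c in column_map if c not in raw.columns]
if missing:
    raise KeyError(f"columns not found: {missing}")

# Select the wanted columns, then rename them; rename returns a new frame.
clinical = raw[list(column_map)].rename(columns=column_map)
```

The explicit check for missing labels gives a clearer error than the bare KeyError: 'Sex' in the question.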