The dataframe index column not getting dropped

Time:11-14

I am reading a CSV into a dataframe and updating column values in it. The problem is that I cannot get rid of the index column: when I print the dataframe, the console shows an extra, unnamed index column, as follows.

    fsym_id factset_entity_id  ... is_substituted is_current
0  VVG1JM-S          ABCXYZ-Z  ...           True      False

As the output above shows, there is no column name for the 0. If I try to drop the first column of the dataframe with df.drop(columns = df.columns[0], axis = 1, inplace= True), it drops the fsym_id column, which I need. Below is the code.

def update_run_id_in_csv(rds_db_conn,test_case_name,file_name):
    df = pd.read_csv("{}/output/Float_Ingestion_Expected_Output_files/{}/{}.csv".format(str(parentDir), test_case_name, file_name))
    
    df['run_id'] = '2323323232999'
        #get_run_id(rds_db_conn,max_13f_query,query)
    
    df.drop(columns = df.columns[0], axis = 1, inplace= True)
    
    print(df)

There is no index column in the CSV. I don't understand how it gets added when I update the run_id column in the dataframe. How do I get rid of the index column?

CodePudding user response:

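As far as I know, you cannot get rid of the index in a pandas dataframe. Every dataframe has one, and it is not considered a column, which is why dropping df.columns[0] removes fsym_id instead. The 0 you see when printing is just the row label.

However, when you write the dataframe back to CSV, you can skip the index like this: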

df.to_csv(path, index = False)
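
To illustrate the point above, here is a minimal sketch using made-up sample data that mirrors some of the column names in the question (the CSV text and variable names are assumptions, not the asker's actual file). It shows that the row labels are not part of df.columns, and that both the printed output and the written CSV can omit them.

```python
import io

import pandas as pd

# Hypothetical sample data; the real file in the question has more columns.
csv_text = "fsym_id,factset_entity_id,is_current\nVVG1JM-S,ABCXYZ-Z,False\n"
df = pd.read_csv(io.StringIO(csv_text))
df["run_id"] = "2323323232999"

# The leading 0 shown by print(df) is a row label, so it is not listed here.
columns = list(df.columns)

# Render the dataframe as text without the row labels.
printed = df.to_string(index=False)
print(printed)

# Write the dataframe as CSV without the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
written = buffer.getvalue()
```

With index=False in both calls, fsym_id stays the first column and no unnamed column appears in the output.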

CodePudding user response:

If the CSV was previously saved together with its index, that index is read back as a column named 'Unnamed: 0'. In that case you can drop it:

df = pd.read_csv('file_name.csv').drop(['Unnamed: 0'], axis=1)
or

df.drop(['Unnamed: 0'], axis=1, inplace=True)

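A better approach is to pass index_col=[0] to pd.read_csv(...), which reads that saved column back in as the index and avoids the extra "drop" call. Only do this if the first column of the CSV really is a saved index; otherwise a data column such as fsym_id would become the index.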
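
The round trip described in this answer can be sketched as follows. The dataframe contents are made up for illustration, and an in-memory buffer stands in for the file on disk.

```python
import io

import pandas as pd

# Hypothetical data standing in for the real CSV contents.
df = pd.DataFrame({"fsym_id": ["VVG1JM-S"], "is_current": [False]})

# Saving with the default index=True writes the row labels
# as a first column with an empty header.
buffer = io.StringIO()
df.to_csv(buffer)
saved = buffer.getvalue()

# A plain read turns that headerless column into "Unnamed: 0".
reread = pd.read_csv(io.StringIO(saved))

# Option 1: drop the leftover column after reading.
dropped = reread.drop(columns=["Unnamed: 0"])

# Option 2: tell read_csv that the first column is the index.
indexed = pd.read_csv(io.StringIO(saved), index_col=0)

print(list(reread.columns))
print(list(indexed.columns))
```

Both options leave only the data columns in df.columns. The second avoids creating the extra column in the first place.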
