Home > Mobile >  Keeping columns in the original order when using UseCols in Pandas Read_excel
Keeping columns in the original order when using UseCols in Pandas Read_excel

Time:12-10

USECOLS in pandas.read_excel is sorting my list in ascending order of column value.

I have a list. This list is not constant and will vary across the code. indexList = [3,6,5,18,19]

Once I pass this list to pandas.read_excel, USECOLS is sorting my output dataframe in ascending order : [3,5,6,18,19]

I need this in same order as I passed it, ie, [3,6,5,18,19]

Indexlist = [3,6,5,18,19]

#inside a function to which Indexlist  is passed as a parameter

    excel_data_df = pandas.read_excel('MasterFiltered.xlsx',   
                                          sheet_name='Sheet1',                                       
                                          usecols=indexList, #indexList is a variable List                
                                          dtype={'school: str,
                                          'class': str,
                                          'name': str,
                                          'subject': str,
                                          'school': str}) # datatype -- line X
    df_filtered = excel_data_df.loc[excel_data_df["cityname"]==i_city_name]# Row Filtering
    print(df_filtered) # Output is sorted excel ascending order of columns - [3,5,6,18,19] 

Note: On looking up in Stack the answers mentions to use column names along with column index but that is not possible in my case, as columns will keep varying.

A very similar question/answer I found is : Keeping columns in the specified order when using UseCols in Pandas Read_CSV

But when I give [indexList] at end of the line X similar to as mentioned in above answer I get errors : raise KeyError(f"None of [{key}] are in the [{axis_name}]")

Kindly help

CodePudding user response:

Once you have read the file, use np.argsort to reorder your columns:

import pandas as pd
import numpy as np

Indexlist = [3, 6, 5, 18, 19]
df = pd.read_excel('data.xlsx', usecols=Indexlist)
print(df)

out = df.iloc[:, np.argsort(Indexlist)]
print(out)


# Output of df:
   D  F  G   S   T
0  3  5  6  18  19


# Output of out:
   D  G  F   S   T
0  3  6  5  18  19

CodePudding user response:

Try this:

Indexlist = [3,2]
col_names=pd.read_excel('data.xlsx', nrows=1).columns
df = pd.read_excel('data.xlsx', usecols=Indexlist)
df = df[col_names[Indexlist]]
  • Related