pandas - Sorting Columns in a custom order-CodePudding

I have a Pandas Dataframe that get generated every day and the list of columns present in the Dataframe can vary when it gets generated each time.

I am trying to see if I can sort the sequence in which the columns are stored as the final output of the Dataframe in a specific format. If new columns are present they are placed towards the end.

Given below is how I am trying to build this final output

expected_columns = ['cust_id','cost_id','sale_id','prod_id']

Sample Dataframe columns:

['customer_name','cust_id','sale_id','sale_time']

I would like the above Dataframe structured as below:

['cust_id','sale_id','customer_name','sale_time']

Basically the columns in expected_columns takes the first priority and then place the other columns in the Dataframe as successive column of the new Dataframe.

CodePudding user response：

First idea is use list comprehension and join lists by :

expected_columns = ['cust_id','cost_id','sale_id','prod_id']

df = pd.DataFrame(columns=['customer_name','cust_id','sale_id','sale_time'])

expected_columns = ['cust_id','cost_id','sale_id','prod_id']
new1 = [c for c in df.columns if c in expected_columns]
new2 = [c for c in df.columns if c not in expected_columns]

new = new1   new2
print (new)
['cust_id','sale_id','customer_name','sale_time']

Or using Index.intersection with Index.difference:

expected_columns = ['cust_id','cost_id','sale_id','prod_id']

new = (df.columns.intersection(expected_columns, sort=False).tolist()   
       df.columns.difference(expected_columns, sort=False).tolist())

If also ordering in ouput by expected_columns is important use:

new = (pd.Index(expected_columns).intersection(df.columns, sort=False).tolist()  
       df.columns.difference(expected_columns, sort=False).tolist())

Difference is changed sample data:

expected_columns = ['sale_id','cost_id','cust_id','prod_id']


df = pd.DataFrame(columns=['customer_name','cust_id','sale_id','sale_time'])


new = (pd.Index(expected_columns).intersection(df.columns, sort=False).tolist()  
       df.columns.difference(expected_columns, sort=False).tolist())
print (new)
['sale_id', 'cust_id', 'customer_name', 'sale_time']


new = (df.columns.intersection(expected_columns, sort=False).tolist()   
       df.columns.difference(expected_columns, sort=False).tolist())
print (new)
['cust_id', 'sale_id', 'customer_name', 'sale_time']

Last for change order of columns use:

df = df[new]