Home > Mobile >  pandas - Sorting Columns in a custom order
pandas - Sorting Columns in a custom order

Time:10-14

I have a Pandas Dataframe that get generated every day and the list of columns present in the Dataframe can vary when it gets generated each time.

I am trying to see if I can sort the sequence in which the columns are stored as the final output of the Dataframe in a specific format. If new columns are present they are placed towards the end.

Given below is how I am trying to build this final output

expected_columns = ['cust_id','cost_id','sale_id','prod_id']

Sample Dataframe columns:

['customer_name','cust_id','sale_id','sale_time']

I would like the above Dataframe structured as below:

['cust_id','sale_id','customer_name','sale_time']

Basically the columns in expected_columns takes the first priority and then place the other columns in the Dataframe as successive column of the new Dataframe.

CodePudding user response:

First idea is use list comprehension and join lists by :

expected_columns = ['cust_id','cost_id','sale_id','prod_id']

df = pd.DataFrame(columns=['customer_name','cust_id','sale_id','sale_time'])

expected_columns = ['cust_id','cost_id','sale_id','prod_id']
new1 = [c for c in df.columns if c in expected_columns]
new2 = [c for c in df.columns if c not in expected_columns]

new = new1   new2
print (new)
['cust_id','sale_id','customer_name','sale_time']

Or using Index.intersection with Index.difference:

expected_columns = ['cust_id','cost_id','sale_id','prod_id']

new = (df.columns.intersection(expected_columns, sort=False).tolist()   
       df.columns.difference(expected_columns, sort=False).tolist())

If also ordering in ouput by expected_columns is important use:

new = (pd.Index(expected_columns).intersection(df.columns, sort=False).tolist()  
       df.columns.difference(expected_columns, sort=False).tolist())

Difference is changed sample data:

expected_columns = ['sale_id','cost_id','cust_id','prod_id']


df = pd.DataFrame(columns=['customer_name','cust_id','sale_id','sale_time'])


new = (pd.Index(expected_columns).intersection(df.columns, sort=False).tolist()  
       df.columns.difference(expected_columns, sort=False).tolist())
print (new)
['sale_id', 'cust_id', 'customer_name', 'sale_time']


new = (df.columns.intersection(expected_columns, sort=False).tolist()   
       df.columns.difference(expected_columns, sort=False).tolist())
print (new)
['cust_id', 'sale_id', 'customer_name', 'sale_time']

Last for change order of columns use:

df = df[new]
  • Related