Pandas create a DataFrame with columns in the correct order

Time:10-31

I'm trying to create a DataFrame whose columns are in a specific order by passing the column names to the constructor:

df = pd.DataFrame(columns={
    'seg1_count', 'seg1_mean', 'seg1_std', 'seg1_min', 'seg1_25%', 'seg1_50%',
    'seg1_75%', 'seg1_max',
    'seg2_count', 'seg2_mean', 'seg2_std', 'seg2_min', 'seg2_25%', 'seg2_50%',
    'seg2_75%', 'seg2_max',
    'seg3_count', 'seg3_mean', 'seg3_std', 'seg3_min', 'seg3_25%', 'seg3_50%',
    'seg3_75%', 'seg3_max',
    'seg4_count', 'seg4_mean', 'seg4_std', 'seg4_min', 'seg4_25%', 'seg4_50%',
    'seg4_75%', 'seg4_max'
})

But the columns appear out of order when I inspect df.columns:

Index(['seg4_min', 'seg1_max', 'seg3_std', 'seg3_max', 'seg1_std',
       'seg2_count', 'seg1_25%', 'seg3_75%', 'seg2_mean', 'seg2_50%',
       'seg4_count', 'seg3_50%', 'seg1_50%', 'seg2_min', 'seg1_count',
       'seg2_max', 'seg2_75%', 'seg4_25%', 'seg2_25%', 'seg1_min', 'seg4_50%',
       'seg1_mean', 'seg3_count', 'seg4_mean', 'seg4_max', 'seg3_mean',
       'seg3_25%', 'seg3_min', 'seg4_std', 'seg1_75%', 'seg4_75%', 'seg2_std'],
      dtype='object')

What's wrong with my code?

CodePudding user response:

It's because you are passing the column names as a set (curly braces), and sets are unordered. Change it to a list (square brackets) and the order will be preserved:

import pandas as pd

# A list keeps its elements in the order they were written.
df = pd.DataFrame(columns = ['seg1_count', 'seg1_mean', 'seg1_std', 'seg1_min', 'seg1_25%', 'seg1_50%', 'seg1_75%', 'seg1_max',
              'seg2_count', 'seg2_mean', 'seg2_std', 'seg2_min', 'seg2_25%', 'seg2_50%', 'seg2_75%', 'seg2_max',
              'seg3_count', 'seg3_mean', 'seg3_std', 'seg3_min', 'seg3_25%', 'seg3_50%', 'seg3_75%', 'seg3_max',
              'seg4_count', 'seg4_mean', 'seg4_std', 'seg4_min', 'seg4_25%', 'seg4_50%', 'seg4_75%', 'seg4_max'])
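
Since the labels follow a regular pattern, the list could also be built with a comprehension instead of being typed out. This is only a sketch: the names `segments`, `stats`, and `columns` are hypothetical and do not come from the question; only the resulting labels do.

```python
# Hypothetical helper lists; the question only shows the final labels.
segments = ["seg1", "seg2", "seg3", "seg4"]
stats = ["count", "mean", "std", "min", "25%", "50%", "75%", "max"]

# Outer loop over segments, inner loop over stats.
# A list comprehension produces a list, so this order is kept.
columns = [f"{seg}_{stat}" for seg in segments for stat in stats]

print(columns[:3])   # ['seg1_count', 'seg1_mean', 'seg1_std']
print(len(columns))  # 32
```

The resulting `columns` list can then be passed to the constructor in the same way as the hand-written list.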

More specifically, it is not the DataFrame constructor that fails to preserve the order; the order is already lost when the set itself is created:

columns_set = {'seg1_count', 'seg1_mean', 'seg1_std', 'seg1_min', 'seg1_25%', 'seg1_50%', 'seg1_75%', 'seg1_max',
              'seg2_count', 'seg2_mean', 'seg2_std', 'seg2_min', 'seg2_25%', 'seg2_50%', 'seg2_75%', 'seg2_max',
              'seg3_count', 'seg3_mean', 'seg3_std', 'seg3_min', 'seg3_25%', 'seg3_50%', 'seg3_75%', 'seg3_max',
              'seg4_count', 'seg4_mean', 'seg4_std', 'seg4_min', 'seg4_25%', 'seg4_50%', 'seg4_75%', 'seg4_max'}
print(columns_set)

{'seg1_50%', 'seg2_count', 'seg4_25%', 'seg3_count', 'seg4_max', 'seg2_25%', 'seg3_min', 'seg4_count', 'seg2_std', 'seg4_75%', 'seg3_std', 'seg1_mean', 'seg2_50%', 'seg3_25%', 'seg1_75%', 'seg3_mean', 'seg1_max', 'seg3_75%', 'seg2_max', 'seg1_min', 'seg3_max', 'seg4_50%', 'seg2_75%', 'seg2_min', 'seg1_count', 'seg4_mean', 'seg3_50%', 'seg1_std', 'seg4_min', 'seg1_25%', 'seg2_mean', 'seg4_std'}

columns_list = ['seg1_count', 'seg1_mean', 'seg1_std', 'seg1_min', 'seg1_25%', 'seg1_50%', 'seg1_75%', 'seg1_max',
              'seg2_count', 'seg2_mean', 'seg2_std', 'seg2_min', 'seg2_25%', 'seg2_50%', 'seg2_75%', 'seg2_max',
              'seg3_count', 'seg3_mean', 'seg3_std', 'seg3_min', 'seg3_25%', 'seg3_50%', 'seg3_75%', 'seg3_max',
              'seg4_count', 'seg4_mean', 'seg4_std', 'seg4_min', 'seg4_25%', 'seg4_50%', 'seg4_75%', 'seg4_max']
print(columns_list)

['seg1_count', 'seg1_mean', 'seg1_std', 'seg1_min', 'seg1_25%', 'seg1_50%', 'seg1_75%', 'seg1_max', 'seg2_count', 'seg2_mean', 'seg2_std', 'seg2_min', 'seg2_25%', 'seg2_50%', 'seg2_75%', 'seg2_max', 'seg3_count', 'seg3_mean', 'seg3_std', 'seg3_min', 'seg3_25%', 'seg3_50%', 'seg3_75%', 'seg3_max', 'seg4_count', 'seg4_mean', 'seg4_std', 'seg4_min', 'seg4_25%', 'seg4_50%', 'seg4_75%', 'seg4_max']
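
If the set was chosen to get rid of duplicate names, one possible alternative is `dict.fromkeys`, which drops duplicates while keeping first-seen order (dicts preserve insertion order in Python 3.7+). A minimal sketch, assuming a hypothetical input list `raw` that contains repeats:

```python
# Hypothetical input with repeated labels, for illustration only.
raw = ["seg1_count", "seg1_mean", "seg1_count", "seg1_std", "seg1_mean"]

# dict keys are unique and keep insertion order, so the first
# occurrence of each label decides its position.
unique_in_order = list(dict.fromkeys(raw))

print(unique_in_order)  # ['seg1_count', 'seg1_mean', 'seg1_std']
```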