I'm trying to create a DataFrame with its columns in a specific order by passing the column names to the constructor:
df = pd.DataFrame(columns={
'seg1_count', 'seg1_mean', 'seg1_std', 'seg1_min', 'seg1_25%', 'seg1_50%',
'seg1_75%', 'seg1_max',
'seg2_count', 'seg2_mean', 'seg2_std', 'seg2_min', 'seg2_25%', 'seg2_50%',
'seg2_75%', 'seg2_max',
'seg3_count', 'seg3_mean', 'seg3_std', 'seg3_min', 'seg3_25%', 'seg3_50%',
'seg3_75%', 'seg3_max',
'seg4_count', 'seg4_mean', 'seg4_std', 'seg4_min', 'seg4_25%', 'seg4_50%',
'seg4_75%', 'seg4_max'
})
But the columns appear out of order when I check df.columns:
Index(['seg4_min', 'seg1_max', 'seg3_std', 'seg3_max', 'seg1_std',
'seg2_count', 'seg1_25%', 'seg3_75%', 'seg2_mean', 'seg2_50%',
'seg4_count', 'seg3_50%', 'seg1_50%', 'seg2_min', 'seg1_count',
'seg2_max', 'seg2_75%', 'seg4_25%', 'seg2_25%', 'seg1_min', 'seg4_50%',
'seg1_mean', 'seg3_count', 'seg4_mean', 'seg4_max', 'seg3_mean',
'seg3_25%', 'seg3_min', 'seg4_std', 'seg1_75%', 'seg4_75%', 'seg2_std'],
dtype='object')
What's wrong with my code?
CodePudding user response:
It's because you are passing the column names as a set (curly braces), and sets are unordered. Change it to a list (square brackets) and the order will be preserved:
import pandas as pd

# A list keeps the order in which the names are written
df = pd.DataFrame(columns=['seg1_count', 'seg1_mean', 'seg1_std', 'seg1_min', 'seg1_25%', 'seg1_50%', 'seg1_75%', 'seg1_max',
'seg2_count', 'seg2_mean', 'seg2_std', 'seg2_min', 'seg2_25%', 'seg2_50%', 'seg2_75%', 'seg2_max',
'seg3_count', 'seg3_mean', 'seg3_std', 'seg3_min', 'seg3_25%', 'seg3_50%', 'seg3_75%', 'seg3_max',
'seg4_count', 'seg4_mean', 'seg4_std', 'seg4_min', 'seg4_25%', 'seg4_50%', 'seg4_75%', 'seg4_max'])
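Since the names follow a regular pattern, you could also build the list programmatically instead of typing all 32 entries. This is only a sketch: the variable names segments, stats and columns are my own, and it assumes you want exactly four segments with the eight statistics shown in the question.

```python
# Hypothetical helper names (segments, stats, columns), not from the question.
segments = range(1, 5)  # seg1 .. seg4
stats = ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# The outer loop runs over segments, the inner loop over statistics,
# so all seg1_* names come first, then seg2_*, and so on.
columns = [f'seg{i}_{stat}' for i in segments for stat in stats]

print(columns[:3])
print(len(columns))
```

The resulting list can then be passed as pd.DataFrame(columns=columns), and because it is a list the order is kept.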
More specifically, it's not the DataFrame creation that fails to preserve the order; the order is already lost when you create the set:
columns_set = {'seg1_count', 'seg1_mean', 'seg1_std', 'seg1_min', 'seg1_25%', 'seg1_50%', 'seg1_75%', 'seg1_max',
'seg2_count', 'seg2_mean', 'seg2_std', 'seg2_min', 'seg2_25%', 'seg2_50%', 'seg2_75%', 'seg2_max',
'seg3_count', 'seg3_mean', 'seg3_std', 'seg3_min', 'seg3_25%', 'seg3_50%', 'seg3_75%', 'seg3_max',
'seg4_count', 'seg4_mean', 'seg4_std', 'seg4_min', 'seg4_25%', 'seg4_50%', 'seg4_75%', 'seg4_max'}
print(columns_set)
{'seg1_50%', 'seg2_count', 'seg4_25%', 'seg3_count', 'seg4_max', 'seg2_25%', 'seg3_min', 'seg4_count', 'seg2_std', 'seg4_75%', 'seg3_std', 'seg1_mean', 'seg2_50%', 'seg3_25%', 'seg1_75%', 'seg3_mean', 'seg1_max', 'seg3_75%', 'seg2_max', 'seg1_min', 'seg3_max', 'seg4_50%', 'seg2_75%', 'seg2_min', 'seg1_count', 'seg4_mean', 'seg3_50%', 'seg1_std', 'seg4_min', 'seg1_25%', 'seg2_mean', 'seg4_std'}
columns_list = ['seg1_count', 'seg1_mean', 'seg1_std', 'seg1_min', 'seg1_25%', 'seg1_50%', 'seg1_75%', 'seg1_max',
'seg2_count', 'seg2_mean', 'seg2_std', 'seg2_min', 'seg2_25%', 'seg2_50%', 'seg2_75%', 'seg2_max',
'seg3_count', 'seg3_mean', 'seg3_std', 'seg3_min', 'seg3_25%', 'seg3_50%', 'seg3_75%', 'seg3_max',
'seg4_count', 'seg4_mean', 'seg4_std', 'seg4_min', 'seg4_25%', 'seg4_50%', 'seg4_75%', 'seg4_max']
print(columns_list)
['seg1_count', 'seg1_mean', 'seg1_std', 'seg1_min', 'seg1_25%', 'seg1_50%', 'seg1_75%', 'seg1_max', 'seg2_count', 'seg2_mean', 'seg2_std', 'seg2_min', 'seg2_25%', 'seg2_50%', 'seg2_75%', 'seg2_max', 'seg3_count', 'seg3_mean', 'seg3_std', 'seg3_min', 'seg3_25%', 'seg3_50%', 'seg3_75%', 'seg3_max', 'seg4_count', 'seg4_mean', 'seg4_std', 'seg4_min', 'seg4_25%', 'seg4_50%', 'seg4_75%', 'seg4_max']
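If the reason for reaching for a set was to drop duplicate names, one possible way to do that while keeping the original order is dict.fromkeys, since dicts preserve insertion order in Python 3.7+. A minimal sketch, with a made-up short list of names for illustration:

```python
# Illustrative input with one duplicate; these values are an assumption,
# not data from the question.
names = ['seg1_count', 'seg1_mean', 'seg1_count', 'seg1_std']

# dict keys are unique and keep first-seen order, unlike a set
unique_in_order = list(dict.fromkeys(names))

print(unique_in_order)  # ['seg1_count', 'seg1_mean', 'seg1_std']
```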