I have a huge dataset with more than 900 columns. Part of the columns are languages of the world. I want to sort this subset of columns from my dataset in alphabetical order. My solution was to slice the dataset, assign this subset to another variable and sort the columns. Now, I am trying to replace the unsorted subset by the sorted subset, but the code below did not do the job.
lang_columns1=survey.iloc[:,19:104] #assigning subset of columns to a variable
lang_columns_sorted1 =lang_columns1.reindex(sorted(lang_columns1.columns), axis=1)
#sorting columns in alphabetical order by name
survey.iloc[:,19:104] = lang_columns_sorted1 #replacing the unsorted columns by the sorted columns
Any suggestion on how to solve this problem?
CodePudding user response:
Use DataFrame.reindex
by joined all columns:
s = survey.columns.to_series()
survey.columns = [s, s.groupby(s).cumcount()]
cols = survey.columns[:19].tolist()
sorted(survey.columns[19:104], key=lambda x: (x[0], x[1]))
survey.columns[104:].tolist()
survey = survey.reindex(cols, axis=1).droplevel(1, axis=1)