i have one question: I have a big dataframe with over 1000 columns.
For example as following the heards of the columns: 2019 Material Cost, 2019 Labor Cost, 2019 Overhead Cost, 2020 Material Cost, 2020 Labor Cost, 2020 Overhead Cost, ...2035
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2019 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2019 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2020 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2020 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2021 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2021 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})
I want to sort all headers into the following:
2019 Material Cost, 2020 Material cost, 2021 Material Cost,...,2019 Labor Cost, 2020 Labor Cost, 2021 Labor Cost, ... ,2019 Overhead Cost, 2020 Overhead Cost,2021 Overhead Cost
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Material cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Material cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Overhead cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Overhead cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Labor cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Labor cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})
So i want to have one cost category and sort the years of the category ascending in a following order, then the next category.
Any help here? Thanks in advance
CodePudding user response:
Create two lists, one with the costs and one with the years. Using these lists you can create another list containing all column names (in order).
costs = list(df.columns.str[5:].unique())
years = list(range(2019, 2036))
columns = [str(year) ' ' cost for year in years for cost in costs]
df = df.reindex(columns=columns)
For example:
df = pd.DataFrame(np.random.random((10, 10)), columns = ['1 a', '2 a', '3 a', '4 a', '5 a', '1 b', '2 b', '3 b', '4 b', '5 b'])
costs = ['a', 'b']
years = [1, 2, 3, 4, 5]
columns = [str(year) ' ' cost for year in years for cost in costs]
df.reindex(columns=columns).columns
Returns
Index(['1 a', '1 b', '2 a', '2 b', '3 a', '3 b', '4 a', '4 b', '5 a', '5 b'], dtype='object')
CodePudding user response:
@Chris given input:
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2019 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2019 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2020 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2020 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2021 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2021 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})
and i want to have this as output (sorted after category and ascending for the years):
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Material cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Material cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Overhead cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Overhead cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Labor cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Labor cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})