I have a df
:
df_test = pd.DataFrame.from_dict({('group', ''): {0: 'A',
1: 'A',
2: 'A',
3: 'A',
4: 'A',
5: 'A',
6: 'A',
7: 'A',
8: 'A',
9: 'B',
10: 'B',
11: 'B',
12: 'B',
13: 'B',
14: 'B',
15: 'B',
16: 'B',
17: 'B',
18: 'all',
19: 'all'},
('category', ''): {0: 'Amazon',
1: 'Apple',
2: 'Facebook',
3: 'Google',
4: 'Netflix',
5: 'Tesla',
6: 'Total',
7: 'Uber',
8: 'total',
9: 'Amazon',
10: 'Apple',
11: 'Facebook',
12: 'Google',
13: 'Netflix',
14: 'Tesla',
15: 'Total',
16: 'Uber',
17: 'total',
18: 'Total',
19: 'total'},
(pd.Timestamp('2020-06-29'), 'last_sales'): {0: 195.0,
1: 61.0,
2: 106.0,
3: 61.0,
4: 37.0,
5: 13.0,
6: 954.0,
7: 4.0,
8: 477.0,
9: 50.0,
10: 50.0,
11: 75.0,
12: 43.0,
13: 17.0,
14: 14.0,
15: 504.0,
16: 3.0,
17: 252.0,
18: 2916.0,
19: 2916.0},
(pd.Timestamp('2020-06-29'), 'sales'): {0: 1268.85,
1: 18274.385000000002,
2: 19722.65,
3: 55547.255,
4: 15323.800000000001,
5: 1688.6749999999997,
6: 227463.23,
7: 1906.0,
8: 113731.615,
9: 3219.6499999999996,
10: 15852.060000000001,
11: 17743.7,
12: 37795.15,
13: 5918.5,
14: 1708.75,
15: 166349.64,
16: 937.01,
17: 83174.82,
18: 787625.7400000001,
19: 787625.7400000001},
(pd.Timestamp('2020-06-29'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2020-07-06'), 'last_sales'): {0: 26.0,
1: 39.0,
2: 79.0,
3: 49.0,
4: 10.0,
5: 10.0,
6: 436.0,
7: 5.0,
8: 218.0,
9: 89.0,
10: 34.0,
11: 133.0,
12: 66.0,
13: 21.0,
14: 20.0,
15: 732.0,
16: 3.0,
17: 366.0,
18: 2336.0,
19: 2336.0},
(pd.Timestamp('2020-07-06'), 'sales'): {0: 3978.15,
1: 12138.96,
2: 19084.175,
3: 40033.46000000001,
4: 4280.15,
5: 1495.1,
6: 165548.29,
7: 1764.15,
8: 82774.145,
9: 8314.92,
10: 12776.649999999996,
11: 28048.075,
12: 55104.21000000002,
13: 6962.844999999999,
14: 3053.2000000000003,
15: 231049.11000000002,
16: 1264.655,
17: 115524.55500000001,
18: 793194.8000000002,
19: 793194.8000000002},
(pd.Timestamp('2020-07-06'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2021-06-28'), 'last_sales'): {0: 96.0,
1: 56.0,
2: 106.0,
3: 44.0,
4: 34.0,
5: 13.0,
6: 716.0,
7: 9.0,
8: 358.0,
9: 101.0,
10: 22.0,
11: 120.0,
12: 40.0,
13: 13.0,
14: 8.0,
15: 610.0,
16: 1.0,
17: 305.0,
18: 2652.0,
19: 2652.0},
(pd.Timestamp('2021-06-28'), 'sales'): {0: 5194.95,
1: 19102.219999999994,
2: 22796.420000000002,
3: 30853.115,
4: 11461.25,
5: 992.6,
6: 188143.41,
7: 3671.15,
8: 94071.705,
9: 6022.299999999998,
10: 7373.6,
11: 33514.0,
12: 35943.45,
13: 4749.000000000001,
14: 902.01,
15: 177707.32,
16: 349.3,
17: 88853.66,
18: 731701.46,
19: 731701.46},
(pd.Timestamp('2021-06-28'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2021-07-07'), 'last_sales'): {0: 45.0,
1: 47.0,
2: 87.0,
3: 45.0,
4: 13.0,
5: 8.0,
6: 494.0,
7: 2.0,
8: 247.0,
9: 81.0,
10: 36.0,
11: 143.0,
12: 56.0,
13: 9.0,
14: 9.0,
15: 670.0,
16: 1.0,
17: 335.0,
18: 2328.0,
19: 2328.0},
(pd.Timestamp('2021-07-07'), 'sales'): {0: 7556.414999999998,
1: 14985.05,
2: 16790.899999999998,
3: 36202.729999999996,
4: 4024.97,
5: 1034.45,
6: 163960.32999999996,
7: 1385.65,
8: 81980.16499999998,
9: 5600.544999999999,
10: 11209.92,
11: 32832.61,
12: 42137.44500000001,
13: 3885.1499999999996,
14: 1191.5,
15: 194912.34000000003,
16: 599.0,
17: 97456.17000000001,
18: 717745.3400000001,
19: 717745.3400000001},
(pd.Timestamp('2021-07-07'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0}}).set_index(['group','category'])
Which follows this structure:
2020-06-29 2020-07-05
group category last_sales sales difference
A Amazon 195 1268 0
Apple 61 18247 0
Facebook 106 19722 0
...
B Amazon 50 3219 0
Apple 50 15852 0
Facebook 75 17743 0
...
I am trying to move sales, last_sales, difference
next to category. So at the moment those are columns in each week, I am trying to make them some kind of index where the df
would look like this:
2020-06-29 2020-07-05
group category
A Amazon last_sales 195 ...
sales 1268 ...
difference 0 ...
Apple last_sales 61 ...
sales 18247 ...
difference 0 ...
Facebook last_sales 106 ...
sales 19722 ...
difference 0 ...
...
B Amazon last_sales 50 ...
sales 3219 ...
difference 0 ...
Apple last_sales 50 ...
sales 15852 ...
difference 0 ...
Facebook last_sales 75 ...
sales 17743 ...
difference 0 ...
...
I've been trying to achieve this using .unstack(), .reset_index()
but with no success, I am not sure how I should proceed with this problem.
The problem I am unable to solves comes from the fact that :
df_test.columns
Returns:
MultiIndex([(2020-06-29 00:00:00, 'last_sales'),
(2020-06-29 00:00:00, 'sales'),
(2020-06-29 00:00:00, 'difference'),
(2020-07-06 00:00:00, 'last_sales'),
(2020-07-06 00:00:00, 'sales'),
(2020-07-06 00:00:00, 'difference'),
(2021-06-28 00:00:00, 'last_sales'),
(2021-06-28 00:00:00, 'sales'),
(2021-06-28 00:00:00, 'difference'),
(2021-07-07 00:00:00, 'last_sales'),
(2021-07-07 00:00:00, 'sales'),
(2021-07-07 00:00:00, 'difference')],
)
So the one part, the date should stay as a column and the other part, the sales, last_sales and difference
should become one new column ( or index, I am not sure, but I hope my desired output explains it).
CodePudding user response:
If original ordering is not important, use DataFrame.stack
:
df = df_test.stack()
print (df.head())
2020-06-29 00:00:00 2020-07-06 00:00:00 \
group category
A Amazon difference 0.00 0.00
last_sales 195.00 26.00
sales 1268.85 3978.15
Apple difference 0.00 0.00
last_sales 61.00 39.00
2021-06-28 00:00:00 2021-07-07 00:00:00
group category
A Amazon difference 0.00 0.000
last_sales 96.00 45.000
sales 5194.95 7556.415
Apple difference 0.00 0.000
last_sales 56.00 47.000
For original ordering use:
#because sample data
#df_test.columns = df_test.columns.remove_unused_levels()
cats = pd.CategoricalIndex(df_test.columns.levels[1],
ordered=True,
categories=['last_sales','sales','difference'])
df_test.columns = df_test.columns.set_levels(cats, level=1)
df = df_test.stack()
print (df.head())
2020-06-29 00:00:00 2020-07-06 00:00:00 \
group category
A Amazon last_sales 195.000 26.00
sales 1268.850 3978.15
difference 0.000 0.00
Apple last_sales 61.000 39.00
sales 18274.385 12138.96
2021-06-28 00:00:00 2021-07-07 00:00:00
group category
A Amazon last_sales 96.00 45.000
sales 5194.95 7556.415
difference 0.00 0.000
Apple last_sales 56.00 47.000
sales 19102.22 14985.050