I am attempting to create a multi index dataframe which contains every possible index even ones where it does not currently contain values. I wish to set these non-existent values to 0. To achieve this, I used the following:
index_levels = ['Channel', 'Duration', 'Designation', 'Manufacturing Class']
grouped_df = df.groupby(by = index_levels)[['Total Purchases', 'Sales', 'Cost']].agg('sum')
grouped_df = grouped_df.reindex(pd.MultiIndex.from_product(grouped_df.index.levels), fill_value = 0)
The expected result:
___________________________________________________________________________________________
|Chan. | Duration | Designation| Manufact. |Total Purchases| Sales | Cost |
|______|____________|____________|______________|_______________|_____________|_____________|
| | Month | Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
|Retail| |____________|______________|_______________|_____________|_____________|
| | |Not Special | Brand | 756 | 15654.07 | 9498.23 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 7896 | 98745.23 | 78953.56 |
| |____________|____________|______________|_______________|_____________|_____________|
| | Season | Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
| | |____________|______________|_______________|_____________|_____________|
| | |Not Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
|______|____________|____________|______________|_______________|_____________|_____________|
This result is produced when at least one of the index levels contains a value. However, if the index level does not contain any value, then the following result is produced below.
___________________________________________________________________________________________
|Chan. | Duration | Designation| Manufact. |Total Purchases| Sales | Cost |
|______|____________|____________|______________|_______________|_____________|_____________|
| | Month | Not Special| Brand | 756 | 15654.07 | 9498.23 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 7896 | 98745.23 | 78953.56 |
|Retail|____________|____________|______________|_______________|_____________|_____________|
| | Season |Not Special | Brand | 0 | 0.00 | 0.00 |
| | | |______________|_______________|_____________|_____________|
| | | | Generic | 0 | 0.00 | 0.00 |
|______|____________|____________|______________|_______________|_____________|_____________|
For some reason, the values continue to be autotruncated. How can I fix indices so that the desired result is always produced and I can always reliably use these indices for calculations, even when said indices have no values in them?
CodePudding user response:
What you can do is construct the desired fixed index beforehand. For instance, based on a dictionary where the keys are the columns labels used as group index, and the values are all the possible outcomes.
index_levels = {
'Channel': ['Retails'],
'Duration': ['Month', 'Season'],
'Designation': ['Special', 'Not Special'],
'Manufacturing Class': ['Brand', 'Generic']
}
fixed_index = pd.MultiIndex.from_product(index_levels.values(), names=index_levels.keys())
Then you can do
grouped_df = df.groupby(by=index_levels.keys())[['Total Purchases', 'Sales', 'Cost']].agg('sum')
grouped_df = grouped_df.reindex(fixed_index, fill_value=0)