I frequently get new datasets with new variables from time to time and want to make them symmetrical for comparison purposes.
The data is multi-indexed: each profile can have an impact in the range [-2, 2], with values varying over the years.
A) CAN'T --> I don't understand how to add the missing profile index 'RPG'.
B) CAN --> I am able to add all the impact values from [-2, 2].
Therefore, I just need to add one row with the index 'RPG' (even better if the result could be symmetrical from the start, but that's not essential).
import numpy as np
import pandas as pd
list_profile = ['gun','bat','RPG']
df = pd.DataFrame({'profile': ['gun', 'gun', 'gun', 'bat', 'bat', 'bat'],
                   'impact': [-1, 0, 1, -1, 0, 1],
                   '2020': [-10, 0, 15, -3, 0, 4],
                   '2021': [-20, 0, 30, -6, 0, 8]})
print(df)
#---------------------------------------------------------------------
# A) INELEGANT SOLUTION I WAS THINKING ABOUT THAT DOESN'T WORK :
#---------------------------------------------------------------------
df = df.set_index('profile')
for profile in list_profile:
    try:
        df.loc[profile]
    except KeyError:  # .loc raises KeyError (not ValueError) for a missing label
        # ADD OBSERVATION FOR MISSING INDEX RPG
        pass
#---------------------------------------------------------------------
# B) Code for expanding 'impact' from [-1,1] to [-2,2] :
#----------------------------------------------------------
mux = pd.MultiIndex.from_product([df['profile'].unique(),
                                  np.arange(-2, 2 + 1)],
                                 names=['profile', 'impact'])
df = df.set_index(['profile', 'impact']).reindex(mux).reset_index()
df = df.fillna(0)
df
CodePudding user response:
IIUC, is this what you want? Build the index from list_profile instead of from the profiles already present in df:
mux = pd.MultiIndex.from_product([list_profile, np.arange(-2, 2 + 1)], names=['profile', 'impact'])
df.set_index(['profile', 'impact']).reindex(mux, fill_value=0).reset_index()
Output:
    profile  impact  2020  2021
0       gun      -2     0     0
1       gun      -1   -10   -20
2       gun       0     0     0
3       gun       1    15    30
4       gun       2     0     0
5       bat      -2     0     0
6       bat      -1    -3    -6
7       bat       0     0     0
8       bat       1     4     8
9       bat       2     0     0
10      RPG      -2     0     0
11      RPG      -1     0     0
12      RPG       0     0     0
13      RPG       1     0     0
14      RPG       2     0     0
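
Since you mention getting new datasets regularly, it may be handy to wrap the same reindex step in a small function. This is only a sketch: the function name `symmetrize` and its `lo`/`hi` parameters are made up for illustration, and it assumes each (profile, impact) pair appears at most once in the input, since `reindex` does not accept duplicate index entries.

```python
import numpy as np
import pandas as pd


def symmetrize(df, profiles, lo=-2, hi=2):
    """Return df with one row per (profile, impact) pair.

    Hypothetical helper: rows missing from df are filled with 0.
    """
    # Full grid of every profile crossed with every impact in [lo, hi]
    mux = pd.MultiIndex.from_product(
        [profiles, np.arange(lo, hi + 1)],
        names=["profile", "impact"],
    )
    return (
        df.set_index(["profile", "impact"])
        .reindex(mux, fill_value=0)  # adds missing rows, e.g. all of 'RPG'
        .reset_index()
    )


list_profile = ["gun", "bat", "RPG"]
df = pd.DataFrame({
    "profile": ["gun", "gun", "gun", "bat", "bat", "bat"],
    "impact": [-1, 0, 1, -1, 0, 1],
    "2020": [-10, 0, 15, -3, 0, 4],
    "2021": [-20, 0, 30, -6, 0, 8],
})

out = symmetrize(df, list_profile)
print(out)
```

Passing the same `list_profile` for every dataset should give frames with identical (profile, impact) rows, which makes them directly comparable.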