Adding a multindex to the columns of a pandas df based on several dictionaries-CodePudding

Assuming I have a DF like:

person_names = ['mike','manu','ana','analia','anomalia','fer']
df = pd.DataFrame(np.random.randn(5, 6), columns = person_names)
df

I also have two dictionaries, for easy purposes assuming only two:

# giving a couple of dictionaries like:
d = {'mike':{'city':'paris', 'department':2},
   'manu':{'city':'london', 'department':1},
   'ana':{'city':'barcelona', 'department':5}}
d2 = {'analia':{'functional':True, 'speed':'high'},
      'anomalia':{'functional':True, 'speed':'medium'},
      'fer':{'functional':False, 'speed':'low'}}

The result I would to achieve is a df having a multindex as shown in the excel screenshot here:

The dictionaries contain values for SOME of the column names.

Not only I need to create the multiindex based on dictionaries but take into account that the keys of the dictionaries are different for both dictionaries, I would also like to keep the original names of the columns as first level of the multiindex.

Any suggestion?

CodePudding user response：

# Easier to use joins when transposed.
dft = df.T
dd2 = pd.DataFrame(d2).T
dd = pd.DataFrame(d).T

# We join indexes together, and then add primary df
dd.join(dd2, how="outer").join(dft).T

Results in

Out[44]:
                  ana    analia  anomalia       fer      manu      mike
city        barcelona       NaN       NaN       NaN    london     paris
department          5       NaN       NaN       NaN         1         2
functional        NaN      True      True     False       NaN       NaN
speed             NaN      high    medium       low       NaN       NaN
0           -0.132317  2.513232  0.481609 -0.948312  1.425882  1.969711
1           -0.893227 -0.208046 -0.190703 -0.200429 -0.960934 -0.568568
2            -0.39221  1.442398   0.77165   0.73143 -1.832893 -0.667037
3           -0.245534 -0.037821  1.194735  0.765611  1.787658   0.65847
4           -0.943287 -0.151373  0.572972  0.079812   1.38536 -1.854453

You can use .set_index prior to the last transposition (.T) To properly set multi index, if required.

CodePudding user response：

Here's a way to do what your question asks for an arbitrary number of dictionaries, with the levels of the MultiIndex in the sequence shown in the OP:

dcts = (d, d2)
out = df.T
all_keys = []
for dct in reversed(dcts):
    keys = list(reversed(dct[next(iter(dct))].keys()))
    all_keys  = keys
    out[keys] = [[dct[name][key] if name in dct else None for key in keys] for name in out.index]
out = out.reset_index().rename(columns={'index':'person name'}).set_index(all_keys).T

Output:

speed             NaN                          high    medium       low
functional        NaN                          True      True     False
department        2.0       1.0       5.0       NaN       NaN       NaN
city            paris    london barcelona       NaN       NaN       NaN
person name      mike      manu       ana    analia  anomalia       fer
0             -0.8392 -0.531491  0.191687  1.147941  0.292531    0.4075
1            0.644307  0.550632 -0.241302   2.20928  -0.91035 -0.514395
2            0.305059  0.123411  0.846471 -0.151596 -0.410483 -1.661958
3           -0.763649  0.109589  2.215242  0.530734  0.261605  0.472373
4            0.153886 -0.782931  1.179735 -0.339259  0.314743 -0.088172

Note that there because of MultiIndex grouping, some values of the levels speed and functional are shown just once and not repeated for columns to the right with the same level value.