I would like to turn the column index of a DataFrame into a MultiIndex, as follows.
Assuming the data:
import numpy as np
import pandas as pd

# Two-level row index: (city, number)
arrays = [np.array(["2021_01", "2021_01", "2021_02", "2021_02", "2021_03", "2021_03", "2021_04", "2021_04"]),
          np.array(["ALLO", "EPQ", "ALLO", "EPQ", "ALLO", "EPQ", "ALLO", "EPQ"])]
tuples_i = list(zip(*arrays))
index_rows = pd.MultiIndex.from_tuples(tuples_i, names=["city", "number"])
person_names = ['mike', 'manu', 'ana', 'analia', 'anomalia', 'fer']
df = pd.DataFrame(np.random.randn(8, 6), index=index_rows, columns=person_names)
df
Given an arbitrary dictionary whose keys are the column names (which are unique) and whose values are tuples:
d = {k: (k[0:2], k[0:2] * 2) for k in df.columns}
{'mike': ('mi', 'mimi'),
'manu': ('ma', 'mama'),
'ana': ('an', 'anan'),
'analia': ('an', 'anan'),
'anomalia': ('an', 'anan'),
'fer': ('fe', 'fefe')}
How would I create a MultiIndex on the columns with three levels: the current column names plus the two values of the corresponding tuple in the dictionary?
How should I handle columns that are not present in the dictionary, giving them a default value such as ('missing', 'missing')?
Answer:
Assuming df is the DataFrame (for simplicity) and d is the dictionary:
df.columns = pd.MultiIndex.from_arrays([df.columns, *zip(*df.columns.map(d))])
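The one-liner can be unpacked into separate steps. The following is a minimal sketch on a small stand-in frame (the three-column df and its d are illustrative, not the question's data), and it uses a plain list comprehension for the lookup instead of df.columns.map(d):

```python
import numpy as np
import pandas as pd

# Small stand-in frame and dictionary (illustrative only)
df = pd.DataFrame(np.zeros((2, 3)), columns=["mike", "manu", "ana"])
d = {k: (k[:2], k[:2] * 2) for k in df.columns}

# 1. Look up each column's tuple, in column order
mapped = [d[c] for c in df.columns]

# 2. Transpose the list of pairs into one sequence per new level
level_1, level_2 = zip(*mapped)

# 3. Stack the original names and the two new levels
df.columns = pd.MultiIndex.from_arrays([df.columns, level_1, level_2])

print(df.columns.tolist())
```

The zip(*...) call is what turns one tuple per column into one sequence per level, which is the shape from_arrays expects.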
If the dictionary keys are in the same order as the columns (and every column has an entry), this simplifies to:
df.columns = pd.MultiIndex.from_arrays([df.columns, *zip(*d.values())])
Output:
                    mike      manu       ana    analia  anomalia       fer
                      mi        ma        an        an        an        fe
                    mimi      mama      anan      anan      anan      fefe
city    number
2021_01 ALLO    2.024601  0.217844  2.049117  0.070271  1.089704  1.640760
        EPQ    -0.532718  0.672654 -0.629328  0.079538  0.490475 -0.750072
2021_02 ALLO   -0.035613  1.334336 -1.202566  0.913976  0.043350  0.265392
        EPQ     0.981459 -0.077180  1.737501 -0.986365  0.966194  0.002310
2021_03 ALLO    0.053481 -0.243616 -0.993706  0.549090  1.207434 -0.340995
        EPQ     0.325375 -1.022405 -0.758095  0.315423  0.371075  2.300323
2021_04 ALLO    0.988653  0.368673  0.866023 -0.114965  1.951587  0.371901
        EPQ     0.665541  0.641732 -0.737723 -0.748781  1.056270 -0.546598
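For columns that have no entry in the dictionary, one possible approach is to look each column up with dict.get and a default tuple. This is a sketch under assumptions: the column "zoe" and the name default are made up for illustration, and the default must have the same length as the other tuples.

```python
import numpy as np
import pandas as pd

# Stand-in frame where one column ("zoe") has no dictionary entry
df = pd.DataFrame(np.zeros((2, 3)), columns=["mike", "manu", "zoe"])
d = {"mike": ("mi", "mimi"), "manu": ("ma", "mama")}

# Fallback used for any column missing from d (same length as the real tuples)
default = ("missing", "missing")

# One tuple per column, then transposed into one sequence per new level
extra_levels = zip(*(d.get(col, default) for col in df.columns))
df.columns = pd.MultiIndex.from_arrays([df.columns, *extra_levels])

print(df.columns.tolist())
```

Because the lookup iterates over df.columns rather than over the dictionary, the result follows the column order regardless of how the dictionary is ordered.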