Add a Multi-level Index on existing dataframe-CodePudding

I am woring with pandas and I have an existing dataframe with 6 columns, with one level of index that looks like this:

No	a	b	c	d	e	f
1	34	43	29	78	29	68
2	29	28	57	39	10	37

and I want to add a second level of index so that it will look like this:

lvl1	1	1	2	2	3	3
lvl2	a	b	c	d	e	f
1	34	43	29	78	29	68
2	29	28	57	39	10	37

please how do I go about this using MultiIndex?

CodePudding user response：

Not sure how/where you want to pick the index values from, so let me share a vanilla and easy to generalize way of having a multi-indexed dataframe:

df = pd.DataFrame(data=np.arange(50).reshape(-1,10))
df.index = pd.MultiIndex.from_tuples((i,i) for i in range(len(df)))
df
# =  -------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 
#   |        |   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |
#   |-------- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----|
#   | (0, 0) |   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |
#   | (1, 1) |  10 |  11 |  12 |  13 |  14 |  15 |  16 |  17 |  18 |  19 |
#   | (2, 2) |  20 |  21 |  22 |  23 |  24 |  25 |  26 |  27 |  28 |  29 |
#   | (3, 3) |  30 |  31 |  32 |  33 |  34 |  35 |  36 |  37 |  38 |  39 |
#   | (4, 4) |  40 |  41 |  42 |  43 |  44 |  45 |  46 |  47 |  48 |  49 |
#    -------- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----

Based on your comment you could try:

# creating dummy data
df = pd.DataFrame(data=np.arange(60).reshape(-1, 6))
# creating Multi Index column, from a tuple of (level_0_value, level_1_value) entries
new_columns = pd.MultiIndex.from_tuples((i//2   1,column_name) for i, column_name in enumerate(df))
# replacing dataframe columns with the newly created ones
df.columns = new_columns

CodePudding user response：

create the tuple multi index for two index levels where level 0 is 1 and 2 and level 1 is a,b,c,d,e,f. Next extract A as a list of No 1 values and B as a list of No 2 values. Create the multi index and then create dataframe df2 using the lst_1 and lst_2 values for A and B and set the index to the multi-level index.

data="""No  a   b   c   d   e   f
1   34  43  29  78  29  68
2   29  28  57  39  10  37
"""
df = pd.read_csv(StringIO(data), sep="\s ").reset_index()
df.reset_index(inplace=True)
print(df.columns)
lst=[(1,'a'),(1,'b'),
     (2,'c'),(2,'d'),
     (3,'e'),(3,'f')
    ]

index=pd.MultiIndex.from_tuples(lst,names=['ID1','ID2'])

exclude=["No","level_0","index"]
columns=[x for x in df.columns if x not in exclude]

lst_1=np.array(df[df['No']==1][columns].unstack())
lst_2=np.array(df[df['No']==2][columns].unstack())

print(lst_1)
print(lst_2)
df2=pd.DataFrame({'A':lst_1,'B':lst_2},index=index)

print(df2)

output:

          A   B
ID1 ID2        
1   a    34  29
    b    43  28
2   c    29  57
    d    78  39
3   e    29  10
    f    68  37