I am woring with pandas and I have an existing dataframe with 6 columns, with one level of index that looks like this:
No | a | b | c | d | e | f |
---|---|---|---|---|---|---|
1 | 34 | 43 | 29 | 78 | 29 | 68 |
2 | 29 | 28 | 57 | 39 | 10 | 37 |
and I want to add a second level of index so that it will look like this:
lvl1 | 1 | 1 | 2 | 2 | 3 | 3 |
---|---|---|---|---|---|---|
lvl2 | a | b | c | d | e | f |
1 | 34 | 43 | 29 | 78 | 29 | 68 |
2 | 29 | 28 | 57 | 39 | 10 | 37 |
please how do I go about this using MultiIndex?
CodePudding user response:
Not sure how/where you want to pick the index values from, so let me share a vanilla and easy to generalize way of having a multi-indexed dataframe:
df = pd.DataFrame(data=np.arange(50).reshape(-1,10))
df.index = pd.MultiIndex.from_tuples((i,i) for i in range(len(df)))
df
# = -------- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
# | | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
# |-------- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----|
# | (0, 0) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
# | (1, 1) | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
# | (2, 2) | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 |
# | (3, 3) | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
# | (4, 4) | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 |
# -------- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
Based on your comment you could try:
# creating dummy data
df = pd.DataFrame(data=np.arange(60).reshape(-1, 6))
# creating Multi Index column, from a tuple of (level_0_value, level_1_value) entries
new_columns = pd.MultiIndex.from_tuples((i//2 1,column_name) for i, column_name in enumerate(df))
# replacing dataframe columns with the newly created ones
df.columns = new_columns
CodePudding user response:
create the tuple multi index for two index levels where level 0 is 1 and 2 and level 1 is a,b,c,d,e,f. Next extract A as a list of No 1 values and B as a list of No 2 values. Create the multi index and then create dataframe df2 using the lst_1 and lst_2 values for A and B and set the index to the multi-level index.
data="""No a b c d e f
1 34 43 29 78 29 68
2 29 28 57 39 10 37
"""
df = pd.read_csv(StringIO(data), sep="\s ").reset_index()
df.reset_index(inplace=True)
print(df.columns)
lst=[(1,'a'),(1,'b'),
(2,'c'),(2,'d'),
(3,'e'),(3,'f')
]
index=pd.MultiIndex.from_tuples(lst,names=['ID1','ID2'])
exclude=["No","level_0","index"]
columns=[x for x in df.columns if x not in exclude]
lst_1=np.array(df[df['No']==1][columns].unstack())
lst_2=np.array(df[df['No']==2][columns].unstack())
print(lst_1)
print(lst_2)
df2=pd.DataFrame({'A':lst_1,'B':lst_2},index=index)
print(df2)
output:
A B
ID1 ID2
1 a 34 29
b 43 28
2 c 29 57
d 78 39
3 e 29 10
f 68 37