I want to go from a previously flattened dataframe with single-level columns back to a multi-indexed dataframe.
Here is an example:
import pandas as pd
# Create a sample dataframe with multi-indexed columns
df = pd.DataFrame({('A', 'a'): [1, 2, 3], ('A', 'b'): [4, 5, 6], ('B', 'a'): [7, 8, 9], ('B', 'b'): [10, 11, 12]})
print(df)
The multi-indexed dataframe:
A B
a b a b
0 1 4 7 10
1 2 5 8 11
2 3 6 9 12
Than the flattening:
# Flatten the columns using the to_flat_index() method
df.columns = df.columns.to_flat_index()
print(df)
flattened dataframe with single-level columns:
(A, a) (A, b) (B, a) (B, b)
0 1 4 7 10
1 2 5 8 11
2 3 6 9 12
How to go from a flattened dataframe with single-level columns back to a multi-indexed dataframe?
CodePudding user response:
First Case
Taking the flattened dataframe from the example above:
# Create a new MultiIndex from the columns tuples names
new_columns = pd.MultiIndex.from_tuples(df.columns)
# Assign the new MultiIndex to the columns
df.columns = new_columns
print(df)
Results in:
A B
a b a b
0 1 4 7 10
1 2 5 8 11
2 3 6 9 12
Because the df.columns = df.columns.to_flat_index()
method returns a new Index object with a single level, which is the result of flattening the multi-level column names into tuples.
Therefore, to create a new multi-indexed dataframe from a flattened dataframe, you need to extract the tuples of the original multi-indexed column names and pass them to the pd.MultiIndex.from_tuples
method.
Second Case
You might encounter differently flattened dataframes like in this example:
import pandas as pd
# Create a sample dataframe with multi-indexed columns
df = pd.DataFrame({('A', 'a'): [1, 2, 3], ('A', 'b'): [4, 5, 6], ('B', 'a'): [7, 8, 9], ('B', 'b'): [10, 11, 12]})
# Flatten the columns
df.columns = ['_'.join(col) for col in df.columns]
print(df)
Which results in:
A_a A_b B_a B_b
0 1 4 7 10
1 2 5 8 11
2 3 6 9 12
In this case you can use the following code to get a multi-indexed dataframe agian:
# Create a new MultiIndex from a list of tuples
new_columns = pd.MultiIndex.from_tuples([tuple(col.split('_')) for col in df.columns])
# Assign the new MultiIndex to the columns
df.columns = new_columns
print(df)
which results in:
A B
a b a b
0 1 4 7 10
1 2 5 8 11
2 3 6 9 12