Home > Enterprise >  flatten dataframe columns to multi-indexed
flatten dataframe columns to multi-indexed

Time:01-06

I want to go from a previously flattened dataframe with single-level columns back to a multi-indexed dataframe.

Here is an example:

import pandas as pd

# Create a sample dataframe with multi-indexed columns
df = pd.DataFrame({('A', 'a'): [1, 2, 3], ('A', 'b'): [4, 5, 6], ('B', 'a'): [7, 8, 9], ('B', 'b'): [10, 11, 12]})

print(df)

The multi-indexed dataframe:

   A     B    
   a  b  a   b
0  1  4  7  10
1  2  5  8  11
2  3  6  9  12

Than the flattening:

# Flatten the columns using the to_flat_index() method
df.columns = df.columns.to_flat_index()

print(df)

flattened dataframe with single-level columns:

   (A, a)  (A, b)  (B, a)  (B, b)
0       1       4       7      10
1       2       5       8      11
2       3       6       9      12

How to go from a flattened dataframe with single-level columns back to a multi-indexed dataframe?

CodePudding user response:

First Case

Taking the flattened dataframe from the example above:

# Create a new MultiIndex from the columns tuples names
new_columns = pd.MultiIndex.from_tuples(df.columns)

# Assign the new MultiIndex to the columns
df.columns = new_columns

print(df)

Results in:

   A     B    
   a  b  a   b
0  1  4  7  10
1  2  5  8  11
2  3  6  9  12

Because the df.columns = df.columns.to_flat_index() method returns a new Index object with a single level, which is the result of flattening the multi-level column names into tuples.

Therefore, to create a new multi-indexed dataframe from a flattened dataframe, you need to extract the tuples of the original multi-indexed column names and pass them to the pd.MultiIndex.from_tuples method.

Second Case

You might encounter differently flattened dataframes like in this example:

import pandas as pd

# Create a sample dataframe with multi-indexed columns
df = pd.DataFrame({('A', 'a'): [1, 2, 3], ('A', 'b'): [4, 5, 6], ('B', 'a'): [7, 8, 9], ('B', 'b'): [10, 11, 12]})

# Flatten the columns
df.columns = ['_'.join(col) for col in df.columns]

print(df)

Which results in:

   A_a  A_b  B_a  B_b
0    1    4    7   10
1    2    5    8   11
2    3    6    9   12

In this case you can use the following code to get a multi-indexed dataframe agian:

# Create a new MultiIndex from a list of tuples
new_columns = pd.MultiIndex.from_tuples([tuple(col.split('_')) for col in df.columns])

# Assign the new MultiIndex to the columns
df.columns = new_columns

print(df)

which results in:

   A     B    
   a  b  a   b
0  1  4  7  10
1  2  5  8  11
2  3  6  9  12
  • Related