Home > other >  Flatten the dataframe in pandas
Flatten the dataframe in pandas

Time:12-29

I want to flatten a dataframe in pandas. This is basically by duplicating the column_names with prefix/suffix of occurence/order of the column and the number of extra columns created should be based on the number of rows.

For example:

`

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120]
})

print(df)

#    id   A   B    C
# 0   1  10  50   90
# 1   2  20  60  100
# 2   3  30  70  110
# 3   4  40  80  120



#I want something like the following.

print(result_df)

#    id1  A1   B1   C1  id2  A2   B2   C2  id3  A3   B3   C3  id4  A4   B4   C4
# 0    1  10   50   90    2  20   60  100    3  30   70  110    4  40   80  120



`

CodePudding user response:

df.unstack().to_frame().T
  id            A               B               C
   0  1  2  3   0   1   2   3   0   1   2   3   0    1    2    3
0  1  2  3  4  10  20  30  40  50  60  70  80  90  100  110  120

The .unstack() method returns a series, then .to_frame() convers it to a dataframe with one column, and finally the .T transposes this colomn to a row.

CodePudding user response:

Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120]
})

df_out = df.unstack().to_frame().T.sort_index(level=0)
df_out.columns = [f'{i}{j 1}' for i, j in df_out.columns]
print(df_out)

Output:

   id1  id2  id3  id4  A1  A2  A3  A4  B1  B2  B3  B4  C1   C2   C3   C4
0    1    2    3    4  10  20  30  40  50  60  70  80  90  100  110  120
  • Related