I want to flatten a dataframe in pandas. This is basically by duplicating the column_names with prefix/suffix of occurence/order of the column and the number of extra columns created should be based on the number of rows.
For example:
`
df = pd.DataFrame({
'id': [1, 2, 3, 4],
'A': [10, 20, 30, 40],
'B': [50, 60, 70, 80],
'C': [90, 100, 110, 120]
})
print(df)
# id A B C
# 0 1 10 50 90
# 1 2 20 60 100
# 2 3 30 70 110
# 3 4 40 80 120
#I want something like the following.
print(result_df)
# id1 A1 B1 C1 id2 A2 B2 C2 id3 A3 B3 C3 id4 A4 B4 C4
# 0 1 10 50 90 2 20 60 100 3 30 70 110 4 40 80 120
`
CodePudding user response:
df.unstack().to_frame().T
id A B C 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 4 10 20 30 40 50 60 70 80 90 100 110 120
The .unstack()
method returns a series, then .to_frame()
convers it to a dataframe with one column, and finally the .T
transposes this colomn to a row.
CodePudding user response:
Try this:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'id': [1, 2, 3, 4],
'A': [10, 20, 30, 40],
'B': [50, 60, 70, 80],
'C': [90, 100, 110, 120]
})
df_out = df.unstack().to_frame().T.sort_index(level=0)
df_out.columns = [f'{i}{j 1}' for i, j in df_out.columns]
print(df_out)
Output:
id1 id2 id3 id4 A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4
0 1 2 3 4 10 20 30 40 50 60 70 80 90 100 110 120