I am struggling to transpose my dataframe. It is not a simple transpose: I want to transpose each slice of rows separately, so that the number of columns is limited to the number of rows in each index slice. To explain the problem better, here is my dataframe:
import pandas as pd

df = pd.DataFrame({
    'n': [0, 1, 2, 0, 1, 2, 0, 1, 2],
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col2': [9.6, 10.4, 11.2, 3.3, 6, 4, 1.94, 15.44, 6.17]
})
It displays as follows:
n col1 col2
0 0 A 9.60
1 1 A 10.40
2 2 A 11.20
3 0 B 3.30
4 1 B 6.00
5 2 B 4.00
6 0 C 1.94
7 1 C 15.44
8 2 C 6.17
From that dataframe I want to get the following new_df:
0 1 2
col1 A A A
col2 9.6 10.4 11.2
col1 B B B
col2 3.3 6.0 4.0
col1 C C C
col2 1.94 15.44 6.17
What I have tried so far:
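# each row of new_df holds 3 consecutive rows of df, flattened
new_df = df.values.reshape(3, 9)
# turn each flattened chunk back into a 3x3 block and transpose it
new_w = [x.reshape(3, 3).T for x in new_df]
df_1 = pd.DataFrame(new_w[0])
df_1.index = ['n', 'col1', 'col2']
df_2 = pd.DataFrame(new_w[1])
df_2.index = ['n', 'col1', 'col2']
df_3 = pd.DataFrame(new_w[2])
df_3.index = ['n', 'col1', 'col2']
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
new_df = pd.concat([df_1, df_2, df_3])
new_df[new_df.index != 'n']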
The code I tried works, but it is long and repetitive. I am looking for a shorter solution.
Any help would be highly appreciated, thanks.
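For reference, the reshape idea in the attempt above can be condensed into a single NumPy call chain. This is a minimal sketch, assuming every slice has exactly the same number of rows (`k = 3` here) and that only `col1` and `col2` need to be kept; `k`, `g`, `cols` and `arr` are illustrative names, not from the original post:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'n': [0, 1, 2, 0, 1, 2, 0, 1, 2],
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col2': [9.6, 10.4, 11.2, 3.3, 6, 4, 1.94, 15.44, 6.17],
})

k = 3                   # assumed number of rows per slice
cols = ['col1', 'col2'] # columns that become the rows of the result
g = len(df) // k        # number of slices

# (rows, cols) -> (slices, k, cols) -> (slices, cols, k) -> (slices*cols, k)
arr = (df[cols].to_numpy()
       .reshape(g, k, len(cols))
       .transpose(0, 2, 1)
       .reshape(g * len(cols), k))

new_df = pd.DataFrame(arr, index=cols * g, columns=range(k))
print(new_df)
```

Because `col1` holds strings and `col2` holds floats, the intermediate array and the resulting frame have `object` dtype, the same as in the original attempt.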
Answer:
Identify the unique values in "col1" with factorize, then melt to combine the two columns, and pivot:
new_df = (df.assign(idx=pd.factorize(df['col1'])[0])
            .melt(['n', 'idx'])
            .pivot(index=['idx', 'variable'], columns='n', values='value')
            .droplevel('idx').rename_axis(index=None, columns=None)  # optional
         )
Output:
0 1 2
col1 A A A
col2 9.6 10.4 11.2
col1 B B B
col2 3.3 6.0 4.0
col1 C C C
col2 1.94 15.44 6.17
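To see what each stage of that chain contributes, here is the same pipeline split into intermediate variables (`step1`, `step2` and `step3` are illustrative names). It is a sketch assuming pandas 1.1 or later, where `pivot` accepts a list for `index`:

```python
import pandas as pd

df = pd.DataFrame({
    'n': [0, 1, 2, 0, 1, 2, 0, 1, 2],
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col2': [9.6, 10.4, 11.2, 3.3, 6, 4, 1.94, 15.44, 6.17],
})

# 1. factorize gives each distinct col1 value an integer code,
#    in order of first appearance: A -> 0, B -> 1, C -> 2
codes, uniques = pd.factorize(df['col1'])
step1 = df.assign(idx=codes)

# 2. melt keeps n and idx as identifiers and stacks col1 and col2
#    into long format, with new columns 'variable' and 'value'
step2 = step1.melt(['n', 'idx'])

# 3. pivot spreads n back out into columns, one row per (idx, variable)
step3 = step2.pivot(index=['idx', 'variable'], columns='n', values='value')

# 4. drop the helper level and clear the axis names
new_df = step3.droplevel('idx').rename_axis(index=None, columns=None)
print(new_df)
```

The `idx` level is what keeps the A, B and C blocks apart during the pivot; without it, the three `col1` rows would collide on the same index label.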
Answer:
In the following method I extract 3 dataframes so that I can concatenate them later. Each one needs a bit of manipulation to get it into the correct format:
- Select the next 3 rows
- Transpose those 3 rows
- Use the first row (the n values) as the column names
- Drop that first row
- Append the result to a list
Once the 3 dataframes are in the list, they can be concatenated with pd.concat.
Code:
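t_df = []
for i in range(len(df) // 3):
    # take rows i*3 .. i*3+2 and transpose them
    temp = df.iloc[i * 3:(i + 1) * 3].T
    # after transposing, the first row holds the n values; use them as column names
    temp.columns = temp.iloc[0]
    # drop the n row itself
    temp = temp[1:]
    t_df.append(temp)

new_df = pd.concat(t_df)
print(new_df)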
Output:
n 0 1 2
col1 A A A
col2 9.6 10.4 11.2
col1 B B B
col2 3.3 6.0 4.0
col1 C C C
col2 1.94 15.44 6.17
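The loop above hard-codes slices of 3 rows. If the slices can differ in length, one possible variation (a swapped-in technique, not the answer's own code) is to let `groupby` on `col1` produce the slices, with `sort=False` keeping them in their original order. It assumes each `col1` value marks exactly one contiguous slice; `transpose_groups` is a hypothetical helper name:

```python
import pandas as pd

def transpose_groups(df, key='col1', col_key='n'):
    """Hypothetical helper: transpose each `key` group on its own and stack
    the results, using the `col_key` values as column labels."""
    parts = []
    # sort=False keeps groups in order of first appearance (A, B, C)
    for _, group in df.groupby(key, sort=False):
        # n becomes the columns; col1 and col2 become the rows
        parts.append(group.set_index(col_key).T)
    # concat aligns columns; shorter groups get NaN in the missing positions
    return pd.concat(parts)

df = pd.DataFrame({
    'n': [0, 1, 2, 0, 1, 2, 0, 1, 2],
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col2': [9.6, 10.4, 11.2, 3.3, 6, 4, 1.94, 15.44, 6.17],
})

new_df = transpose_groups(df)
print(new_df)
```

If, for example, the C slice only had rows for n = 0 and n = 1, its column 2 would be filled with NaN instead of raising an error.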