How to transpose only a specific amount and add it to the existing dataframe

I searched the internet for a solution to my problem but could not find one. I have the following DataFrame:

  pos1 pos2 pos3
0    A    A    A
1    B    B    B
2    C    C    C
3    D    D    D
4    E    E    E
5    F    F    F
6    G    G    G
7    H    H    H
8    I    I    I

and I want to append the following DataFrame to it:

  pos1 pos2 pos3
0    A    B    C
1    A    B    C
2    A    B    C
3    D    E    F
4    D    E    F
5    D    E    F
6    G    H    I
7    G    H    I
8    G    H    I

so that I get the following DataFrame:

   pos1 pos2 pos3
0     A    A    A
1     B    B    B
2     C    C    C
3     D    D    D
4     E    E    E
5     F    F    F
6     G    G    G
7     H    H    H
8     I    I    I
9     A    B    C
10    A    B    C
11    A    B    C
12    D    E    F
13    D    E    F
14    D    E    F
15    G    H    I
16    G    H    I
17    G    H    I

I know that the number of rows is always a multiple of the number of columns. That means if I have 4 columns, the number of rows is 4, 8, 12, 16, etc. In my example there are 3 columns and 9 rows.

What I then want to do is transpose the rows into columns, but only in blocks whose size equals the number of columns: the first 3 rows are transposed, then the next 3 rows, and so on.
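To make the "transpose one block" step concrete, here is a minimal sketch that rebuilds the example frame and transposes only the first block of n rows, where n is the number of columns. The column names and letters come from the example above; the variable names `n` and `first_block` are just illustrative.

```python
import pandas as pd

# The example frame from above: 9 rows, 3 columns, each row repeats one letter.
df = pd.DataFrame({c: list("ABCDEFGHI") for c in ["pos1", "pos2", "pos3"]})

n = len(df.columns)  # block size = number of columns
# The row count is assumed to be a multiple of the column count.
assert len(df) % n == 0

# Transposing the first n rows turns each column of the block into a row,
# so every resulting row reads A, B, C.
first_block = df.iloc[0:n].T.to_numpy()
print(first_block)
```

Repeating this for rows 3 to 5 and 6 to 8 gives the D E F and G H I rows of the desired output.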

My current code is:

import io

import pandas as pd

s = """pos1 pos2 pos3
A A A
B B B
C C C
D D D
E E E
F F F
G G G
H H H
I I I
"""
# sep=r"\s+" replaces delim_whitespace=True, which is deprecated in recent pandas
df = pd.read_csv(io.StringIO(s), sep=r"\s+")

n = len(df.columns)  # block size: transpose n rows at a time
final_df = df.copy()
index_values = final_df.index.values

value = 0
while value < len(df.index):
    sub_df = df[value:value + n]                    # next block of n rows
    sub_df.columns = index_values[value:value + n]  # temporary column labels
    sub_df = sub_df.T                               # columns of the block become rows
    sub_df.columns = df.columns                     # restore the original column names
    final_df = pd.concat([final_df, sub_df])

    value += n
final_df = final_df.reset_index(drop=True)
print(final_df)

This code is slow because of the loop. Is it possible to get the same result without using a loop?

Answer:

You can use the underlying NumPy array with ravel and reshape with the order='F' parameter (column-major order), together with the pandas.DataFrame constructor.

Then concat the output with the original DataFrame:

pd.concat([df,
           pd.DataFrame(df.to_numpy().ravel().reshape(df.shape, order='F'),
                        columns=df.columns)
          ], ignore_index=True)

output:

   pos1 pos2 pos3
0     A    A    A
1     B    B    B
2     C    C    C
3     D    D    D
4     E    E    E
5     F    F    F
6     G    G    G
7     H    H    H
8     I    I    I
9     A    D    G
10    A    D    G
11    A    D    G
12    B    E    H
13    B    E    H
14    B    E    H
15    C    F    I
16    C    F    I
17    C    F    I
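Note that this output groups the letters column-wise (A D G, B E H, C F I), which is not the A B C / D E F / G H I layout asked for in the question. A minimal sketch that reproduces the question's loop without a Python loop, assuming (as the question states) that the row count is a multiple of the column count, is to reshape the array into a stack of n x n blocks, swap the last two axes so each block is transposed on its own, and flatten back to rows:

```python
import pandas as pd

# Example frame from the question: 9 rows, 3 columns, each row repeats one letter.
df = pd.DataFrame({c: list("ABCDEFGHI") for c in ["pos1", "pos2", "pos3"]})

n = df.shape[1]  # block size = number of columns
arr = df.to_numpy()

# Assumes len(df) is a multiple of n, as stated in the question.
blocks = arr.reshape(-1, n, n)          # (number of blocks, n rows, n columns)
transposed = blocks.transpose(0, 2, 1)  # transpose each n x n block independently
new_rows = transposed.reshape(-1, n)    # stack the transposed blocks back into rows

result = pd.concat(
    [df, pd.DataFrame(new_rows, columns=df.columns)],
    ignore_index=True,
)
print(result)
```

Because `transpose(0, 2, 1)` applies `.T` to every block at once, this matches what `sub_df.T` does inside the question's loop, even when the columns within a row are not identical.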

Answer:

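If you want to use pandas indexing only, you can append each block's values with df.loc. This is still a Python loop that adds one row at a time, so it will not be as fast as the vectorized NumPy approach: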

n = len(df.columns)  # block size
n_rows = len(df)     # original row count, fixed before any rows are appended
for start in range(0, n_rows, n):
    # first column of the block read as one row (all columns are identical in the example)
    row = df.iloc[start:start + n, 0].to_numpy()
    for _ in range(n):
        df.loc[len(df)] = row  # append at the next integer label