How to transpose only a specific amount and add it to the existing dataframe

Time:12-25

I searched the internet for a solution to my problem, but I could not find one. I have the following dataframe:

  pos1 pos2 pos3
0    A    A    A
1    B    B    B
2    C    C    C
3    D    D    D
4    E    E    E
5    F    F    F
6    G    G    G
7    H    H    H
8    I    I    I

and I want to append the following dataframe to it:

  pos1 pos2 pos3
0    A    B    C
1    A    B    C
2    A    B    C
3    D    E    F
4    D    E    F
5    D    E    F
6    G    H    I
7    G    H    I
8    G    H    I

so that I get the following dataframe:

   pos1 pos2 pos3
0     A    A    A
1     B    B    B
2     C    C    C
3     D    D    D
4     E    E    E
5     F    F    F
6     G    G    G
7     H    H    H
8     I    I    I
9     A    B    C
10    A    B    C
11    A    B    C
12    D    E    F
13    D    E    F
14    D    E    F
15    G    H    I
16    G    H    I
17    G    H    I

I know that the number of rows is always a multiple of the number of columns. That means if I have 4 columns, then the number of rows is 4, 8, 12, 16, etc. In my example there are 3 columns and 9 rows.

What I then want to do is transpose rows into columns, but only in blocks whose size equals the number of columns. So I want the first 3 rows to be transposed with the columns, then the next 3 rows, and so forth.

I now have the following code:

import pandas as pd
import io

s = """pos1 pos2 pos3
A A A
B B B
C C C
D D D
E E E 
F F F
G G G
H H H
I I I
"""
df = pd.read_csv(io.StringIO(s), sep=r"\s+")

final_df = df.copy()
index_values = final_df.index.values

value = 0
while value < len(df.index):
    sub_df = df[value:value + 3]
    sub_df.columns = index_values[value:value + 3]
    sub_df = sub_df.T
    sub_df.columns = df.columns
    final_df = pd.concat([final_df, sub_df])

    value += len(df.columns)
final_df = final_df.reset_index(drop=True)
print(final_df)

The code I have now is slow because of the loop. Is it possible to obtain the same result without using a loop?

CodePudding user response:

You can use the underlying NumPy array with ravel and reshape with the order='F' parameter (column-major order) and the pandas.DataFrame constructor.

Then concat the output with the original DataFrame:

pd.concat([df,
           pd.DataFrame(df.to_numpy().ravel().reshape(df.shape, order='F'),
                        columns=df.columns)
          ], ignore_index=True)

output:

   pos1 pos2 pos3
0     A    A    A
1     B    B    B
2     C    C    C
3     D    D    D
4     E    E    E
5     F    F    F
6     G    G    G
7     H    H    H
8     I    I    I
9     A    D    G
10    A    D    G
11    A    D    G
12    B    E    H
13    B    E    H
14    B    E    H
15    C    F    I
16    C    F    I
17    C    F    I
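
Note that the output above is filled in column-major order across the whole frame, so rows 9 to 17 read A D G, B E H, C F I rather than the block-wise A B C, D E F, G H I asked for in the question. A minimal sketch of a block-wise transpose with NumPy follows, assuming the row count is a multiple of the column count; the function name `append_blockwise_transpose` and the numeric demo frame are made up for illustration:

```python
import numpy as np
import pandas as pd


def append_blockwise_transpose(df: pd.DataFrame) -> pd.DataFrame:
    """Append a copy of df in which every n x n block of rows is transposed.

    n is the number of columns; len(df) is assumed to be a multiple of n.
    """
    n = df.shape[1]
    if len(df) % n != 0:
        raise ValueError("row count must be a multiple of the column count")
    arr = df.to_numpy()
    # (rows, n) -> (blocks, n, n); swapping the last two axes transposes
    # each block; the final reshape flattens back to (rows, n).
    transposed = arr.reshape(-1, n, n).transpose(0, 2, 1).reshape(-1, n)
    extra = pd.DataFrame(transposed, columns=df.columns)
    return pd.concat([df, extra], ignore_index=True)


# Distinct values per cell make the block transposition visible.
demo = pd.DataFrame(
    np.arange(18).reshape(6, 3), columns=["pos1", "pos2", "pos3"]
)
result = append_blockwise_transpose(demo)
print(result)
```

On the question's data this should append A B C three times, then D E F, then G H I, which matches what the original loop produces.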

CodePudding user response:

This is somewhat efficient if you want to use pandas only.

n = len(df.columns)   # block size equals the number of columns
n_rows = len(df)      # fix the row count before rows are appended
for start in range(0, n_rows, n):
    for col in range(n):
        # column `col` of the current block becomes one new row
        df.loc[len(df)] = df.iloc[start:start + n, col].to_numpy()
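
Assigning rows one at a time through `df.loc`, as above, tends to be slow on larger frames, as does calling `pd.concat` inside a loop. A hedged alternative that keeps the loop but collects the transposed blocks in a list and concatenates only once is sketched below; the function name `append_transposed_blocks` is hypothetical, and the row count is assumed to be a multiple of the column count:

```python
import pandas as pd


def append_transposed_blocks(df: pd.DataFrame) -> pd.DataFrame:
    """Append each n-row block of df, transposed, using a single concat."""
    n = len(df.columns)
    pieces = [df]
    for start in range(0, len(df), n):
        block = df.iloc[start:start + n]
        # Transpose the block's values and reuse the original column labels.
        pieces.append(pd.DataFrame(block.to_numpy().T, columns=df.columns))
    # One concat at the end avoids re-copying the growing frame each pass.
    return pd.concat(pieces, ignore_index=True)


letters = pd.DataFrame(
    {name: list("ABCDEFGHI") for name in ["pos1", "pos2", "pos3"]}
)
print(append_transposed_blocks(letters))
```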