I searched the internet for a solution to my problem, but I could not find one. I have the following dataframe:
pos1 pos2 pos3
0 A A A
1 B B B
2 C C C
3 D D D
4 E E E
5 F F F
6 G G G
7 H H H
8 I I I
I want to append the following dataframe to the existing one:
pos1 pos2 pos3
0 A B C
1 A B C
2 A B C
3 D E F
4 D E F
5 D E F
6 G H I
7 G H I
8 G H I
The result should be the following dataframe:
pos1 pos2 pos3
0 A A A
1 B B B
2 C C C
3 D D D
4 E E E
5 F F F
6 G G G
7 H H H
8 I I I
9 A B C
10 A B C
11 A B C
12 D E F
13 D E F
14 D E F
15 G H I
16 G H I
17 G H I
I know that the number of rows is always a multiple of the number of columns. That means if I have 4 columns, the number of rows is 4, 8, 12, 16, etc. In my example there are 3 columns and 9 rows.
What I then want to do is transpose the rows into columns, but only in blocks of that size: first transpose the first 3 rows, then the next 3 rows, and so forth.
This is the code I have now:
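To make the "transpose one block at a time" step concrete, here is a minimal sketch that transposes only the first block of n rows (n = number of columns), rebuilding the example frame inline. Row j of the transposed block is column j of the original block, which is why every row comes out as A B C here.

```python
import pandas as pd

# Example frame from the question: each row repeats one letter across 3 columns.
df = pd.DataFrame({c: list("ABCDEFGHI") for c in ["pos1", "pos2", "pos3"]})
n = len(df.columns)  # block size = number of columns

# Transpose only the first block of n rows.
first_block_t = df.iloc[0:n].T
first_block_t.columns = df.columns  # restore the original column names
print(first_block_t.to_numpy().tolist())
```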
import pandas as pd
import io

s = """pos1 pos2 pos3
A A A
B B B
C C C
D D D
E E E
F F F
G G G
H H H
I I I
"""
# sep=r"\s+" is the current spelling of the deprecated delim_whitespace=True
df = pd.read_csv(io.StringIO(s), sep=r"\s+")

n = len(df.columns)  # block size = number of columns
final_df = df.copy()
index_values = final_df.index.values
value = 0
while value < len(df.index):
    sub_df = df[value:value + n]                    # next block of n rows
    sub_df.columns = index_values[value:value + n]
    sub_df = sub_df.T                               # transpose the block
    sub_df.columns = df.columns
    final_df = pd.concat([final_df, sub_df])
    value += n                                      # advance to the next block
final_df = final_df.reset_index(drop=True)
print(final_df)
The code I have now is slow because of the loop. Is it possible to get the same result without a loop?
Answer:
You can use the underlying NumPy array with ravel and reshape, passing the order='F' parameter (column-major order), together with the pandas.DataFrame constructor. Then concat the result with the original DataFrame:
pd.concat([df,
pd.DataFrame(df.to_numpy().ravel().reshape(df.shape, order='F'),
columns=df.columns)
], ignore_index=True)
output:
pos1 pos2 pos3
0 A A A
1 B B B
2 C C C
3 D D D
4 E E E
5 F F F
6 G G G
7 H H H
8 I I I
9 A D G
10 A D G
11 A D G
12 B E H
13 B E H
14 B E H
15 C F I
16 C F I
17 C F I
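Note that the rows added above come out as A D G / B E H / C F I, because reshape with order='F' fills the new array column by column from the whole flattened frame. The question asks for A B C / D E F / G H I, i.e. each block of n rows transposed on its own. A minimal NumPy sketch of that block-wise transpose, assuming (as the question states) that the row count is a multiple of the column count:

```python
import pandas as pd

# Example frame from the question: 9 rows, 3 columns.
df = pd.DataFrame({c: list("ABCDEFGHI") for c in ["pos1", "pos2", "pos3"]})

n = df.shape[1]  # block size = number of columns
arr = df.to_numpy()

# Split into (n_blocks, n, n), swap the last two axes to transpose each block,
# then flatten the blocks back into ordinary rows.
blocks_t = arr.reshape(-1, n, n).transpose(0, 2, 1).reshape(-1, n)

result = pd.concat([df, pd.DataFrame(blocks_t, columns=df.columns)],
                   ignore_index=True)
print(result)
```

The only difference from the ravel/reshape approach is the intermediate 3-D shape: grouping rows into blocks first lets transpose(0, 2, 1) act on each block independently.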
Answer:
This is somewhat efficient if you want to use pandas only:
n = len(df.columns)
n_rows = len(df)  # original row count, fixed before any rows are appended
for start in range(0, n_rows, n):
    block = df.iloc[start:start + n]  # next block of n original rows
    for j in range(n):
        # row j of the transposed block is column j of the block
        df.loc[len(df)] = block.iloc[:, j].tolist()
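Growing a DataFrame one row at a time with df.loc, or one block at a time with pd.concat inside the loop as in the question, copies the data repeatedly. If you want to keep a pandas-only loop, a common alternative is to collect the transposed blocks in a list and concatenate once at the end. A minimal sketch on the example frame:

```python
import pandas as pd

# Example frame from the question: 9 rows, 3 columns.
df = pd.DataFrame({c: list("ABCDEFGHI") for c in ["pos1", "pos2", "pos3"]})
n = len(df.columns)  # block size = number of columns

# Collect each transposed block in a list instead of growing a frame in the loop.
pieces = [df]
for start in range(0, len(df), n):
    block_t = df.iloc[start:start + n].T  # transpose one n x n block
    block_t.columns = df.columns          # its rows become new rows of the result
    pieces.append(block_t)

# One concat at the end instead of one per iteration.
result = pd.concat(pieces, ignore_index=True)
print(result)
```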